DNA-protein quasi-mapping for differential gene expression analysis in non-model organisms

RNA-seq is an experimental technique that uses modern, high-throughput sequencing technology to sequence a population of mRNA. A common use of RNA-seq is Differential Gene Expression Analysis (DGEA), which is the process of identifying genes with significant changes in their expression levels a...

Bibliographic Details
Main Author: Santiago, Kyle Christian Ramon L.
Format: text
Language: English
Published: Animo Repository 2022
Subjects: Nucleotide sequence; Gene expression; Computer Sciences
Online Access: https://animorepository.dlsu.edu.ph/etdm_softtech/7
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1008&context=etdm_softtech
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etdm_softtech-1008
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etdm_softtech-1008 2022-12-15T08:34:52Z DNA-protein quasi-mapping for differential gene expression analysis in non-model organisms Santiago, Kyle Christian Ramon L. 2022-12-01T08:00:00Z text application/pdf https://animorepository.dlsu.edu.ph/etdm_softtech/7 https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1008&context=etdm_softtech Software Technology Master's Theses English Animo Repository Nucleotide sequence Gene expression Computer Sciences
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
topic Nucleotide sequence
Gene expression
Computer Sciences
description RNA-seq is an experimental technique that uses modern, high-throughput sequencing technology to sequence a population of mRNA. A common use of RNA-seq is Differential Gene Expression Analysis (DGEA), which is the process of identifying genes whose expression levels change significantly across conditions. Typical DGEA pipelines require an annotated reference genome or transcriptome, so they cannot be applied to most organisms: only a few organisms have been studied extensively enough to have a high-quality annotated reference transcriptome available. For organisms without an annotated reference transcriptome, a more complex pipeline is often used. This pipeline involves constructing a de novo transcriptome assembly, which is the process of reconstructing transcript sequences from the RNA-seq reads. However, constructing a de novo assembly is computationally expensive. Recently, we proposed a novel alternative in which we directly align the RNA-seq reads to a protein database of a close relative. The alternative pipeline improves speed and memory usage, while also improving precision and recall in identifying differentially expressed genes. However, this alternative pipeline relies on full sequence alignments, which take time and generate information that is unnecessary for DGEA. This study replaces full sequence alignments with quasi-mapping, which determines the mapping by rapid look-ups of sub-strings of a query sequence. We report a further speed-up from this replacement, making our pipeline more than 1000× faster than the assembly-based approach while remaining more accurate. We also compared quasi-mapping to other mapping techniques and show that it is faster, but at the cost of sensitivity.
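The idea of mapping a DNA read to a protein by sub-string look-ups can be illustrated with a minimal sketch. This is not the thesis's implementation: it swaps in a plain Python dictionary of amino-acid k-mers and a simple majority vote, and the names `build_kmer_index`, `quasi_map`, the k-mer length `k`, the `min_hits` threshold, and the toy proteins `protA` and `protB` are all hypothetical choices made for illustration. The read is translated in six reading frames using the standard genetic code, and each frame's k-mers are looked up in the index without computing any alignment.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translations(read):
    """Translate a read in three forward and three reverse frames."""
    frames = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            # Codons with ambiguous bases become 'X' (an assumption of this sketch).
            frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


def build_kmer_index(proteins, k):
    """Map every amino-acid k-mer to the set of proteins containing it."""
    index = defaultdict(set)
    for name, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def quasi_map(read, index, k, min_hits=2):
    """Return the protein with the most k-mer hits in any frame, or None."""
    best_name, best_hits = None, 0
    for frame in six_frame_translations(read):
        votes = Counter()
        for i in range(len(frame) - k + 1):
            for name in index.get(frame[i:i + k], ()):
                votes[name] += 1
        if votes:
            name, hits = votes.most_common(1)[0]
            if hits > best_hits:
                best_name, best_hits = name, hits
    return best_name if best_hits >= min_hits else None


# Toy protein database (hypothetical sequences).
proteins = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "MGSSHHHHHHSSGLVPRGSH",
}
index = build_kmer_index(proteins, k=4)

# A read that encodes the first ten residues of protA.
read = "ATGAAAACCGCGTATATTGCGAAACAGCGT"
print(quasi_map(read, index, k=4))
```

Because only exact k-mer matches are counted, a read from a diverged relative may fail to reach `min_hits`, which is one way to picture the speed-versus-sensitivity trade-off mentioned in the abstract.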
format text
author Santiago, Kyle Christian Ramon L.
title DNA-protein quasi-mapping for differential gene expression analysis in non-model organisms
publisher Animo Repository
publishDate 2022
url https://animorepository.dlsu.edu.ph/etdm_softtech/7
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1008&context=etdm_softtech
_version_ 1753806428165898240