DNA-protein quasi-mapping for differential gene expression analysis in non-model organisms

Bibliographic Details
Main Author: Santiago, Kyle Christian Ramon L.
Format: text
Language: English
Published: Animo Repository 2022
Online Access: https://animorepository.dlsu.edu.ph/etdm_softtech/7
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1008&context=etdm_softtech
Institution: De La Salle University
Description
Summary: RNA-seq is an experimental technique that uses modern high-throughput sequencing technology to sequence a population of mRNA. A common use of RNA-seq is Differential Gene Expression Analysis (DGEA), the process of identifying genes whose expression levels change significantly across conditions. Typical DGEA pipelines require an annotated reference genome or transcriptome and therefore cannot be applied to most organisms, since only a few organisms have been studied extensively enough to have a high-quality annotated reference transcriptome. For organisms without an annotated reference transcriptome, a more complex pipeline is often used. This pipeline involves constructing a de novo transcriptome assembly, that is, reconstructing transcript sequences from the RNA-seq reads, which is computationally expensive. Recently, we proposed a novel alternative in which we directly align the RNA-seq reads to a protein database of a close relative. The alternative pipeline improves speed and memory usage while also improving precision and recall in identifying differentially expressed genes. However, it relies on full sequence alignments, which take time and generate information that DGEA does not need. This study replaces full sequence alignment with quasi-mapping, which determines the mapping by rapid look-ups of substrings of a query sequence. We report a further speed-up from this replacement, making our pipeline more than 1000× faster than the assembly-based approach while remaining more accurate. We also compared quasi-mapping to other mapping techniques and show that it is faster, but at the cost of sensitivity.
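
The idea of mapping a DNA read to a protein by looking up substrings, rather than computing a full alignment, can be illustrated with a small Python sketch. This is not the thesis's implementation, which the summary does not describe; it is a toy version that assumes a plain dictionary of protein k-mers, six-frame translation with the standard genetic code, and a "most k-mer hits wins" rule. All function names, the example proteins, and the choice of k = 5 are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(read):
    """Yield the six translations of a read (3 offsets on each strand)."""
    reverse = read.translate(COMPLEMENT)[::-1]
    for strand in (read, reverse):
        for offset in range(3):
            yield translate(strand[offset:])


def build_index(proteins, k):
    """Map every protein k-mer to the set of protein IDs containing it."""
    index = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for i in range(len(sequence) - k + 1):
            index[sequence[i:i + k]].add(protein_id)
    return index


def quasi_map(read, index, k):
    """Return the protein ID with the most k-mer hits in any frame, or None."""
    best_id, best_hits = None, 0
    for peptide in six_frames(read):
        hits = Counter()
        for i in range(len(peptide) - k + 1):
            # Dictionary look-up only; no alignment is ever computed.
            for protein_id in index.get(peptide[i:i + k], ()):
                hits[protein_id] += 1
        if hits:
            protein_id, count = hits.most_common(1)[0]
            if count > best_hits:
                best_id, best_hits = protein_id, count
    return best_id


# Hypothetical toy data: two short proteins and a read encoding the first.
proteins = {"protA": "MKTAYIAKQR", "protB": "MSLLPGWERT"}
index = build_index(proteins, k=5)
read = "ATGAAAACTGCTTATATTGCTAAACAACGT"
print(quasi_map(read, index, k=5))  # prints protA
```

Because the sketch only counts exact k-mer matches, a read that differs from every protein k-mer by even one residue maps to nothing, which gives a rough intuition for the speed-versus-sensitivity trade-off noted in the summary.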