DNA-protein quasi-mapping for differential gene expression analysis in non-model organisms

Bibliographic Details
Main Author: Santiago, Kyle Christian Ramon L.
Format: text
Language: English
Published: Animo Repository 2022
Online Access: https://animorepository.dlsu.edu.ph/etdm_softtech/7
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1008&context=etdm_softtech
Description
Summary: RNA-seq is an experimental technique that uses modern high-throughput sequencing technology to sequence a population of mRNA. A common use of RNA-seq is Differential Gene Expression Analysis (DGEA), the process of identifying genes whose expression levels change significantly across conditions. Typical DGEA pipelines require an annotated reference genome or transcriptome, so they cannot be applied to most organisms: only a few organisms have been studied extensively enough to have a high-quality annotated reference transcriptome available. For organisms without one, a more complex pipeline is often used. This pipeline involves constructing a de novo transcriptome assembly, which is the process of reconstructing transcript sequences from the RNA-seq reads. However, constructing a de novo assembly is computationally expensive. Recently, we proposed a novel alternative in which we directly align the RNA-seq reads to a protein database of a close relative. The alternative pipeline improves speed and memory usage while also improving precision and recall in identifying differentially expressed genes. However, it relies on full sequence alignments, which take time and generate information unnecessary for DGEA. This study replaces full sequence alignment with quasi-mapping, which determines the mapping by rapid look-ups of substrings of a query sequence. The replacement yields a further speed-up, making our pipeline more than 1000× faster than the assembly-based approach while remaining more accurate. We also compared quasi-mapping with other mapping techniques and show that it is faster, at the cost of sensitivity.
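
The idea of mapping a DNA read to a protein by substring look-ups, without a full alignment, can be illustrated with a toy sketch. This is not the pipeline described in the thesis: the function names, the k-mer length of 4, the two example proteins, and the simple vote-counting rule are all assumptions chosen for illustration. The sketch indexes every length-k substring of each protein, translates the read in all six reading frames, and reports the protein that shares the most k-mers with the read.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, laid out in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame):
    """Translate DNA starting at offset `frame`; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def build_index(proteins, k):
    """Map every length-k substring of each protein to the protein IDs containing it."""
    index = defaultdict(set)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(pid)
    return index


def quasi_map(read, index, k):
    """Hypothetical mapper: pick the protein with the most shared k-mers, or None."""
    votes = Counter()
    for strand in (read, reverse_complement(read)):
        for frame in range(3):
            peptide = translate(strand, frame)
            for i in range(len(peptide) - k + 1):
                # Exact hash look-up only; no alignment is computed.
                for pid in index.get(peptide[i:i + k], ()):
                    votes[pid] += 1
    return votes.most_common(1)[0][0] if votes else None


# Made-up example data: two short proteins and a read encoding "KTAYIAK".
PROTEINS = {"protA": "MKTAYIAKQR", "protB": "MSDNELLQWV"}
INDEX = build_index(PROTEINS, k=4)
READ = "AAAACTGCTTATATTGCAAAG"
print(quasi_map(READ, INDEX, k=4))  # prints: protA
```

Because the look-ups require exact k-mer matches, a read that differs from the reference protein within every k-mer window would go unmapped in this sketch, which gives an intuition for the speed versus sensitivity trade-off mentioned in the summary.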