Investigating the improvement of transcriptome assembly via a consensus-based ensemble approach
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/156860 |
Institution: | Nanyang Technological University |
Summary: | This paper investigates the development of a transcriptome assembly pipeline that provides as clean and accurate a result as possible for the plant Cycas edentata. De novo transcriptome assembly is necessary for understanding gene expression in non-model organisms that may not have a sequenced genome available. It can be difficult to obtain transcriptomes free of extraneous material such as splicing isoforms and partially transcribed sequences. This paper therefore investigates methods that may address this issue by comparing the performance of single assemblers and merged assemblers, as well as the impact of varying k-mer size and introducing post-assembly modifications. Results show that the best pipeline for Cycas edentata, yielding the cleanest and most accurate transcriptome, runs SOAPdenovo-Trans at a k-mer size of 35 (K35) followed by CD-HIT-Est. This pipeline may also be used for other datasets in the future. |