Investigating the improvement of transcriptome assembly via a consensus-based ensemble approach

Bibliographic Details
Main Author: Srinivasan, Niyathi
Other Authors: Marek Mutwil
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Online Access:https://hdl.handle.net/10356/156860
Institution: Nanyang Technological University
Description
Summary: This paper investigates the development of a transcriptome assembly pipeline that produces as clean and accurate a result as possible for the plant Cycas edentata. De novo transcriptome assembly is necessary for understanding gene expression in non-model organisms that may not have a sequenced genome available. It can be difficult to obtain transcriptomes free of extraneous material such as splicing isoforms and partially transcribed sequences. This paper therefore investigates several methods that may address this issue: it compares the performance of single assemblers and merged assemblers, and examines the impact of varying the k-mer size and of applying post-assembly modifications. Results show that the best pipeline for Cycas edentata runs SOAPdenovo-Trans with a k-mer size of 35 (K35), followed by CD-HIT-EST, yielding the cleanest and most accurate transcriptome. This pipeline may also be applied to other datasets in future.
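The recommended two-step pipeline (assemble with SOAPdenovo-Trans at K35, then reduce redundancy with CD-HIT-EST) can be sketched as a small Python driver that builds the two command lines. This is a hedged illustration, not the project's actual script: the library config file name, output prefix, thread count and the 0.95 identity threshold are hypothetical, the binaries are assumed to be on PATH, and the assembled transcripts are assumed to be written to `<prefix>.scafSeq`. Because K35 exceeds the 31-mer limit of `SOAPdenovo-Trans-31mer`, the sketch switches to the `SOAPdenovo-Trans-127mer` build.

```python
import shlex
import subprocess


def soap_command(config: str, prefix: str, k: int = 35, threads: int = 8) -> list[str]:
    """Build a SOAPdenovo-Trans 'all' command (pregraph through scaffolding in one step)."""
    if k > 127:
        raise ValueError("SOAPdenovo-Trans supports k-mer sizes up to 127")
    # The 31-mer build only handles K <= 31; larger K (e.g. 35) needs the 127-mer build.
    binary = "SOAPdenovo-Trans-31mer" if k <= 31 else "SOAPdenovo-Trans-127mer"
    return [binary, "all", "-s", config, "-K", str(k), "-o", prefix, "-p", str(threads)]


def cdhit_command(assembly: str, output: str, identity: float = 0.95, threads: int = 8) -> list[str]:
    """Build a cd-hit-est command that clusters near-identical transcripts."""
    # Word size 10 is only valid for identity thresholds of 0.95 and above.
    if not 0.95 <= identity <= 1.0:
        raise ValueError("this sketch uses -n 10, which requires -c between 0.95 and 1.0")
    # -M 0 lifts cd-hit-est's memory limit; -T sets the number of threads.
    return ["cd-hit-est", "-i", assembly, "-o", output, "-c", str(identity),
            "-n", "10", "-T", str(threads), "-M", "0"]


def run_pipeline(config: str, prefix: str, k: int = 35, identity: float = 0.95,
                 dry_run: bool = True) -> list[list[str]]:
    """Assemble, then deduplicate. With dry_run=True the commands are only printed."""
    commands = [
        soap_command(config, prefix, k),
        # Assumption: final transcripts are read from the .scafSeq output.
        cdhit_command(f"{prefix}.scafSeq", f"{prefix}.cdhit.fa", identity),
    ]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop if either tool fails
    return commands


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    run_pipeline("cycas_lib.config", "cycas_k35")
```

The dry-run default makes it easy to inspect the exact commands before committing compute time; to compare other k-mer sizes, as the study does, the same driver can be called in a loop over different `k` values with distinct prefixes.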