LSTrAP-denovo: automated generation of transcriptome atlases for eukaryotic species without genomes

Despite the abundance of species with transcriptomic data, a significant number of species still lack sequenced genomes, making it difficult to study gene function and expression in these organisms. While de novo transcriptome assembly can be used to assemble protein-coding transcripts from RNA-sequ...

Bibliographic Details
Main Authors: Lim, Peng Ken, Wang, Ruoxi, Mutwil, Marek
Other Authors: School of Biological Sciences
Format: Article
Language: English
Published: 2024
Subjects: Medicine, Health and Life Sciences; Eukaryote; Gene expression profiling
Online Access: https://hdl.handle.net/10356/181004
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-181004
Record timestamp: 2024-11-11T02:32:46Z
Type: Journal Article
Abstract: Despite the abundance of species with transcriptomic data, a significant number of species still lack sequenced genomes, making it difficult to study gene function and expression in these organisms. While de novo transcriptome assembly can be used to assemble protein-coding transcripts from RNA-sequencing (RNA-seq) data, the datasets used often only feature samples of arbitrarily selected or similar experimental conditions, which might fail to capture condition-specific transcripts. We developed the Large-Scale Transcriptome Assembly Pipeline for de novo assembled transcripts (LSTrAP-denovo) to automatically generate transcriptome atlases of eukaryotic species. Specifically, given an NCBI TaxID, LSTrAP-denovo can (1) filter undesirable RNA-seq accessions based on read data, (2) select RNA-seq accessions via unsupervised machine learning to construct a sample-balanced dataset for download, (3) assemble transcripts via over-assembly, (4) functionally annotate coding sequences (CDS) from assembled transcripts and (5) generate transcriptome atlases in the form of expression matrices for downstream transcriptomic analyses. LSTrAP-denovo is easy to implement, written in Python, and is freely available at https://github.com/pengkenlim/LSTrAP-denovo/.
Funding: Ministry of Education (MOE), Singapore. Grant/Award Number: MOE-MOET32022-0002.
Citation: Lim, P. K., Wang, R. & Mutwil, M. (2024). LSTrAP-denovo: automated generation of transcriptome atlases for eukaryotic species without genomes. Physiologia Plantarum, 176(4), e14407.
DOI: https://dx.doi.org/10.1111/ppl.14407
ISSN: 0031-9317
PubMed ID: 38973613
Scopus ID: 2-s2.0-85197732921
Rights: © 2024 Scandinavian Plant Physiology Society. All rights reserved.
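Step (2) of the abstract, selecting accessions with unsupervised learning to obtain a sample-balanced dataset, can be illustrated with a minimal sketch. This is not the pipeline's actual code: the record does not say which clustering algorithm or features LSTrAP-denovo uses. The sketch assumes each accession has a small numeric profile, uses a plain k-means with farthest-first initialization as a stand-in, and keeps at most a fixed number of accessions per cluster. All function names and accession IDs are hypothetical.

```python
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, iterations=20):
    """Plain k-means; returns one cluster label per point.

    Assumption: farthest-first initialization is used only to keep this
    sketch deterministic. It is not taken from the LSTrAP-denovo source.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_balanced(accessions, profiles, k, per_cluster):
    """Keep at most `per_cluster` accessions from each of `k` clusters."""
    labels = kmeans_labels(profiles, k)
    by_cluster = defaultdict(list)
    for accession, label in zip(accessions, labels):
        by_cluster[label].append(accession)
    selected = []
    for label in sorted(by_cluster):
        # Sorting by ID is an arbitrary, reproducible tie-break for this sketch.
        selected.extend(sorted(by_cluster[label])[:per_cluster])
    return selected


# Hypothetical input: six similar samples and two samples from a distinct condition.
accessions = ["A1", "A2", "A3", "A4", "A5", "A6", "B1", "B2"]
profiles = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
    [10.0, 10.0], [10.0, 11.0],
]
balanced = select_balanced(accessions, profiles, k=2, per_cluster=2)
print(balanced)
```

Capping the number of accessions per cluster keeps the over-represented condition (the six "A" samples here) from dominating the download set, which is the intuition behind a sample-balanced dataset.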
Library: NTU Library
Country: Singapore
Collection: DR-NTU