LSTrAP-denovo: automated generation of transcriptome atlases for eukaryotic species without genomes
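Despite the abundance of species with transcriptomic data, a significant number of species still lack sequenced genomes, making it difficult to study gene function and expression in these organisms. While de novo transcriptome assembly can be used to assemble protein-coding transcripts from RNA-sequencing (RNA-seq) data, the datasets used often only feature samples of arbitrarily selected or similar experimental conditions, which might fail to capture condition-specific transcripts. We developed the Large-Scale Transcriptome Assembly Pipeline for de novo assembled transcripts (LSTrAP-denovo) to automatically generate transcriptome atlases of eukaryotic species. Specifically, given an NCBI TaxID, LSTrAP-denovo can (1) filter undesirable RNA-seq accessions based on read data, (2) select RNA-seq accessions via unsupervised machine learning to construct a sample-balanced dataset for download, (3) assemble transcripts via over-assembly, (4) functionally annotate coding sequences (CDS) from assembled transcripts and (5) generate transcriptome atlases in the form of expression matrices for downstream transcriptomic analyses. LSTrAP-denovo is easy to implement, written in Python, and is freely available at https://github.com/pengkenlim/LSTrAP-denovo/.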
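LSTrAP-denovo's second step selects RNA-seq accessions via unsupervised machine learning to construct a sample-balanced dataset. The record does not say which algorithm or features are used, so the following is only an illustrative sketch under stated assumptions: each accession is assumed to be summarised by a numeric expression profile, k-means clustering (Lloyd's algorithm with deterministic farthest-first seeding) stands in for the unspecified unsupervised method, and the accessions nearest each cluster centre are kept so that heavily sampled conditions cannot dominate. The functions `kmeans` and `select_balanced`, the profile vectors, and the toy accession IDs are hypothetical and are not LSTrAP-denovo's code or API.

```python
"""Hypothetical sketch: sample-balanced accession selection via k-means.

NOT LSTrAP-denovo's implementation; profiles, names and k are assumptions.
"""
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def _farthest_first(points: List[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = _farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = _mean(members)
    return labels


def select_balanced(
    profiles: Dict[str, Vector], k: int, per_cluster: int
) -> List[str]:
    """Cluster accessions, then keep the `per_cluster` accessions nearest each
    cluster centre so that over-sampled conditions cannot dominate."""
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of accessions")
    ids = sorted(profiles)
    labels = kmeans([profiles[i] for i in ids], k)
    chosen: List[str] = []
    for c in range(k):
        members = [i for i, lab in zip(ids, labels) if lab == c]
        if not members:
            continue
        centre = _mean([profiles[i] for i in members])
        members.sort(key=lambda i: _dist2(profiles[i], centre))
        chosen.extend(members[:per_cluster])
    return chosen


# Toy 2-D profiles (made-up values): four leaf-like and three root-like samples.
profiles = {
    "leaf_1": [10.0, 0.5], "leaf_2": [9.5, 0.7], "leaf_3": [10.2, 0.4],
    "leaf_4": [9.8, 0.6], "root_1": [0.3, 8.0], "root_2": [0.5, 8.4],
    "root_3": [0.4, 8.1],
}
print(select_balanced(profiles, k=2, per_cluster=1))  # ['leaf_4', 'root_3']
```

With `k=2` the sketch keeps one leaf-like and one root-like accession rather than, say, two leaf samples from the over-represented condition. In practice `k` and `per_cluster` would have to be tuned to the dataset, and another clustering method could be swapped in without changing the selection logic.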
Main Authors: | Lim, Peng Ken; Wang, Ruoxi; Mutwil, Marek |
---|---|
Other Authors: | School of Biological Sciences |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Medicine, Health and Life Sciences; Eukaryote; Gene expression profiling |
Online Access: | https://hdl.handle.net/10356/181004 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-181004 |
---|---|
record_format | dspace |
funding | Ministry of Education (MOE); Ministry of Education - Singapore, Grant/Award Number: MOE-MOET32022-0002 |
date_available | 2024-11-11T02:32:46Z |
type | Journal Article |
citation | Lim, P. K., Wang, R. & Mutwil, M. (2024). LSTrAP-denovo: automated generation of transcriptome atlases for eukaryotic species without genomes. Physiologia Plantarum, 176(4), e14407. |
doi | https://dx.doi.org/10.1111/ppl.14407 |
issn | 0031-9317 |
pmid | 38973613 |
scopus_id | 2-s2.0-85197732921 |
journal | Physiologia Plantarum |
rights | © 2024 Scandinavian Plant Physiology Society. All rights reserved. |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Medicine, Health and Life Sciences; Eukaryote; Gene expression profiling |
description | Despite the abundance of species with transcriptomic data, a significant number of species still lack sequenced genomes, making it difficult to study gene function and expression in these organisms. While de novo transcriptome assembly can be used to assemble protein-coding transcripts from RNA-sequencing (RNA-seq) data, the datasets used often only feature samples of arbitrarily selected or similar experimental conditions, which might fail to capture condition-specific transcripts. We developed the Large-Scale Transcriptome Assembly Pipeline for de novo assembled transcripts (LSTrAP-denovo) to automatically generate transcriptome atlases of eukaryotic species. Specifically, given an NCBI TaxID, LSTrAP-denovo can (1) filter undesirable RNA-seq accessions based on read data, (2) select RNA-seq accessions via unsupervised machine learning to construct a sample-balanced dataset for download, (3) assemble transcripts via over-assembly, (4) functionally annotate coding sequences (CDS) from assembled transcripts and (5) generate transcriptome atlases in the form of expression matrices for downstream transcriptomic analyses. LSTrAP-denovo is easy to implement, written in Python, and is freely available at https://github.com/pengkenlim/LSTrAP-denovo/. |
author2 | School of Biological Sciences |
format | Article |
author | Lim, Peng Ken; Wang, Ruoxi; Mutwil, Marek |
title | LSTrAP-denovo: automated generation of transcriptome atlases for eukaryotic species without genomes |
publishDate | 2024 |
url | https://hdl.handle.net/10356/181004 |