GPRED-GC: A Gene PREDiction model accounting for 5′-3′ GC gradient

© 2019 The Author(s). Background: Gene prediction is a key step in genome annotation. Ab initio gene prediction enables gene annotation of new genomes regardless of availability of homologous sequences. There exist a number of ab initio gene prediction tools and they have been widely used for gene annotation f...

Bibliographic Details
Main Authors: Prapaporn Techa-Angkoon, Kevin L. Childs, Yanni Sun
Format: Journal
Published: 2020
Subjects: Biochemistry, Genetics and Molecular Biology; Computer Science; Mathematics
Online Access: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85077127673&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/67588
Institution: Chiang Mai University
id th-cmuir.6653943832-67588
record_format dspace
spelling th-cmuir.6653943832-67588 2020-04-02T15:10:26Z GPRED-GC: A Gene PREDiction model accounting for 5′-3′ GC gradient Prapaporn Techa-Angkoon Kevin L. Childs Yanni Sun Biochemistry, Genetics and Molecular Biology Computer Science Mathematics © 2019 The Author(s). Background: Gene prediction is a key step in genome annotation. Ab initio gene prediction enables gene annotation of new genomes regardless of availability of homologous sequences. There exist a number of ab initio gene prediction tools and they have been widely used for gene annotation for various species. However, existing tools are not optimized for identifying genes with highly variable GC content. In addition, some genes in grass genomes exhibit a sharp 5′-3′ decreasing GC content gradient, which is not carefully modeled by available gene prediction tools. Thus, there is still room to improve the sensitivity and accuracy for predicting genes with GC gradients. Results: In this work, we designed and implemented a new hidden Markov model (HMM)-based ab initio gene prediction tool, which is optimized for finding genes with highly variable GC contents, such as the genes with negative GC gradients in grass genomes. We tested the tool on three datasets from Arabidopsis thaliana and Oryza sativa. The results showed that our tool can identify genes missed by existing tools due to the highly variable GC contents. Conclusions: GPRED-GC can effectively predict genes with highly variable GC contents without manual intervention. It provides a useful complementary tool to existing ones such as Augustus for more sensitive gene discovery. The source code is freely available at https://sourceforge.net/projects/gpred-gc/. 2020-04-02T14:56:19Z 2020-04-02T14:56:19Z 2019-12-24 Journal 14712105 2-s2.0-85077127673 10.1186/s12859-019-3047-3 https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85077127673&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/67588
institution Chiang Mai University
building Chiang Mai University Library
country Thailand
collection CMU Intellectual Repository
topic Biochemistry, Genetics and Molecular Biology
Computer Science
Mathematics
spellingShingle Biochemistry, Genetics and Molecular Biology
Computer Science
Mathematics
Prapaporn Techa-Angkoon
Kevin L. Childs
Yanni Sun
GPRED-GC: A Gene PREDiction model accounting for 5′-3′ GC gradient
description © 2019 The Author(s). Background: Gene prediction is a key step in genome annotation. Ab initio gene prediction enables gene annotation of new genomes regardless of availability of homologous sequences. There exist a number of ab initio gene prediction tools and they have been widely used for gene annotation for various species. However, existing tools are not optimized for identifying genes with highly variable GC content. In addition, some genes in grass genomes exhibit a sharp 5′-3′ decreasing GC content gradient, which is not carefully modeled by available gene prediction tools. Thus, there is still room to improve the sensitivity and accuracy for predicting genes with GC gradients. Results: In this work, we designed and implemented a new hidden Markov model (HMM)-based ab initio gene prediction tool, which is optimized for finding genes with highly variable GC contents, such as the genes with negative GC gradients in grass genomes. We tested the tool on three datasets from Arabidopsis thaliana and Oryza sativa. The results showed that our tool can identify genes missed by existing tools due to the highly variable GC contents. Conclusions: GPRED-GC can effectively predict genes with highly variable GC contents without manual intervention. It provides a useful complementary tool to existing ones such as Augustus for more sensitive gene discovery. The source code is freely available at https://sourceforge.net/projects/gpred-gc/.
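The 5′-3′ GC content gradient mentioned in the abstract can be made concrete with a small illustration. The sketch below is not GPRED-GC's HMM and is not taken from its source code; it only shows one simple way to quantify a gradient, assuming GC fraction is measured in sliding windows along the sequence and summarized by a least-squares slope. The function names (`gc_fraction`, `windowed_gc`, `gc_gradient`) and the default window and step sizes are hypothetical choices made for this example.

```python
def gc_fraction(seq):
    """Fraction of G or C among the A/C/G/T characters of seq (0.0 if none)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq, window=60, step=30):
    """GC fraction of successive windows, ordered from the 5' end to the 3' end.

    The window and step sizes are illustrative assumptions, not values
    from the paper.
    """
    if len(seq) < window:
        return [gc_fraction(seq)]
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


def gc_gradient(seq, window=60, step=30):
    """Least-squares slope of windowed GC fraction against window index.

    A negative slope means GC content decreases from 5' to 3'.
    """
    values = windowed_gc(seq, window, step)
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


if __name__ == "__main__":
    # Toy sequence: GC-rich 5' half followed by an AT-rich 3' half.
    toy = "GC" * 60 + "AT" * 60
    print(windowed_gc(toy))
    print(gc_gradient(toy))
```

On the toy sequence the slope is negative, which corresponds to the decreasing gradient the abstract describes for some grass genes. A gene predictor would need to model this variation within its states rather than simply measure it after the fact.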
format Journal
author Prapaporn Techa-Angkoon
Kevin L. Childs
Yanni Sun
author_facet Prapaporn Techa-Angkoon
Kevin L. Childs
Yanni Sun
author_sort Prapaporn Techa-Angkoon
title GPRED-GC: A Gene PREDiction model accounting for 5′-3′ GC gradient
title_short GPRED-GC: A Gene PREDiction model accounting for 5′-3′ GC gradient
title_full GPRED-GC: A Gene PREDiction model accounting for 5′-3′ GC gradient
title_fullStr GPRED-GC: A Gene PREDiction model accounting for 5′-3′ GC gradient
title_full_unstemmed GPRED-GC: A Gene PREDiction model accounting for 5′-3′ GC gradient
title_sort gpred-gc: a gene prediction model accounting for 5′-3′ gc gradient
publishDate 2020
url https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85077127673&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/67588
_version_ 1681426663298039808