High-throughput sequencing data and the impact of plant gene annotation quality

Bibliographic Details
Main Authors: Vaattovaara, Aleksia; Leppälä, Johanna; Salojärvi, Jarkko; Wrzaczek, Michael
Other Authors: School of Biological Sciences
Format: Article
Language: English
Published: 2019
Online Access: https://hdl.handle.net/10356/105366
http://hdl.handle.net/10220/49526
Description
Summary: The use of draft genomes of different species and re-sequencing of accessions and populations are now common tools for plant biology research. The de novo assembled draft genomes make it possible to identify pivotal divergence points in the plant lineage and provide an opportunity to investigate the genomic basis and timing of biological innovations by inferring orthologs between species. Furthermore, re-sequencing facilitates the mapping and subsequent molecular characterization of causative loci for traits, such as those for plant stress tolerance and development. In both cases high-quality gene annotation (the identification of protein-coding regions, gene promoters, and 5′- and 3′-untranslated regions) is critical for investigation of gene function. Annotations are constantly improving but automated gene annotations still require manual curation and experimental validation. This is particularly important for genes with large introns, genes located in regions rich with transposable elements or repeats, large gene families, and segmentally duplicated genes. In this opinion paper, we highlight the impact of annotation quality on evolutionary analyses, genome-wide association studies, and the identification of orthologous genes in plants. Furthermore, we predict that incorporating accurate information from manual curation into databases will dramatically improve the performance of automated gene predictors.