De Novo assembly of complete chloroplast genomes from non-model species based on a K-mer frequency-based selection of chloroplast reads from total DNA sequences

Bibliographic Details
Main Authors: Ramlee, Shairul Izan; Esselink, Danny; Visser, Richard G. F.; Smulders, Marinus J. M.; Borm, Theo
Format: Article
Language: English
Published: Frontiers Media 2017
Online Access: http://psasir.upm.edu.my/id/eprint/61283/1/NOVO.pdf
http://psasir.upm.edu.my/id/eprint/61283/
https://www.frontiersin.org/articles/10.3389/fpls.2017.01271/full
Institution: Universiti Putra Malaysia
id my.upm.eprints.61283
record_format eprints
spelling my.upm.eprints.61283 2021-07-21T07:52:12Z http://psasir.upm.edu.my/id/eprint/61283/ Frontiers Media 2017 Article PeerReviewed text en http://psasir.upm.edu.my/id/eprint/61283/1/NOVO.pdf
Ramlee, Shairul Izan and Esselink, Danny and Visser, Richard G. F. and Smulders, Marinus J. M. and Borm, Theo (2017) De Novo assembly of complete chloroplast genomes from non-model species based on a K-mer frequency-based selection of chloroplast reads from total DNA sequences. Frontiers in Plant Science, 8. art. no. 1271. pp. 1-13. ISSN 1664-462X https://www.frontiersin.org/articles/10.3389/fpls.2017.01271/full 10.3389/fpls.2017.01271
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads derived from the chloroplast genome. Until now, these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes, especially in non-model species for which no close relatives have been sequenced. The alternative is to assemble the chloroplast genome de novo from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assembled them using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps that are left by coverage variation in the WGS dataset. We have successfully de novo assembled three complete chloroplast genomes from plant species with a range of nuclear genome sizes to demonstrate the universality of our approach: Solanum lycopersicum (0.9 Gb), Aegilops tauschii (4 Gb) and Paphiopedilum henryanum (25 Gb). We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short-read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
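The idea of selecting chloroplast reads from a k-mer frequency table can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the use of the median k-mer count per read, and the `min_count` threshold are all assumptions made for illustration, and reverse-complement k-mers are ignored for brevity. The sketch relies only on the observation stated in the abstract, that chloroplast-derived reads are abundant in WGS data, so their k-mers should occur far more often than k-mers from the nuclear genome.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_table(reads, k):
    """Build a k-mer frequency table over all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def select_high_frequency_reads(reads, k, min_count):
    """Keep reads whose median k-mer frequency is at least min_count.

    Hypothetical selection rule: high-copy (putatively chloroplast)
    reads consist mostly of k-mers that are frequent in the table.
    """
    table = kmer_table(reads, k)
    selected = []
    for read in reads:
        freqs = [table[km] for km in kmers(read, k)]
        if freqs and median(freqs) >= min_count:
            selected.append(read)
    return selected


# Toy data: one read sampled five times (high copy) and two single-copy reads.
reads = ["ACGTTGCAAG"] * 5 + ["GGGCCCTTTA", "TTTAAACCCG"]
high_copy = select_high_frequency_reads(reads, k=4, min_count=3)
```

On real data both `k` and the threshold would need tuning, which is in line with the abstract's remark that the choice of k and the amount of data used must be optimized.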