An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data
© 2019 Elsevier B.V. Whole-genome sequencing (WGS) data allow inference of Mycobacterium tuberculosis (Mtb) clusters using a pairwise genetic distance of ≤12 single nucleotide polymorphisms (SNPs) as a threshold. However, discrepancies in SNP counts and genetic distance m...
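Main Authors: | Areeya Disratthakit, Licht Toyo-oka, Penpitcha Thawong, Pundharika Paiboonsiri, Nuanjun Wichukjinda, Pravech Ajawatanawong, Natthakan Thipkrua, Krairerk Suthum, Prasit Palittapongarnpim, Katsushi Tokunaga, Surakameth Mahasirimongkol |
---|---|
Other Authors: | University of Tokyo |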
Format: | Article |
Published: | 2020 |
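Subjects: | Agricultural and Biological Sciences; Biochemistry, Genetics and Molecular Biology; Immunology and Microbiology; Medicine |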
Online Access: | https://repository.li.mahidol.ac.th/handle/123456789/49503 |
Institution: | Mahidol University |
id | th-mahidol.49503 |
---|---|
record_format | dspace |
spelling | th-mahidol.49503; 2020-01-27T10:34:45Z; An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data; Areeya Disratthakit; Licht Toyo-oka; Penpitcha Thawong; Pundharika Paiboonsiri; Nuanjun Wichukjinda; Pravech Ajawatanawong; Natthakan Thipkrua; Krairerk Suthum; Prasit Palittapongarnpim; Katsushi Tokunaga; Surakameth Mahasirimongkol; University of Tokyo; National Center for Global Health and Medicine; Thailand Ministry of Public Health; Mahidol University; Thailand National Center for Genetic Engineering and Biotechnology; Agricultural and Biological Sciences; Biochemistry, Genetics and Molecular Biology; Immunology and Microbiology; Medicine; 2020-01-27T03:28:11Z; 2020-01-27T03:28:11Z; 2020-04-01; Article; Infection, Genetics and Evolution. Vol.79, (2020); 10.1016/j.meegid.2019.104152; 15677257; 15671348; 2-s2.0-85077319732; https://repository.li.mahidol.ac.th/handle/123456789/49503; Mahidol University; SCOPUS; https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85077319732&origin=inward |
institution | Mahidol University |
building | Mahidol University Library |
continent | Asia |
country | Thailand |
content_provider | Mahidol University Library |
collection | Mahidol University Institutional Repository |
topic | Agricultural and Biological Sciences; Biochemistry, Genetics and Molecular Biology; Immunology and Microbiology; Medicine |
description | © 2019 Elsevier B.V. Whole-genome sequencing (WGS) data allow inference of Mycobacterium tuberculosis (Mtb) clusters using a pairwise genetic distance of ≤12 single nucleotide polymorphisms (SNPs) as a threshold. However, discrepancies in SNP counts and genetic distance measurements are a major concern when combining WGS data from different next-generation sequencing (NGS) platforms. We performed SNP variant calling on WGS data of 9 multidrug-resistant tuberculosis (MDR-TB) strains, 3 extensively drug-resistant tuberculosis (XDR-TB) strains, and the standard M. tuberculosis strain H37Rv, sequenced on an Illumina NextSeq500 and an Ion Torrent PGM. Variant calls were obtained using four common variant calling tools: Genome Analysis Toolkit (GATK) HaplotypeCaller (GATK-VCF workflow), GATK HaplotypeCaller with GenotypeGVCFs (GATK-GVCF workflow), SAMtools, and VarScan 2. Cross-platform pairwise SNP differences, minimum spanning networks, and average nucleotide identity (ANI) were analysed to measure the performance of the variant calling tools. The smallest pairwise SNP differences, 2 to 14 SNPs, were obtained with the GVCF workflow, whereas the largest, 7 to 158 SNPs, were obtained with VarScan 2. ANI comparison between SNP data from the NextSeq500 and the PGM for MDR-TB and XDR-TB showed maximum ANI values of 99.7% and 99.0%, respectively, with the GVCF workflow, while the other SNP calling results showed lower ANI, ranging from 95.1% to 98.6%. These results suggest that the GVCF workflow is the best-performing variant calling approach for avoiding cross-platform pairwise SNP differences. |
author2 | University of Tokyo |
format | Article |
author | Areeya Disratthakit; Licht Toyo-oka; Penpitcha Thawong; Pundharika Paiboonsiri; Nuanjun Wichukjinda; Pravech Ajawatanawong; Natthakan Thipkrua; Krairerk Suthum; Prasit Palittapongarnpim; Katsushi Tokunaga; Surakameth Mahasirimongkol |
author_sort | Areeya Disratthakit |
title | An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data |
title_sort | optimized genomic vcf workflow for precise identification of mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data |
publishDate | 2020 |
url | https://repository.li.mahidol.ac.th/handle/123456789/49503 |
_version_ | 1763490699236343808 |
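
Illustrative note: the abstract describes inferring Mtb clusters from pairwise SNP distances with a threshold of ≤12 SNPs. The Python sketch below shows one way such a threshold could be applied. It is not the authors' pipeline. The function names (`snp_distance`, `cluster_by_threshold`), the input format (equal-length strings of alleles at shared variant sites), the choice to skip positions without an A/C/G/T call in both samples, and the use of single-linkage grouping are all assumptions made for illustration; only the 12-SNP threshold comes from the record.

```python
from itertools import combinations

# Threshold quoted in the abstract: pairs at <= 12 SNPs are considered linked.
SNP_THRESHOLD = 12

CALLED = set("ACGT")


def snp_distance(a, b):
    """Count sites where two equal-length allele strings differ.

    Assumption: sites where either sample lacks an A/C/G/T call
    (for example 'N' or '-') are skipped rather than counted.
    """
    if len(a) != len(b):
        raise ValueError("allele strings must have the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in CALLED and y in CALLED and x != y
    )


def cluster_by_threshold(samples, threshold=SNP_THRESHOLD):
    """Group samples by single linkage: any pair within the threshold is joined.

    `samples` maps a sample name to its allele string (hypothetical format).
    Returns a sorted list of clusters, each a sorted list of sample names.
    """
    names = sorted(samples)
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if snp_distance(samples[a], samples[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())
```

As a usage example with toy data, `cluster_by_threshold({"A": "ACGTACGT", "B": "ACGTACGA", "C": "TGCATGCA"}, threshold=1)` groups A with B and leaves C on its own. Running the same isolates through two sequencing platforms and comparing the resulting distances is the kind of check the abstract reports, since a caller that inflates cross-platform differences can push true cluster members past the threshold.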