An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data

© 2019 Elsevier B.V. Whole-genome sequencing (WGS) data allow for an inference of Mycobacterium tuberculosis (Mtb) clusters by using a pairwise genetic distance of ≤12 single nucleotide polymorphisms (SNPs) as a threshold. However, a problem of discrepancies in numbers of SNPs and genetic distance m...
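
As an illustration of the ≤12 SNP threshold mentioned above, the following is a minimal Python sketch of pairwise SNP distance and threshold-based clustering. It is not the authors' pipeline. The isolate names, the toy genotype dictionaries, the choice to skip positions missing from either isolate, and the use of single-linkage grouping are all assumptions made for this example.

```python
from itertools import combinations

SNP_THRESHOLD = 12  # pairwise distance threshold quoted in the abstract


def snp_distance(a, b):
    """Count positions where two isolates carry different alleles.

    Positions absent from either isolate are skipped (an assumption of
    this sketch, not necessarily what the study did).
    """
    shared = a.keys() & b.keys()
    return sum(1 for pos in shared if a[pos] != b[pos])


def cluster_isolates(calls, threshold=SNP_THRESHOLD):
    """Group isolates by single linkage: link any pair within the threshold."""
    parent = {name: name for name in calls}

    def find(x):
        # Follow parent links to the cluster representative, compressing paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in combinations(calls, 2):
        if snp_distance(calls[x], calls[y]) <= threshold:
            parent[find(x)] = find(y)

    clusters = {}
    for name in calls:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


def make_isolate(n_sites, changed):
    """Hypothetical helper: allele 'A' everywhere, 'G' at changed positions."""
    changed = set(changed)
    return {pos: ("G" if pos in changed else "A") for pos in range(n_sites)}


# Toy data: iso_B differs from iso_A at 5 sites, iso_C at 20 sites.
calls = {
    "iso_A": make_isolate(40, []),
    "iso_B": make_isolate(40, range(5)),
    "iso_C": make_isolate(40, range(20, 40)),
}

print(cluster_isolates(calls))
```

With this toy input, iso_A and iso_B fall within 12 SNPs of each other and are grouped, while iso_C stays on its own. In practice the genotype dictionaries would be built from the variant calls of each sample, and a call that differs between platforms for the same isolate would inflate the distance directly.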

Bibliographic Details
Main Authors: Areeya Disratthakit, Licht Toyo-oka, Penpitcha Thawong, Pundharika Paiboonsiri, Nuanjun Wichukjinda, Pravech Ajawatanawong, Natthakan Thipkrua, Krairerk Suthum, Prasit Palittapongarnpim, Katsushi Tokunaga, Surakameth Mahasirimongkol
Other Authors: University of Tokyo
Format: Article
Published: 2020
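Subjects: Agricultural and Biological Sciences; Biochemistry, Genetics and Molecular Biology; Immunology and Microbiology; Medicine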
Online Access: https://repository.li.mahidol.ac.th/handle/123456789/49503
Institution: Mahidol University
id th-mahidol.49503
record_format dspace
spelling th-mahidol.49503 2020-01-27T10:34:45Z
An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data
Areeya Disratthakit
Licht Toyo-oka
Penpitcha Thawong
Pundharika Paiboonsiri
Nuanjun Wichukjinda
Pravech Ajawatanawong
Natthakan Thipkrua
Krairerk Suthum
Prasit Palittapongarnpim
Katsushi Tokunaga
Surakameth Mahasirimongkol
University of Tokyo
National Center for Global Health and Medicine
Thailand Ministry of Public Health
Mahidol University
Thailand National Center for Genetic Engineering and Biotechnology
Agricultural and Biological Sciences
Biochemistry, Genetics and Molecular Biology
Immunology and Microbiology
Medicine
© 2019 Elsevier B.V. Whole-genome sequencing (WGS) data allow Mycobacterium tuberculosis (Mtb) clusters to be inferred by using a pairwise genetic distance of ≤12 single nucleotide polymorphisms (SNPs) as a threshold. However, discrepancies in SNP counts and genetic distance measurements are a major concern when combining WGS data from different next-generation sequencing (NGS) platforms. We performed SNP variant calling on WGS data from 9 multidrug-resistant tuberculosis (MDR-TB) isolates, 3 extensively drug-resistant tuberculosis (XDR-TB) isolates, and the standard M. tuberculosis strain H37Rv, sequenced on an Illumina NextSeq500 and an Ion Torrent PGM. Variant calls were obtained with four common variant calling approaches: Genome Analysis Toolkit (GATK) HaplotypeCaller (GATK-VCF workflow), GATK HaplotypeCaller with GenotypeGVCFs (GATK-GVCF workflow), SAMtools, and VarScan 2. Cross-platform pairwise SNP differences, minimum spanning networks, and average nucleotide identity (ANI) were analysed to measure the performance of the variant calling tools. Minimum pairwise SNP differences, ranging from 2 to 14 SNPs, were observed with the GVCF workflow, whereas maximum pairwise SNP differences, ranging from 7 to 158 SNPs, were observed with VarScan 2. In the ANI comparison between SNP data from NextSeq500 and PGM, the GVCF workflow gave the highest ANI, 99.7% for MDR-TB and 99.0% for XDR-TB, whereas the other SNP calling results showed lower ANI, ranging from 95.1% to 98.6%. We suggest that the GVCF workflow is the best-performing variant calling approach for avoiding cross-platform pairwise SNP differences.
2020-01-27T03:28:11Z
2020-01-27T03:28:11Z
2020-04-01
Article
Infection, Genetics and Evolution. Vol.79, (2020)
10.1016/j.meegid.2019.104152
15677257
15671348
2-s2.0-85077319732
https://repository.li.mahidol.ac.th/handle/123456789/49503
Mahidol University
SCOPUS
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85077319732&origin=inward
institution Mahidol University
building Mahidol University Library
continent Asia
country Thailand
content_provider Mahidol University Library
collection Mahidol University Institutional Repository
topic Agricultural and Biological Sciences
Biochemistry, Genetics and Molecular Biology
Immunology and Microbiology
Medicine
description © 2019 Elsevier B.V. Whole-genome sequencing (WGS) data allow Mycobacterium tuberculosis (Mtb) clusters to be inferred by using a pairwise genetic distance of ≤12 single nucleotide polymorphisms (SNPs) as a threshold. However, discrepancies in SNP counts and genetic distance measurements are a major concern when combining WGS data from different next-generation sequencing (NGS) platforms. We performed SNP variant calling on WGS data from 9 multidrug-resistant tuberculosis (MDR-TB) isolates, 3 extensively drug-resistant tuberculosis (XDR-TB) isolates, and the standard M. tuberculosis strain H37Rv, sequenced on an Illumina NextSeq500 and an Ion Torrent PGM. Variant calls were obtained with four common variant calling approaches: Genome Analysis Toolkit (GATK) HaplotypeCaller (GATK-VCF workflow), GATK HaplotypeCaller with GenotypeGVCFs (GATK-GVCF workflow), SAMtools, and VarScan 2. Cross-platform pairwise SNP differences, minimum spanning networks, and average nucleotide identity (ANI) were analysed to measure the performance of the variant calling tools. Minimum pairwise SNP differences, ranging from 2 to 14 SNPs, were observed with the GVCF workflow, whereas maximum pairwise SNP differences, ranging from 7 to 158 SNPs, were observed with VarScan 2. In the ANI comparison between SNP data from NextSeq500 and PGM, the GVCF workflow gave the highest ANI, 99.7% for MDR-TB and 99.0% for XDR-TB, whereas the other SNP calling results showed lower ANI, ranging from 95.1% to 98.6%. We suggest that the GVCF workflow is the best-performing variant calling approach for avoiding cross-platform pairwise SNP differences.
author2 University of Tokyo
author_facet University of Tokyo
Areeya Disratthakit
Licht Toyo-oka
Penpitcha Thawong
Pundharika Paiboonsiri
Nuanjun Wichukjinda
Pravech Ajawatanawong
Natthakan Thipkrua
Krairerk Suthum
Prasit Palittapongarnpim
Katsushi Tokunaga
Surakameth Mahasirimongkol
format Article
author Areeya Disratthakit
Licht Toyo-oka
Penpitcha Thawong
Pundharika Paiboonsiri
Nuanjun Wichukjinda
Pravech Ajawatanawong
Natthakan Thipkrua
Krairerk Suthum
Prasit Palittapongarnpim
Katsushi Tokunaga
Surakameth Mahasirimongkol
author_sort Areeya Disratthakit
title An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data
title_sort optimized genomic vcf workflow for precise identification of mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data
publishDate 2020
url https://repository.li.mahidol.ac.th/handle/123456789/49503
_version_ 1763490699236343808