An optimized genomic VCF workflow for precise identification of Mycobacterium tuberculosis cluster from cross-platform whole genome sequencing data
Main Authors: | , , , , , , , , , , |
---|---|
Other Authors: | |
Format: | Article |
Published: | 2020 |
Subjects: | |
Online Access: | https://repository.li.mahidol.ac.th/handle/123456789/49503 |
Institution: | Mahidol University |
Summary: | © 2019 Elsevier B.V. Whole-genome sequencing (WGS) data allow Mycobacterium tuberculosis (Mtb) clusters to be inferred by using a pairwise genetic distance of ≤12 single nucleotide polymorphisms (SNPs) as a threshold. However, discrepancies in SNP counts and genetic distance measurements are a major concern when combining WGS data from different next-generation sequencing (NGS) platforms. We performed SNP variant calling on WGS data of 9 multidrug-resistant tuberculosis (MDR-TB) strains, 3 extensively drug-resistant tuberculosis (XDR-TB) strains and the standard M. tuberculosis strain H37Rv, sequenced on an Illumina NextSeq500 and an Ion Torrent PGM. Variant calls were obtained using four common variant calling approaches: Genome Analysis Toolkit (GATK) HaplotypeCaller (GATK-VCF workflow), GATK HaplotypeCaller with GenotypeGVCFs (GATK-GVCF workflow), SAMtools, and VarScan 2. Cross-platform pairwise SNP differences, minimum spanning networks and average nucleotide identity (ANI) were analysed to measure the performance of the variant calling tools. The smallest pairwise SNP differences, ranging from 2 to 14 SNPs, were obtained with the GATK-GVCF workflow, whereas the largest, ranging from 7 to 158 SNPs, were obtained with VarScan 2. Comparing SNP data from the NextSeq500 and the PGM, the GATK-GVCF workflow gave the highest ANI for MDR-TB and XDR-TB (99.7% and 99.0%, respectively), whereas the other variant callers gave lower ANI values, ranging from 95.1% to 98.6%. These results suggest that the GATK-GVCF workflow was the best-performing variant calling approach for minimizing cross-platform pairwise SNP differences. |
---|
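
The ≤12 SNP clustering threshold mentioned in the summary can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that each sample's variant calls have already been reduced to a set of (position, alternate allele) pairs, that the pairwise distance is simply the number of SNPs found in only one of the two samples, and that clusters are formed by single linkage. The function names and the toy data are hypothetical.

```python
from itertools import combinations

SNP_THRESHOLD = 12  # pairwise distance cutoff quoted in the summary


def pairwise_snp_distance(calls_a, calls_b):
    """Count SNPs present in exactly one of the two call sets.

    Assumption: sites that are uncalled or filtered in one sample are
    not handled specially; a real workflow would need to mask them.
    """
    return len(calls_a ^ calls_b)


def cluster_by_threshold(calls, threshold=SNP_THRESHOLD):
    """Single-linkage clustering: link samples whose distance <= threshold."""
    parent = {sample: sample for sample in calls}

    def find(sample):
        # Follow parent links to the cluster root, compressing the path.
        while parent[sample] != sample:
            parent[sample] = parent[parent[sample]]
            sample = parent[sample]
        return sample

    for s1, s2 in combinations(calls, 2):
        if pairwise_snp_distance(calls[s1], calls[s2]) <= threshold:
            parent[find(s1)] = find(s2)

    clusters = {}
    for sample in calls:
        clusters.setdefault(find(sample), set()).add(sample)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: one strain sequenced on two platforms, plus an
# unrelated strain. Positions and alleles are invented for illustration.
toy_calls = {
    "A_nextseq": {(100 + i, "T") for i in range(5)},
    "A_pgm": {(100 + i, "T") for i in range(5)} | {(900, "G"), (901, "C")},
    "B_nextseq": {(5000 + i, "A") for i in range(20)},
}

if __name__ == "__main__":
    for s1, s2 in combinations(toy_calls, 2):
        print(s1, s2, pairwise_snp_distance(toy_calls[s1], toy_calls[s2]))
    print(cluster_by_threshold(toy_calls))
```

In this toy input the two runs of strain A differ by 2 SNPs and fall into one cluster, while strain B is more than 12 SNPs from both. Platform-specific calling errors inflate exactly this kind of cross-platform distance, which is why the choice of variant caller can change cluster membership.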