Imputation performance in Latin American populations: improving rare variants representation with the inclusion of native American genomes

Bibliographic Details
Main Authors: Jiménez-Kaufmann, Andrés, Chong, Amanda Y., Cortés, Adrián, Quinto-Cortés, Consuelo D., Fernandez-Valverde, Selene L., Ferreyra-Reyes, Leticia, Cruz-Hervert, Luis Pablo, Medina-Muñoz, Santiago G., Sohail, Mashaal, Palma-Martinez, María J., Delgado-Sánchez, Gudalupe, Mongua-Rodríguez, Norma, Mentzer, Alexander J., Hill, Adrian V. S., Moreno-Macías, Hortensia, Huerta-Chagoya, Alicia, Aguilar-Salinas, Carlos A., Torres, Michael, Kim, Hie Lim, Kalsi, Namrata, Schuster, Stephan Christoph, Tusié-Luna, Teresa, Del-Vecchyo, Diego Ortega, García-García, Lourdes, Moreno-Estrada, Andrés
Other Authors: School of Biological Sciences
Format: Article
Language: English
Published: 2022
Subjects: Science::Biological sciences; Imputation; Reference Panels
Online Access:https://hdl.handle.net/10356/161478
Institution: Nanyang Technological University
id sg-ntu-dr.10356-161478
record_format dspace
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic Science::Biological sciences
Imputation
Reference Panels
description Current Genome-Wide Association Studies (GWAS) rely on genotype imputation to increase statistical power, improve fine-mapping of association signals, and facilitate meta-analyses. Because of the complex demographic history of Latin America and the lack of balanced representation of Native American genomes in current imputation panels, locally relevant disease variants are likely to be missed, limiting the scope and impact of biomedical research in these populations. Better representation of diversity in genomic databases is therefore a scientific imperative. Here, we expand the 1000 Genomes Project (1KGP) reference panel with 134 Native American genomes (1KGP + NAT) to assess imputation performance in Latin American individuals of mixed ancestry. Our panel increased the number of SNPs above the GWAS quality threshold, thus improving statistical power for association studies in the region. It also increased imputation accuracy, particularly for low-frequency variants segregating in Native American ancestry tracts. The improvement is subtle but consistent across countries and proportional to the number of genomes added from local source populations. To project the potential improvement with a larger number of reference genomes, we performed simulations and found that at least 3,000 Native American genomes are needed to match the imputation performance of variants in European ancestry tracts. This reflects the concerning imbalance of diversity in current reference panels and highlights the contribution of our work to reducing it, complementing efforts to improve global equity in genomic research.
Record updated: 2022-09-10T23:31:40Z
Affiliations: School of Biological Sciences; GenomeAsia 100K (GA100K) Consortium, Singapore; Singapore Centre for Environmental Life Sciences and Engineering (SCELSE)
Version: Published version
Funding: This work was supported by “The Mexican Biobank Project: Building Capacity for Big Data Science in Medical Genomics in Admixed Populations”, a binational initiative between Mexico and the UK co-funded by CONACYT (Grant number FONCICYT/50/2016) and The Newton Fund through The Medical Research Council (Grant number MR/N028937/1), awarded to AME and AVSH. It was also supported by the International Center for Genetic Engineering and Biotechnology (ICGEB, Italy), grant number CRP/MEX20-01. MS was partially supported by the Chicago Fellows program of the University of Chicago. DODV is supported by the UC MEXUS CONACYT collaborative program (Grant number CN-19-29) and the UNAM PAPIIT funding program (Grant number IA200620).
Deposited: 2022-09-05T06:55:16Z
Type: Journal Article
Citation: Jiménez-Kaufmann, A., Chong, A. Y., Cortés, A., Quinto-Cortés, C. D., Fernandez-Valverde, S. L., Ferreyra-Reyes, L., Cruz-Hervert, L. P., Medina-Muñoz, S. G., Sohail, M., Palma-Martinez, M. J., Delgado-Sánchez, G., Mongua-Rodríguez, N., Mentzer, A. J., Hill, A. V. S., Moreno-Macías, H., Huerta-Chagoya, A., Aguilar-Salinas, C. A., Torres, M., Kim, H. L., ...Moreno-Estrada, A. (2022). Imputation performance in Latin American populations: improving rare variants representation with the inclusion of native American genomes. Frontiers in Genetics, 12, 719791. https://dx.doi.org/10.3389/fgene.2021.719791
ISSN: 1664-8021
DOI: 10.3389/fgene.2021.719791
PubMed ID: 35046991
Scopus EID: 2-s2.0-85123123221
Rights: © 2022 Jiménez-Kaufmann, Chong, Cortés, Quinto-Cortés, Fernandez-Valverde, Ferreyra-Reyes, Cruz-Hervert, Medina-Muñoz, Sohail, Palma-Martinez, Delgado-Sánchez, Mongua-Rodríguez, Mentzer, Hill, Moreno-Macías, Huerta-Chagoya, Aguilar-Salinas, Torres, Kim, Kalsi, Schuster, Tusié-Luna, Del-Vecchyo, García-García and Moreno-Estrada. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
File format: application/pdf