Phylogenetic analysis of allotetraploid species using polarized genomic sequences

Bibliographic Details
Main Authors: Leal, J. Luis; Milesi, Pascal; Salojärvi, Jarkko; Lascoux, Martin
Other Authors: School of Biological Sciences
Format: Article
Language: English
Published: 2024
Subjects: Medicine, Health and Life Sciences; Allopolyploidy; Genomic polarization
Online Access: https://hdl.handle.net/10356/174304
Institution: Nanyang Technological University
Collection: DR-NTU (NTU Library, Singapore)

Abstract

Phylogenetic analysis of polyploid hybrid species has long posed a formidable challenge, as it requires the ability to distinguish between alleles of different ancestral origins in order to disentangle their individual evolutionary histories. This problem has previously been addressed by conceiving phylogenies as reticulate networks, using a two-step phasing strategy that first identifies and segregates homoeologous loci and then, in a second phasing step, assigns each gene copy to one of the subgenomes of an allopolyploid species. Here, we propose an alternative approach, one that preserves the core idea behind phasing (to produce separate nucleotide sequences that capture the reticulate evolutionary history of a polyploid) while vastly simplifying its implementation by reducing a complex multistage procedure to a single phasing step. Most current methods for the phylogenetic reconstruction of polyploid species require sequencing reads to be pre-phased using experimental or computational methods, usually an expensive, complex, and/or time-consuming endeavor. With our algorithm, phasing is instead performed directly on the multiple-sequence alignment (MSA), a key change that allows gene copies to be segregated and sorted simultaneously. We introduce the concept of genomic polarization which, when applied to an allopolyploid species, produces nucleotide sequences that capture the fraction of the polyploid genome that deviates from a reference sequence, usually one of the other species present in the MSA. We show that if the reference sequence is one of the parental species, the polarized polyploid sequence closely resembles (has high pairwise sequence identity with) the second parental species.

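The abstract does not state the exact polarization rule, so the following Python sketch assumes one simple version for illustration only; it is not the authors' implementation. The allotetraploid is assumed to be stored in the MSA as a single consensus sequence in which sites where its two subgenomes differ carry a two-base IUPAC ambiguity code (for example W for A/T). At each such site the base shared with the reference is removed and the remaining base is kept; every other site is copied unchanged. The helper names `merge`, `polarize` and `identity`, and the toy sequences, are hypothetical.

```python
# Hypothetical sketch of "genomic polarization" on a toy alignment.
# ASSUMPTION (not from the source): the allotetraploid is one consensus
# sequence; sites where its two subgenomes differ carry a two-base IUPAC code.

# IUPAC nucleotide codes and the set of bases each one stands for.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}
# Reverse lookup: set of bases -> IUPAC code.
CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def merge(seq_a: str, seq_b: str) -> str:
    """Toy allotetraploid: union of two aligned, gap-free parental sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("parental sequences must have equal length")
    return "".join(CODE[frozenset(IUPAC[x] | IUPAC[y])] for x, y in zip(seq_a, seq_b))


def polarize(polyploid: str, reference: str) -> str:
    """Drop the reference base at ambiguous polyploid sites (assumed rule)."""
    if len(polyploid) != len(reference):
        raise ValueError("sequences must come from the same alignment")
    out = []
    for p, r in zip(polyploid.upper(), reference.upper()):
        bases = IUPAC.get(p)
        if bases is not None and len(bases) > 1 and r in "ACGT" and r in bases:
            # Keep only the part of the site that deviates from the reference.
            out.append(CODE[frozenset(bases - {r})])
        else:
            # Unambiguous sites, gaps, unknown symbols and ambiguous or
            # non-matching reference bases are copied through unchanged.
            out.append(p)
    return "".join(out)


def identity(seq_a: str, seq_b: str) -> float:
    """Pairwise identity: fraction of aligned columns with the same symbol."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)


parent_a = "ACGTACGTAC"
parent_b = "ACGAACCTAG"
tetraploid = merge(parent_a, parent_b)      # "ACGWACSTAS"
polarized = polarize(tetraploid, parent_a)  # reference = one parent
print(identity(tetraploid, parent_b))       # 0.7
print(identity(polarized, parent_b))        # 1.0
```

In this idealized toy case, with no mutations after hybridization and no incomplete lineage sorting, polarizing against one parent recovers the other parent exactly (identity 1.0, up from 0.7 for the unpolarized sequence); with real data one would expect high, but not perfect, identity.
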
This knowledge is harnessed to build a new heuristic algorithm in which, by replacing the allopolyploid genomic sequence in the MSA with its polarized version, the phylogenetic positions of the polyploid's ancestral parents can be identified in an iterative process. The proposed methodology can be used with long-read and short-read high-throughput sequencing data and requires only one representative individual for each species included in the phylogenetic analysis. In its current form, it can be applied to phylogenies containing tetraploid and diploid species. We test the newly developed method extensively on simulated data to evaluate its accuracy, and show empirically that the use of polarized genomic sequences allows both parental species of an allotetraploid to be correctly identified with up to 97% certainty in phylogenies with moderate levels of incomplete lineage sorting (ILS), and 87% in phylogenies with high levels of ILS. We then apply the polarization protocol to reconstruct the reticulate histories of Arabidopsis kamchatica and Arabidopsis suecica, two allopolyploids whose ancestry is well documented.
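
The iterative search can be illustrated in the same hedged spirit. The hypothetical `find_parents` function below uses pairwise identity to the polarized sequence as a crude stand-in for the phylogenetic placement the method actually relies on: it guesses a first parent, polarizes against it to find the closest remaining diploid, polarizes against that species to re-estimate the first parent, and stops when the pair no longer changes. The polarization rule (remove the reference base from two-base IUPAC sites), the helper names and the toy sequences are all assumptions made for illustration, not the authors' code.

```python
# Hypothetical sketch of an iterative parent search for an allotetraploid.
# ASSUMPTION: identity to the polarized sequence replaces tree-based placement.

IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}
CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def polarize(polyploid: str, reference: str) -> str:
    """Assumed rule: drop the reference base at ambiguous polyploid sites."""
    out = []
    for p, r in zip(polyploid.upper(), reference.upper()):
        bases = IUPAC.get(p)
        if bases is not None and len(bases) > 1 and r in "ACGT" and r in bases:
            out.append(CODE[frozenset(bases - {r})])
        else:
            out.append(p)
    return "".join(out)


def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned columns carrying the same symbol."""
    return sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def compatibility(polyploid: str, diploid: str) -> float:
    """Fraction of columns where the diploid's base(s) occur in the polyploid."""
    hits = sum(
        p in IUPAC and d in IUPAC and IUPAC[d] <= IUPAC[p]
        for p, d in zip(polyploid.upper(), diploid.upper())
    )
    return hits / len(polyploid)


def find_parents(polyploid: str, diploids: dict, max_iter: int = 10):
    """Alternate polarization against each current parent guess until stable."""
    if len(diploids) < 2:
        raise ValueError("need at least two diploid candidates")
    # Initial guess: the diploid most compatible with the raw polyploid.
    first = max(diploids, key=lambda n: compatibility(polyploid, diploids[n]))
    second = None
    for _ in range(max_iter):
        pol = polarize(polyploid, diploids[first])
        new_second = max((n for n in diploids if n != first),
                         key=lambda n: identity(pol, diploids[n]))
        pol = polarize(polyploid, diploids[new_second])
        new_first = max((n for n in diploids if n != new_second),
                        key=lambda n: identity(pol, diploids[n]))
        if (new_first, new_second) == (first, second):
            break  # the parent pair no longer changes
        first, second = new_first, new_second
    return first, second


diploids = {
    "sp1": "ACGTACGTACGT",
    "sp2": "ACGAACCTAGGT",
    "sp3": "TCGTACGAACGA",
    "sp4": "ACCAACCTTGGC",
}
tetraploid = "ACGWACSTASGT"  # toy hybrid of sp1 and sp2
print(find_parents(tetraploid, diploids))  # ('sp1', 'sp2')
```

Replacing tree inference with a nearest-by-identity rule keeps the sketch short, but it discards the topological information that phylogenetic placement provides, so it should not be expected to reproduce the accuracy figures reported in the abstract.
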

Published version. This work was supported by the Swedish Research Council for Sustainable Development (FORMAS) (2016-00780 and 2020-01456 to M.L.).

Citation: Leal, J. L., Milesi, P., Salojärvi, J. & Lascoux, M. (2023). Phylogenetic analysis of allotetraploid species using polarized genomic sequences. Systematic Biology, 72(2), 372-390. https://dx.doi.org/10.1093/sysbio/syad009
Type: Journal Article
ISSN: 1063-5157
DOI: 10.1093/sysbio/syad009
PMID: 36932679
Scopus ID: 2-s2.0-85163831019
Deposited: 2024-03-26
Record updated: 2024-04-01
File format: PDF (application/pdf)

© The Author(s) 2023. Published by Oxford University Press on behalf of the Society of Systematic Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.