Phylogenetic analysis of allotetraploid species using polarized genomic sequences
Phylogenetic analysis of polyploid hybrid species has long posed a formidable challenge as it requires the ability to distinguish between alleles of different ancestral origins in order to disentangle their individual evolutionary history. This problem has been previously addressed by conceiving phy...
Main Authors: | Leal, J. Luis; Milesi, Pascal; Salojärvi, Jarkko; Lascoux, Martin |
---|---|
Other Authors: | School of Biological Sciences |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Medicine, Health and Life Sciences; Allopolyploidy; Genomic polarization |
Online Access: | https://hdl.handle.net/10356/174304 |
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-174304 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Medicine, Health and Life Sciences; Allopolyploidy; Genomic polarization |
description |
Phylogenetic analysis of polyploid hybrid species has long posed a formidable challenge as it requires the ability to distinguish between alleles of different ancestral origins in order to disentangle their individual evolutionary history. This problem has been previously addressed by conceiving phylogenies as reticulate networks, using a two-step phasing strategy that first identifies and segregates homoeologous loci and then, during a second phasing step, assigns each gene copy to one of the subgenomes of an allopolyploid species. Here, we propose an alternative approach, one that preserves the core idea behind phasing (to produce separate nucleotide sequences that capture the reticulate evolutionary history of a polyploid) while vastly simplifying its implementation by reducing a complex multistage procedure to a single phasing step. While most current methods used for phylogenetic reconstruction of polyploid species require sequencing reads to be pre-phased using experimental or computational methods (usually an expensive, complex, and/or time-consuming endeavor), phasing executed using our algorithm is performed directly on the multiple-sequence alignment (MSA), a key change that allows for the simultaneous segregation and sorting of gene copies. We introduce the concept of genomic polarization that, when applied to an allopolyploid species, produces nucleotide sequences that capture the fraction of a polyploid genome that deviates from that of a reference sequence, usually one of the other species present in the MSA. We show that if the reference sequence is one of the parental species, the polarized polyploid sequence has a close resemblance (high pairwise sequence identity) to the second parental species. 
This knowledge is harnessed to build a new heuristic algorithm where, by replacing the allopolyploid genomic sequence in the MSA by its polarized version, it is possible to identify the phylogenetic position of the polyploid's ancestral parents in an iterative process. The proposed methodology can be used with long-read and short-read high-throughput sequencing data and requires only one representative individual for each species to be included in the phylogenetic analysis. In its current form, it can be used in the analysis of phylogenies containing tetraploid and diploid species. We test the newly developed method extensively using simulated data in order to evaluate its accuracy. We show empirically that the use of polarized genomic sequences allows for the correct identification of both parental species of an allotetraploid with up to 97% certainty in phylogenies with moderate levels of incomplete lineage sorting (ILS) and 87% in phylogenies containing high levels of ILS. We then apply the polarization protocol to reconstruct the reticulate histories of Arabidopsis kamchatica and Arabidopsis suecica, two allopolyploids whose ancestry has been well documented. |
author2 |
School of Biological Sciences |
author_facet |
School of Biological Sciences; Leal, J. Luis; Milesi, Pascal; Salojärvi, Jarkko; Lascoux, Martin |
format |
Article |
author |
Leal, J. Luis; Milesi, Pascal; Salojärvi, Jarkko; Lascoux, Martin |
author_sort |
Leal, J. Luis |
title |
Phylogenetic analysis of allotetraploid species using polarized genomic sequences |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/174304 |
_version_ |
1795375047881261056 |
spelling |
sg-ntu-dr.10356-174304; 2024-04-01T15:32:15Z; Published version; This work was supported by the Swedish Research Council for Sustainable Development (FORMAS) (2016-00780 and 2020-01456 to M.L.); 2024-03-26T01:02:00Z; 2023; Journal Article; Leal, J. L., Milesi, P., Salojärvi, J. & Lascoux, M. (2023). Phylogenetic analysis of allotetraploid species using polarized genomic sequences. Systematic Biology, 72(2), 372-390. https://dx.doi.org/10.1093/sysbio/syad009; 1063-5157; https://hdl.handle.net/10356/174304; 10.1093/sysbio/syad009; 36932679; 2-s2.0-85163831019; en; Systematic Biology; © The Author(s) 2023. Published by Oxford University Press on behalf of the Society of Systematic Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited; application/pdf |
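
The abstract describes genomic polarization only in words: a polarized sequence keeps the part of the polyploid genome that deviates from a reference, and polarizing against one parent should yield a sequence with high pairwise identity to the other parent. The Python sketch below illustrates that idea on toy data. It is not the authors' implementation. The site-wise rule, the use of IUPAC two-base ambiguity codes to encode the polyploid, and the names `polarize` and `pairwise_identity` are all assumptions made for illustration, since this record does not state them.

```python
# Illustrative sketch only; the polarization rule below is an assumption,
# not taken from the catalog record or from the authors' software.

# IUPAC two-base ambiguity codes and the pair of bases each one stands for.
IUPAC_PAIRS = {
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def polarize(polyploid: str, reference: str) -> str:
    """Return a polarized copy of `polyploid` relative to `reference`.

    Assumed rule: where the polyploid carries a two-base ambiguity code
    and the reference base is one of those two bases, keep the other base.
    Every other site is copied unchanged.
    """
    if len(polyploid) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for p, r in zip(polyploid.upper(), reference.upper()):
        bases = IUPAC_PAIRS.get(p)
        if bases is not None and r in bases:
            (other,) = bases - {r}  # the single base that differs from the reference
            out.append(other)
        else:
            out.append(p)
    return "".join(out)


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of aligned sites at which two sequences carry the same character."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)


# Toy alignment (hypothetical data): the tetraploid is encoded as the
# site-wise union of its two parents, K = G/T and R = A/G.
parent_a = "ACGTACGT"
parent_b = "ACTTACAT"
tetraploid = "ACKTACRT"

polarized = polarize(tetraploid, parent_a)
print(polarized, pairwise_identity(polarized, parent_b))
```

In this toy case, polarizing against one parent recovers the other parent exactly. Real alignments contain mutations, gaps, and incomplete lineage sorting, so identity would be high rather than perfect, and the iterative parent search mentioned in the abstract is not sketched here.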