Multiresolution persistent homology for excessively large biomolecular datasets

Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of inte...

Bibliographic Details
Main Authors: Xia, Kelin, Zhao, Zhixiong, Wei, Guo-Wei
Other Authors: School of Physical and Mathematical Sciences
Format: Article
Language: English
Published: 2016
Subjects: Proteins; Multiscale methods
Online Access: https://hdl.handle.net/10356/82117
http://hdl.handle.net/10220/41115
Institution: Nanyang Technological University
id sg-ntu-dr.10356-82117
record_format dspace
spelling sg-ntu-dr.10356-82117 2023-02-28T19:32:26Z
Multiresolution persistent homology for excessively large biomolecular datasets
Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei
School of Physical and Mathematical Sciences
Proteins; Multiscale methods
Published version
2016-08-10T05:54:20Z 2019-12-06T14:46:59Z 2016-08-10T05:54:20Z 2019-12-06T14:46:59Z
2015
Journal Article
Xia, K., Zhao, Z., & Wei, G.-W. (2015). Multiresolution persistent homology for excessively large biomolecular datasets. The Journal of Chemical Physics, 143(13), 134103.
0021-9606
https://hdl.handle.net/10356/82117
http://hdl.handle.net/10220/41115
10.1063/1.4931733
26450288
en
The Journal of Chemical Physics
© 2015 American Institute of Physics. This paper was published in The Journal of Chemical Physics and is made available as an electronic reprint (preprint) with permission of American Institute of Physics. The published version is available at: [http://dx.doi.org/10.1063/1.4931733]. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law.
12 p.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Proteins
Multiscale methods
description Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large-scale datasets at an appropriate resolution. We use the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated on a hexagonal fractal image with three distinct scales. We further demonstrate the proposed method by extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed; this system would otherwise be inaccessible to the normal point cloud method and unreliable under coarse-grained multiscale persistent homology. The proposed method has also been applied successfully to protein domain classification which, to our knowledge, is the first use of persistent homology for practical protein domain analysis. The proposed multiresolution topological method has potential applications to arbitrary data sets, such as social networks, biological networks, and graphs.
author2 School of Physical and Mathematical Sciences
format Article
author Xia, Kelin
Zhao, Zhixiong
Wei, Guo-Wei
author_sort Xia, Kelin
title Multiresolution persistent homology for excessively large biomolecular datasets
publishDate 2016
url https://hdl.handle.net/10356/82117
http://hdl.handle.net/10220/41115
_version_ 1759854180322770944