Hodge theory and its applications to molecular data analysis

The complete DNA sequencing of the entire human genome, better known as the Human Genome Project, concluded in 2003. Since then, much headway has been made in understanding the organization of the human genome. There are many levels of organization of the genome, its most basic unit,...

Bibliographic Details
Main Author: Koh, Ronald Joon Wei
Other Authors: Xia Kelin
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Subjects: Science::Mathematics
Online Access: https://hdl.handle.net/10356/159024
Institution: Nanyang Technological University
id sg-ntu-dr.10356-159024
record_format dspace
spelling sg-ntu-dr.10356-159024 2023-03-01T00:01:34Z Hodge theory and its applications to molecular data analysis Koh, Ronald Joon Wei Xia Kelin School of Physical and Mathematical Sciences xiakelin@ntu.edu.sg Science::Mathematics The complete DNA sequencing of the entire human genome, better known as the Human Genome Project, concluded in 2003. Since then, much headway has been made in understanding the organization of the human genome. The genome has many levels of organization; its most basic unit is the nucleosome, which consists of DNA wrapped around histone proteins. Higher levels of organization include tetranucleosomes, which consist of several nucleosomes, and topologically associating domains (TADs), which are regions of the genome that interact more frequently with themselves than with regions outside the TAD. TADs are relatively new to the scene, and there are currently no experimentally validated TADs. Furthermore, disruption of TAD boundaries is associated with several diseases, such as cancer. There are currently many methods to call TADs, but none of them is based on a rigorous topological mathematical model. The HodgeRank algorithm, based on the Hodge decomposition theorem, gives us an avenue to quantify these self-interactions. The HodgeRank algorithm was previously used to rank incomplete or imbalanced data from several e-commerce sites and movies from Netflix. We first show that HodgeRank can also be used to quantify the "curvedness" of different biomolecules, for example in modelling the protein folding process and in comparing biomolecules of different scales and complexities. We then turn our attention back to Hi-C data, which encompasses TADs/TAD regions. We show that, under a suitable metric, HodgeRank can be used to quantify the self-interactions within each TAD region of Chromosome 10, where each region is generated by an existing TAD-calling method.
Solar power, a renewable source of energy, plays a significant role in reducing our dependence on fossil fuels. Solar cells harness solar power by converting light energy from the Sun into electrical power through the photovoltaic effect, in which light energy excites an electron, allowing it to reach a higher energy state. One type of material used to make these solar cells is hybrid organic-inorganic perovskites (HOIPs). Not only have HOIPs been projected to be one of the most cost-effective options for future solar cells, their efficiency levels have also risen from 5% to 25% in the last decade. Current machine learning-based perovskite designs rely heavily on the prediction of the bandgap of HOIPs. We show that the PerSpect-ML model, which generates machine learning features from the eigenvalues of the Hodge Laplacian matrices and was previously applied with great success to protein-ligand binding affinity prediction, can be applied to the prediction of the bandgap of a comprehensive data set of hybrid organic-inorganic perovskites. We show that the resulting machine learning model not only significantly reduces the computational costs of current models, but is also superior in terms of overall predictive ability. Doctor of Philosophy 2022-06-05T11:51:51Z 2022-06-05T11:51:51Z 2021 Thesis-Doctor of Philosophy Koh, R. J. W. (2021). Hodge theory and its applications to molecular data analysis. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/159024 https://hdl.handle.net/10356/159024 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Science::Mathematics
spellingShingle Science::Mathematics
Koh, Ronald Joon Wei
Hodge theory and its applications to molecular data analysis
author2 Xia Kelin
author_facet Xia Kelin
Koh, Ronald Joon Wei
format Thesis-Doctor of Philosophy
author Koh, Ronald Joon Wei
author_sort Koh, Ronald Joon Wei
title Hodge theory and its applications to molecular data analysis
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/159024
_version_ 1759858245632000000