Geometric and topological AI for molecular sciences

Bibliographic Details
Main Author: Wee, Junjie
Other Authors: Xia Kelin
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects: Science::Mathematics::Geometry; Science::Mathematics::Topology
Online Access: https://hdl.handle.net/10356/165903
Institution: Nanyang Technological University
id sg-ntu-dr.10356-165903
record_format dspace
spelling sg-ntu-dr.10356-165903
2023-05-02T06:33:01Z
Geometric and topological AI for molecular sciences
Wee, Junjie
Xia Kelin
School of Physical and Mathematical Sciences
xiakelin@ntu.edu.sg
Science::Mathematics::Geometry
Science::Mathematics::Topology
Doctor of Philosophy
2023-04-13T05:27:55Z
2023
Thesis-Doctor of Philosophy
Wee, J. (2023). Geometric and topological AI for molecular sciences. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165903
https://hdl.handle.net/10356/165903
10.32657/10356/165903
en
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Science::Mathematics::Geometry
Science::Mathematics::Topology
spellingShingle Science::Mathematics::Geometry
Science::Mathematics::Topology
Wee, Junjie
Geometric and topological AI for molecular sciences
description Data-driven science is widely regarded as the fourth paradigm of science, one that will fundamentally change society and our daily lives. Indeed, artificial intelligence (AI) models have already transformed various data-intensive industries. Machine learning (ML) and deep learning models have achieved unprecedented performance in image, text, audio, video, and network data analysis. This is largely due to three major advances: the accumulation of big data, the rise in computational power, and the design of highly efficient algorithms. In particular, AlphaFold2 made a remarkable achievement on the protein-folding problem, which heralds a new era of AI-based molecular data analysis for materials, chemistry, and biology. Along with excitement and opportunities, AI for molecular sciences also comes with challenges. In this dissertation, we tackle one of the main challenges in AI for molecular sciences: constructing or designing effective molecular descriptors and fingerprints. Ideally, effective molecular descriptors should preserve the most important features while still capturing the intrinsic molecular properties and information that directly dictate molecular functions. In this way, they can be better “understood” by ML models. This has inspired various researchers to apply topological data analysis (TDA), in which persistent homology (PH) and its intrinsic topological invariants act as an excellent and robust molecular featurization method to capture and characterize the underlying topological information in biomolecular systems. By extending beyond the capabilities of TDA, we propose geometric and topological AI for molecular sciences.
In this dissertation, two novel persistent functions, persistent Ricci curvature (PRC) and persistent Dirac operators, are developed as new mathematics-based molecular featurization methods, which can be used to build advanced mathematics-based ML models for unsupervised and supervised learning in molecular sciences. For biological data, we built ML models based on Ollivier persistent Ricci curvature and Forman persistent Ricci curvature to predict protein-ligand binding affinity values. We also constructed a persistent spectral-based ensemble learning model (PerSpect-EL) to capture and characterize protein-protein interactions upon mutational change. Our PerSpect-EL model has outperformed several existing traditional molecular descriptor-based models in predicting protein-protein binding affinity changes. For materials data, we designed both PH-based and persistent Forman curvature (PFC)-based ML models to characterize organic-inorganic halide perovskites (OIHPs). Our PH-based and PFC-based molecular features show strong discriminating power in classifying 9 types of OIHPs. Both models have also outperformed traditional perovskite descriptors in predicting materials properties such as bandgap, dielectric constant, and refractive index.
author2 Xia Kelin
author_facet Xia Kelin
Wee, Junjie
format Thesis-Doctor of Philosophy
author Wee, Junjie
author_sort Wee, Junjie
title Geometric and topological AI for molecular sciences
title_short Geometric and topological AI for molecular sciences
title_full Geometric and topological AI for molecular sciences
title_fullStr Geometric and topological AI for molecular sciences
title_full_unstemmed Geometric and topological AI for molecular sciences
title_sort geometric and topological ai for molecular sciences
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/165903
_version_ 1765213832184070144