Exploring machine learning methods on molecular data

Bibliographic Details
Main Author: Tan, Chen Hui
Other Authors: Xia Kelin
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Online Access: https://hdl.handle.net/10356/148528
Institution: Nanyang Technological University
Description
Summary: Understanding molecular data is useful in many fields of science, including biology, chemistry, and materials science. This report examines XRD (X-ray diffraction) data and protein-ligand binding affinity data, and applies machine learning techniques to classification and regression problems. Working with molecular data can be tricky: machine learning models generally require all inputs to have the same shape. However, each protein-ligand complex contains a different number and mix of elements, which makes it challenging to design models that capture the information encoded in these atoms, their coordinates, and their physical properties. Similarly, XRD data is difficult to fit into a machine learning model because different datasets can have different array sizes, which makes it hard to classify such inconsistent data in a consistent way.
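
The shape problem described in the summary can be illustrated with a minimal sketch. It is not taken from the report and is not necessarily the method the author used. It assumes each XRD pattern is a list of strictly increasing 2-theta angles with matching intensities, and it resamples patterns of different lengths onto one shared grid by linear interpolation so that every pattern becomes a vector of the same length. The function names, the zero fill outside the measured range, and the sample values are all hypothetical.

```python
from bisect import bisect_right


def make_grid(start, stop, n_points):
    """Return n_points evenly spaced values from start to stop inclusive."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = (stop - start) / (n_points - 1)
    return [start + k * step for k in range(n_points)]


def resample_pattern(angles, intensities, grid):
    """Linearly interpolate one pattern onto a shared 2-theta grid.

    Assumes angles are strictly increasing. Grid points outside the
    measured range are filled with 0.0 (an assumption of this sketch).
    """
    if len(angles) != len(intensities) or len(angles) < 2:
        raise ValueError("need at least two (angle, intensity) pairs")
    out = []
    for g in grid:
        if g < angles[0] or g > angles[-1]:
            out.append(0.0)
            continue
        # Index of the right-hand neighbour, clamped so i - 1 and i are valid.
        i = min(max(bisect_right(angles, g), 1), len(angles) - 1)
        x0, x1 = angles[i - 1], angles[i]
        y0, y1 = intensities[i - 1], intensities[i]
        out.append(y0 + (y1 - y0) * (g - x0) / (x1 - x0))
    return out


# Two made-up patterns with different numbers of samples.
pattern_a = ([10.0, 20.0, 30.0], [0.0, 10.0, 0.0])
pattern_b = ([10.0, 15.0, 20.0, 25.0, 30.0], [1.0, 2.0, 3.0, 2.0, 1.0])

grid = make_grid(10.0, 30.0, 5)
features = [resample_pattern(a, y, grid) for a, y in (pattern_a, pattern_b)]
```

After resampling, both entries of `features` have five values, so they can be stacked into a single fixed-shape input for a classifier. Protein-ligand complexes need a different kind of fixed-length representation, which this sketch does not attempt.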