Comparing the performances of glass transition temperatures prediction : SMILES vs. Molfile

Glass transition temperature (Tg) is the temperature at which a polymer changes from rigid to flexible. Tg is an important tool for modifying physical properties of polymers, with a wide variety of industrial applications. The field of machine learning (ML) has significantly grown over the recent...

Bibliographic Details
Main Author: Goh, Kai Leong
Other Authors: Lu Yunpeng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Subjects: Science::Chemistry
Online Access: https://hdl.handle.net/10356/155296
Institution: Nanyang Technological University
id sg-ntu-dr.10356-155296
record_format dspace
spelling sg-ntu-dr.10356-155296 2023-02-28T23:17:59Z
School: School of Physical and Mathematical Sciences
Contact: YPLu@ntu.edu.sg
Degree: Bachelor of Science in Chemistry and Biological Chemistry
Dates: 2022-02-16T04:48:36Z; 2021
Type: Final Year Project (FYP)
Citation: Goh, K. L. (2021). Comparing the performances of glass transition temperatures prediction : SMILES vs. Molfile. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155296
Identifier: CHEM/21/039
File format: application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Science::Chemistry
description The glass transition temperature (Tg) is the temperature at which a polymer changes from a rigid, glassy state to a flexible, rubbery one. Tg is an important parameter for tailoring the physical properties of polymers, with a wide variety of industrial applications. The field of machine learning (ML) has grown significantly in recent years owing to advances in technology. In computational chemistry, ML is commonly applied through quantitative structure–property relationship (QSPR) modelling. The main objective of this project was to compare two digital representations of molecular structure in terms of QSPR model performance for the prediction of Tg. A dataset of 1200 polymer data points was collected from the PolyInfo polymer database. The two representations were the Simplified Molecular-Input Line-Entry System (SMILES) and MDL Molfiles (.mol files). Two feature sets were used: Mordred-2D and ECFP4. XGBoost (Extreme Gradient Boosting) was selected as the regression algorithm, with R² and RMSE as the scoring metrics for model performance. With Mordred-2D, SMILES generally performed better than .mol files. With ECFP4, SMILES and .mol files yielded very similar results. The .mol file optimization process was more time-consuming than the SMILES string generation process. Based on these results, it was concluded that SMILES is the better choice for future studies in terms of efficiency. Future work will focus on collecting more data from the PolyInfo database and on trying other machine learning algorithms.
author2 Lu Yunpeng
author_facet Lu Yunpeng
Goh, Kai Leong
format Final Year Project
author Goh, Kai Leong
author_sort Goh, Kai Leong
title Comparing the performances of glass transition temperatures prediction : SMILES vs. Molfile
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/155296
_version_ 1759857465977995264
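
The abstract names R² and RMSE as the scoring metrics but the record does not show how they were computed. The following is a minimal sketch of the standard definitions in plain Python, not the project's own code. The function names and the Tg values are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target (e.g. K for Tg)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Hypothetical Tg values in kelvin, invented for illustration only;
# they are not taken from the PolyInfo dataset or from the project.
tg_measured = [350.0, 400.0, 450.0, 500.0]
tg_predicted = [360.0, 390.0, 460.0, 490.0]

print(f"RMSE = {rmse(tg_measured, tg_predicted):.1f} K")
print(f"R2   = {r2_score(tg_measured, tg_predicted):.3f}")
```

A lower RMSE and an R² closer to 1 indicate a better fit, so comparing the two representations amounts to comparing these two numbers for models trained on features derived from SMILES and from .mol files.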