Comparing the performances of glass transition temperatures prediction : SMILES vs. Molfile
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/155296 |
Institution: | Nanyang Technological University |
Summary: | The glass transition temperature (Tg) is the temperature at which a polymer changes from a rigid
to a flexible state. Tg is an important parameter for tailoring the physical properties of polymers
across a wide variety of industrial applications. The field of machine learning (ML) has grown
significantly in recent years owing to advances in technology. In computational chemistry, ML is
commonly applied as quantitative structure–property relationship (QSPR) modelling. The main
objective of this project was to compare two digital representations of molecular structure in
terms of QSPR model performance for Tg prediction. A dataset of 1200 polymer data points was
collected from the PolyInfo polymer database. The two representations compared were the
Simplified Molecular-Input Line-Entry System (SMILES) and MDL Molfiles (.mol files). Two feature
sets were used: Mordred-2D descriptors and ECFP4 fingerprints. XGBoost (Extreme Gradient
Boosting) was selected as the regression algorithm, with R² and RMSE as the scoring metrics used
to evaluate model performance. For Mordred-2D, SMILES generally performed better than .mol files.
For ECFP4, SMILES and .mol files yielded very similar results. The .mol file optimization process
was noted to be more time-consuming than SMILES string generation. Based on these results, SMILES
was concluded to be the better choice for future studies in terms of efficiency. Future work will
focus on collecting more data from the PolyInfo database and testing other machine learning
algorithms. |
---|
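The summary names R² and RMSE as the scoring metrics but does not define them. As a minimal sketch, assuming the standard definitions (RMSE as the square root of the mean squared error, R² as one minus the ratio of residual to total sum of squares), they could be computed as below. The function names and the Tg values are hypothetical and made up purely for illustration; they are not taken from the project or from PolyInfo.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target (e.g. kelvin for Tg)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Hypothetical measured and predicted Tg values in kelvin (illustration only).
tg_measured = [350.0, 400.0, 450.0, 500.0]
tg_predicted = [360.0, 390.0, 460.0, 490.0]

print(f"RMSE = {rmse(tg_measured, tg_predicted):.3f} K")
print(f"R2   = {r2(tg_measured, tg_predicted):.3f}")
```

A lower RMSE and an R² closer to 1 both indicate a better fit, so two representations such as SMILES and .mol files can be compared by evaluating both metrics on the same held-out polymers.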