Accurate machine learning models based on small dataset of energetic materials through spatial matrix featurization methods

A large database is desirable for machine learning (ML) to make accurate predictions of materials' physicochemical properties from their molecular structure. When a large database is not available, developing a suitable featurization method based on the physicochemical nature of the target p...

Bibliographic Details
Main Authors: Chen, Chao, Liu, Danyang, Deng, Siyan, Zhong, Lixiang, Chan, Serene Hay Yee, Li, Shuzhou, Hng, Huey Hoon
Other Authors: School of Materials Science and Engineering
Format: Article
Language: English
Published: 2022
Subjects: Engineering::Materials; Small Database Machine Learning; Energetic Materials Screening
Online Access: https://hdl.handle.net/10356/159986
Institution: Nanyang Technological University
id sg-ntu-dr.10356-159986
record_format dspace
Record timestamp: 2022-07-07T03:14:53Z
Abstract: A large database is desirable for machine learning (ML) to make accurate predictions of materials' physicochemical properties from their molecular structure. When a large database is not available, developing a featurization method suited to the physicochemical nature of the target properties can improve the predictive power of ML models trained on a smaller database. In this work, we show that two new featurization methods, the volume occupation spatial matrix and the heat contribution spatial matrix, improve the accuracy of predicting energetic materials' crystal density (ρcrystal) and solid-phase enthalpy of formation (Hf,solid) using a database of 451 energetic molecules. The mean absolute errors are reduced from 0.048 g/cm³ and 24.67 kcal/mol to 0.035 g/cm³ and 9.66 kcal/mol, respectively. Leave-one-out cross-validation indicates that the newly developed ML models can be used to determine the performance of most kinds of energetic materials except cubanes. The ML models were applied to predict ρcrystal and Hf,solid for CHON-based molecules in the 150-million-entry PubChem database, screening out 56 candidates with competitive detonation performance and reasonable chemical structures. With further improvement, spatial matrices have the potential to become multifunctional ML simulation tools that provide even better predictions in wider fields of materials science.
Funding: Ministry of Education (MOE). S.L. acknowledges support from the Ministry of Education (MOE) Singapore Tier 1 (RG8/20).
Type: Journal Article
Citation: Chen, C., Liu, D., Deng, S., Zhong, L., Chan, S. H. Y., Li, S. & Hng, H. H. (2021). Accurate machine learning models based on small dataset of energetic materials through spatial matrix featurization methods. Journal of Energy Chemistry, 63, 364-375. https://dx.doi.org/10.1016/j.jechem.2021.08.031
ISSN: 2095-4956
DOI: 10.1016/j.jechem.2021.08.031
Scopus EID: 2-s2.0-85114690425
Handle: https://hdl.handle.net/10356/159986
Rights: © 2021 Science Press and Dalian Institute of Chemical Physics, Chinese Academy of Sciences. Published by ELSEVIER B.V. and Science Press. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Materials
Small Database Machine Learning
Energetic Materials Screening