Using small database and energy descriptors to predict molecular thermodynamic energies through mediated learning models

Bibliographic Details
Main Authors: Chen, Chao, Deng, Siyan, Li, Shuzhou
Other Authors: School of Materials Science and Engineering
Format: Article
Language: English
Published: 2024
Online Access: https://hdl.handle.net/10356/179404
Institution: Nanyang Technological University
Description
Summary: Delta machine learning (DML) models have opened a new route to high-fidelity ab initio simulation results for materials by using quantities of lower computational cost as learning inputs. However, poor out-of-sample extrapolation and the need for large training sets have limited broader application of conventional DML models. In this work, we propose the concept of non-trivial electron energy, an intermediary energy quantity decoded from the electron total energy that exhibits a high Pearson correlation with various thermodynamic energies, and use it to build mediated machine learning (MML) models. By hybridizing the intermediary non-trivial electron energy (N) with a bond descriptor (B) and a spatial matrix (S) of organic molecules, our integrated NBS descriptor gives MML models excellent predictive power for thermodynamic energies, with errors close to 1 kcal/mol when trained on a database of 100 entries and tested on a database of 500 entries. Moreover, adding supplemental sets of 10 to 20 entries to the original training set greatly improves the out-of-sample extensibility of NBS MML models, for example to molecules of markedly larger size, with disparate bond types, or even with different elemental compositions. Mediated learning provides an alternative way to break through the limitations of traditional DML models and can be applied conveniently to study formation enthalpies, thermodynamic energy barriers, multi-dimensional Gibbs free energy surfaces, and other quantum chemical quantities related to the internal energy, enthalpy, and free energy of materials under various conditions, at tunable training cost, prediction efficiency, and accuracy.
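The general idea in the summary, learning a small correction on top of a cheap quantity that already correlates strongly with the target, can be illustrated with a minimal sketch. This is not the authors' NBS descriptor or MML model: the data below are synthetic random numbers, the names `mediator`, `descriptor`, `fit_ridge`, and `predict_ridge` are hypothetical, and plain ridge regression is swapped in for whatever learner the paper uses. Only the 100-entry training and 500-entry test split sizes are taken from the summary.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def fit_ridge(X, y, alpha=1e-6):
    """Closed-form ridge regression with an appended intercept column."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)


def predict_ridge(X, w):
    """Apply weights from fit_ridge (intercept is the last weight)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    return A @ w


rng = np.random.default_rng(0)
n_train, n_test, n_feat = 100, 500, 5  # split sizes from the summary
n = n_train + n_test

# SYNTHETIC stand-ins (assumptions, not chemistry):
#   descriptor: structural features of each "molecule"
#   mediator:   a cheap intermediary energy-like quantity
#   target:     the expensive quantity we want to predict
descriptor = rng.normal(size=(n, n_feat))
mediator = rng.normal(loc=-50.0, scale=10.0, size=n)
true_w = rng.normal(size=n_feat)
target = mediator + descriptor @ true_w + rng.normal(scale=0.1, size=n)

# Step 1: check that the intermediary quantity tracks the target.
r = pearson_r(mediator, target)

# Step 2: learn only the residual (target - mediator) from the descriptor.
train, test = slice(0, n_train), slice(n_train, n)
w = fit_ridge(descriptor[train], target[train] - mediator[train])

# Step 3: prediction = intermediary quantity + learned correction.
pred = mediator[test] + predict_ridge(descriptor[test], w)
mae = float(np.mean(np.abs(pred - target[test])))
baseline_mae = float(np.mean(np.abs(mediator[test] - target[test])))

print(f"Pearson r (mediator vs target): {r:.3f}")
print(f"MAE using mediator alone:       {baseline_mae:.3f}")
print(f"MAE with learned correction:    {mae:.3f}")
```

Because the learner only has to model the residual, a small training set can suffice when the intermediary quantity already carries most of the signal; how well this transfers to real molecular data depends on the descriptor and is not demonstrated by this toy example.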