Using small database and energy descriptors to predict molecular thermodynamic energies through mediated learning models
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/179404 |
Institution: | Nanyang Technological University |
Summary: | Delta machine learning (DML) models have opened a new route to high-fidelity ab initio simulation results for materials by using quantities of lower computational cost as learning inputs. However, poor out-of-sample extrapolation and the need for large training sets have limited broader application of conventional DML models. In this work, we propose the concept of non-trivial electron energy, an intermediary energy quantity decoded from the electron total energy that exhibits a high Pearson correlation with various thermodynamic energies, and use it to build mediated machine learning (MML) models. The integrated NBS descriptor, which combines the intermediary non-trivial electron energy (N) with a bond descriptor (B) and a spatial matrix (S) of organic molecules, predicts thermodynamic energies with errors close to 1 kcal/mol in MML models trained on a database of 100 entries and tested on a database of 500 entries. Moreover, adding supplemental sets of 10 to 20 entries to the original training set greatly improves the out-of-sample extensibility of NBS MML models, for example to molecules of markedly larger size, with disparate bond types, or even with different elemental compositions. Mediated learning offers an alternative way to break through the limitations of traditional DML models and can be applied conveniently to study formation enthalpy, thermodynamic energy barriers, multi-dimensional Gibbs free energy surfaces, and other quantum chemical quantities related to the internal energy, enthalpy, and free energy of materials under various conditions, with tunable training cost, prediction efficiency, and accuracy. |
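The summary's central idea, that an intermediary quantity with a high Pearson correlation to the target lets a model learn from very few entries, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the data are synthetic, the linear relation and noise level are invented, and a one-variable least-squares fit stands in for the machine learning model. It is not the NBS descriptor or the authors' MML model; only the 100-entry training and 500-entry test sizes are taken from the summary.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a * x + b (stand-in for the ML model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return a, my - a * mx


def make_synthetic(n, rng):
    """Hypothetical data: a mediator value and a noisy, correlated target.

    The slope 0.9, offset 5.0 and unit noise are invented for this sketch.
    """
    rows = []
    for _ in range(n):
        mediator = rng.uniform(-500.0, -100.0)  # stand-in intermediary energy
        target = 0.9 * mediator + 5.0 + rng.gauss(0.0, 1.0)  # stand-in target
        rows.append((mediator, target))
    return rows


rng = random.Random(0)
train = make_synthetic(100, rng)  # small training set, as in the summary
test = make_synthetic(500, rng)   # larger held-out test set

m_train, t_train = zip(*train)
r = pearson(m_train, t_train)
a, b = fit_line(m_train, t_train)

# Mean absolute error on the held-out set, in the toy target's units.
mae = sum(abs(a * m + b - t) for m, t in test) / len(test)

print(f"Pearson r on training set: {r:.5f}")
print(f"fitted slope = {a:.4f}, intercept = {b:.3f}")
print(f"test MAE = {mae:.3f}")
```

In this toy setting the held-out error is governed by the injected noise rather than by the training-set size, which is the intuition behind learning from a strongly correlated mediator. Real molecular data would need the richer descriptor and model described in the summary.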