LGB-stack: stacked generalization with LightGBM for highly accurate predictions of polymer bandgap
Recently, the Ramprasad group reported a quantitative structure–property relationship (QSPR) model for predicting the Egap values of 4209 polymers, which yielded a test set R2 score of 0.90 and a test set root-mean-square error (RMSE) score of 0.44 at a train/test split ratio of 80/20. In this paper, we present a new QSPR model named LGB-Stack, which performs a two-level stacked generalization using the light gradient boosting machine. At level 1, multiple weak models are trained, and at level 2, they are combined into a strong final model. Four molecular fingerprints were generated from the simplified molecular input line entry system notations of the polymers. They were trimmed using recursive feature elimination and used as the initial input features for training the weak models. The output predictions of the weak models were used as the new input features for training the final model, which completes the LGB-Stack model training process. Our results show that the best test set R2 and the RMSE scores of LGB-Stack at the train/test split ratio of 80/20 were 0.92 and 0.41, respectively. The accuracy scores further improved to 0.94 and 0.34, respectively, when the train/test split ratio of 95/5 was used.
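The abstract describes a two-level stacked generalization: level-1 weak models are each trained on one feature set, and their predictions become the input features for a level-2 final model. The sketch below illustrates only that structure in pure Python, not the authors' pipeline: the paper uses LightGBM for every learner and four molecular-fingerprint feature sets, whereas here each "weak model" is a one-feature least-squares fit and the level-2 model is a least-squares fit on the averaged weak outputs; all helper names are invented for this illustration.

```python
def fit_1d(xs, ys):
    """Least-squares slope and intercept for y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) or 1e-12  # guard against a constant feature
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def predict_1d(model, xs):
    a, b = model
    return [a * x + b for x in xs]

def stack_fit_predict(X_train, y_train, X_test):
    n_feats = len(X_train[0])
    # Level 1: one weak model per feature column (one per fingerprint set in the paper).
    weak = [fit_1d([row[j] for row in X_train], y_train) for j in range(n_feats)]

    def meta_feature(X):
        # The weak models' output predictions become the new input features.
        preds = [predict_1d(w, [row[j] for row in X]) for j, w in enumerate(weak)]
        return [sum(col) / n_feats for col in zip(*preds)]

    # Level 2: a final model trained on the weak-model outputs.
    meta = fit_1d(meta_feature(X_train), y_train)
    return predict_1d(meta, meta_feature(X_test))
```

In the actual LGB-Stack workflow, both levels would instead be LightGBM regressors, with the level-1 models trained on fingerprints trimmed by recursive feature elimination.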
Saved in:
Main Authors: Goh, Kai Leong; Goto, Atsushi; Lu, Yunpeng
Other Authors: School of Physical and Mathematical Sciences
Format: Article
Language: English
Published: 2022
Subjects: Science::Chemistry; Polymers; Algorithms
Online Access: https://hdl.handle.net/10356/161482
Institution: Nanyang Technological University
id: sg-ntu-dr.10356-161482
record_format: dspace
Last updated: 2024-01-08
Type: Journal Article (published version)
Authors: Goh, Kai Leong; Goto, Atsushi; Lu, Yunpeng
Affiliation: School of Physical and Mathematical Sciences
Subjects: Science::Chemistry; Polymers; Algorithms
Citation: Goh, K. L., Goto, A. & Lu, Y. (2022). LGB-stack: stacked generalization with LightGBM for highly accurate predictions of polymer bandgap. ACS Omega, 7(34), 29787-29793.
DOI: 10.1021/acsomega.2c02554 (https://dx.doi.org/10.1021/acsomega.2c02554)
ISSN: 2470-1343
Handle: https://hdl.handle.net/10356/161482
Funding: This research was supported by the Ministry of Education (MOE), Singapore, under its Academic Research Fund Tier 1 RG83/20.
Grants: RG83/20; CHEM/21/095
Deposited: 2022-09-07
Rights: © 2022 The Authors. Published by American Chemical Society. This is an open-access article distributed under the terms of the Creative Commons Attribution License.
File format: application/pdf
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
description: Recently, the Ramprasad group reported a quantitative structure–property relationship (QSPR) model for predicting the Egap values of 4209 polymers, which yielded a test set R2 score of 0.90 and a test set root-mean-square error (RMSE) score of 0.44 at a train/test split ratio of 80/20. In this paper, we present a new QSPR model named LGB-Stack, which performs a two-level stacked generalization using the light gradient boosting machine. At level 1, multiple weak models are trained, and at level 2, they are combined into a strong final model. Four molecular fingerprints were generated from the simplified molecular input line entry system notations of the polymers. They were trimmed using recursive feature elimination and used as the initial input features for training the weak models. The output predictions of the weak models were used as the new input features for training the final model, which completes the LGB-Stack model training process. Our results show that the best test set R2 and the RMSE scores of LGB-Stack at the train/test split ratio of 80/20 were 0.92 and 0.41, respectively. The accuracy scores further improved to 0.94 and 0.34, respectively, when the train/test split ratio of 95/5 was used.
title |
LGB-stack: stacked generalization with LightGBM for highly accurate predictions of polymer bandgap |
title_short |
LGB-stack: stacked generalization with LightGBM for highly accurate predictions of polymer bandgap |
title_full |
LGB-stack: stacked generalization with LightGBM for highly accurate predictions of polymer bandgap |
title_fullStr |
LGB-stack: stacked generalization with LightGBM for highly accurate predictions of polymer bandgap |
title_full_unstemmed |
LGB-stack: stacked generalization with LightGBM for highly accurate predictions of polymer bandgap |
title_sort |
lgb-stack: stacked generalization with lightgbm for highly accurate predictions of polymer bandgap |