Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides
Main Authors: | Thway, Maung; Low, Andre Kai Yuan; Khetan, Samyak; Dai, Haiwen; Recatala-Gomez, Jose; Chen, Andy Paul; Hippalgaonkar, Kedar |
---|---|
Other Authors: | School of Materials Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Engineering; GPT-3.5; Text parsing |
Online Access: | https://hdl.handle.net/10356/174885 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-174885 |
---|---|
record_format |
dspace |
affiliation |
School of Materials Science and Engineering; Institute of Materials Research and Engineering, A*STAR |
funding |
Agency for Science, Technology and Research (A*STAR); National Research Foundation (NRF). The authors acknowledge funding from AME Programmatic Funds by the Agency for Science, Technology and Research under Grant (No. A1898b0043). KH also acknowledges funding from the NRF Fellowship (NRF-NRFF13-2021-0011). |
type |
Journal Article (Published version) |
citation |
Thway, M., Low, A. K. Y., Khetan, S., Dai, H., Recatala-Gomez, J., Chen, A. P. & Hippalgaonkar, K. (2024). Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides. Digital Discovery, 3(2), 328-336. https://dx.doi.org/10.1039/d3dd00202k |
issn |
2635-098X |
doi |
10.1039/d3dd00202k |
scopus_id |
2-s2.0-85182443481 |
date_available |
2024-04-15T06:04:07Z |
last_modified |
2024-04-19T15:59:52Z |
rights |
© 2024 The Author(s). Published by the Royal Society of Chemistry. This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. |
file_format |
application/pdf |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
topic |
Engineering; GPT-3.5; Text parsing |
description |
Optimally doped single-phase compounds are necessary to advance state-of-the-art thermoelectric devices, which convert heat into electricity and vice versa; producing them requires solid-state synthesis of bulk materials. Data-driven approaches that learn these synthesis recipes require careful curation of data from large bodies of text, which may not be available for some materials, as well as refined language-processing algorithms that present a high barrier to entry. We propose applying Large Language Models (LLMs) to parse solid-state synthesis recipes, encapsulating all essential synthesis information intuitively in terms of primary and secondary heating peaks. Using a domain-expert curated dataset for a specific material (Gold Standard), we engineered a prompt set for GPT-3.5 to replicate the same dataset (Silver Standard), achieving 73% overall accuracy. We then extract and infer synthesis conditions for other ternary chalcogenides with the same prompt set. From a database of 168 research papers, we successfully parsed 61 papers, which we then used to develop a classifier to predict phase purity. Our methodology demonstrates the generalizability of LLMs for text parsing, specifically for materials with sparse literature and unbalanced reporting (since usually only positive results are shown). Our work provides a roadmap for future endeavors seeking to amalgamate LLMs with materials science research, heralding a potentially transformative paradigm in the synthesis and characterization of novel materials. |
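The abstract reports 73% overall accuracy for the GPT-3.5 Silver Standard against the expert-curated Gold Standard, but this record does not define the metric. A minimal sketch, assuming accuracy means per-field agreement between recipes encoded as primary and secondary heating peaks, could look like this. The names `HeatingPeak`, `SynthesisRecord`, `flatten` and `field_accuracy`, the fields `temperature_c` and `dwell_h`, and the `rel_tol` tolerance are all hypothetical; they are not the authors' schema or code.

```python
import math
from dataclasses import asdict, dataclass
from typing import Optional


# Hypothetical schema: the abstract encodes each recipe as primary and secondary
# heating peaks, but the paper's actual field names are not given in this record.
@dataclass
class HeatingPeak:
    temperature_c: Optional[float] = None  # peak temperature in degrees Celsius
    dwell_h: Optional[float] = None        # hold time at the peak, in hours


@dataclass
class SynthesisRecord:
    primary: HeatingPeak
    secondary: HeatingPeak


def flatten(record: SynthesisRecord) -> dict:
    """Map a record to {"primary.temperature_c": value, ...} for per-field comparison."""
    flat = {}
    # asdict() recursively converts the nested HeatingPeak dataclasses to dicts.
    for peak_name, peak in asdict(record).items():
        for field_name, value in peak.items():
            flat[f"{peak_name}.{field_name}"] = value
    return flat


def field_accuracy(gold: list[SynthesisRecord], silver: list[SynthesisRecord],
                   rel_tol: float = 0.0) -> float:
    """Fraction of fields on which LLM output (silver) agrees with curated data (gold).

    Assumed metric: numeric fields match within rel_tol (0.0 means exact), and a
    field missing from both records counts as agreement.
    """
    if len(gold) != len(silver):
        raise ValueError("gold and silver must describe the same papers, in order")
    matches = total = 0
    for g, s in zip(gold, silver):
        g_flat, s_flat = flatten(g), flatten(s)
        for key, g_val in g_flat.items():
            s_val = s_flat[key]
            total += 1
            if g_val is None or s_val is None:
                # Only count agreement when both sources leave the field empty.
                matches += int(g_val is None and s_val is None)
            elif math.isclose(g_val, s_val, rel_tol=rel_tol):
                matches += 1
    return matches / total if total else 0.0


# Made-up toy records for illustration only; these are not data from the paper.
gold = [SynthesisRecord(HeatingPeak(1000.0, 12.0), HeatingPeak(600.0, 48.0))]
silver = [SynthesisRecord(HeatingPeak(1000.0, 12.0), HeatingPeak(650.0, None))]
print(field_accuracy(gold, silver))  # 2 of 4 fields agree -> 0.5
```

Raising `rel_tol` trades strictness for robustness to rounding in reported temperatures; whether the paper tolerates such differences is not stated in this record.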
author2 |
School of Materials Science and Engineering |
author |
Thway, Maung; Low, Andre Kai Yuan; Khetan, Samyak; Dai, Haiwen; Recatala-Gomez, Jose; Chen, Andy Paul; Hippalgaonkar, Kedar |