Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides

Advancing state-of-the-art thermoelectric devices, which convert heat into electricity and vice versa, requires optimally doped single-phase compounds, which in turn require solid-state synthesis of bulk materials. Data-driven approaches to learning these recipes require careful data curation from lar...

Bibliographic Details
Main Authors:	Thway, Maung; Low, Andre Kai Yuan; Khetan, Samyak; Dai, Haiwen; Recatala-Gomez, Jose; Chen, Andy Paul; Hippalgaonkar, Kedar
Other Authors:	School of Materials Science and Engineering
Format:	Article
Language:	English
Published:	2024
Subjects:	Engineering; GPT-3.5; Text parsing
Online Access:	https://hdl.handle.net/10356/174885
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-174885
record_format	dspace
spelling	sg-ntu-dr.10356-174885; 2024-04-19T15:59:52Z
Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides
Thway, Maung; Low, Andre Kai Yuan; Khetan, Samyak; Dai, Haiwen; Recatala-Gomez, Jose; Chen, Andy Paul; Hippalgaonkar, Kedar
School of Materials Science and Engineering; Institute of Materials Research and Engineering, ASTAR
Engineering; GPT-3.5; Text parsing
Agency for Science, Technology and Research (ASTAR); National Research Foundation (NRF)
Published version
The authors acknowledge funding from AME Programmatic Funds by the Agency for Science, Technology and Research under Grant No. A1898b0043. KH also acknowledges funding from the NRF Fellowship (NRF-NRFF13-2021-0011).
2024-04-15T06:04:07Z; 2024-04-15T06:04:07Z; 2024
Journal Article
Thway, M., Low, A. K. Y., Khetan, S., Dai, H., Recatala-Gomez, J., Chen, A. P. & Hippalgaonkar, K. (2024). Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides. Digital Discovery, 3(2), 328-336. https://dx.doi.org/10.1039/d3dd00202k
2635-098X; https://hdl.handle.net/10356/174885; 10.1039/d3dd00202k; 2-s2.0-85182443481
2; 3; 328; 336
en
A1898b0043; NRF-NRFF13-2021-0011
Digital Discovery
© 2024 The Author(s). Published by the Royal Society of Chemistry. This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence.
application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering; GPT-3.5; Text parsing
description	Advancing state-of-the-art thermoelectric devices, which convert heat into electricity and vice versa, requires optimally doped single-phase compounds, which in turn require solid-state synthesis of bulk materials. Data-driven approaches to learning these recipes require careful data curation from large bodies of text, which may not be available for some materials, as well as a refined language-processing algorithm, which presents a high barrier to entry. We propose applying Large Language Models (LLMs) to parse solid-state synthesis recipes, encapsulating all essential synthesis information intuitively in terms of primary and secondary heating peaks. Using a domain-expert curated dataset for a specific material (Gold Standard), we engineered a prompt set for GPT-3.5 to replicate the same dataset (Silver Standard), doing so with 73% overall accuracy. We then used the same prompt set to extract and infer synthesis conditions for other ternary chalcogenides. From a database of 168 research papers, we successfully parsed 61 papers, which we then used to develop a classifier to predict phase purity. Our methodology demonstrates the generalizability of LLMs for text parsing, specifically for materials with sparse literature and unbalanced reporting (since usually only positive results are shown). Our work provides a roadmap for future efforts to combine LLMs with materials science research, pointing to a potentially transformative paradigm in the synthesis and characterization of novel materials.
author2	School of Materials Science and Engineering
format	Article
author	Thway, Maung; Low, Andre Kai Yuan; Khetan, Samyak; Dai, Haiwen; Recatala-Gomez, Jose; Chen, Andy Paul; Hippalgaonkar, Kedar
author_sort	Thway, Maung
title	Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides
publishDate	2024
url	https://hdl.handle.net/10356/174885

