Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides

Optimally doped single-phase compounds are necessary to advance state-of-the-art thermoelectric devices, which convert heat into electricity and vice versa, and making such compounds requires solid-state synthesis of bulk materials. To learn these recipes, data-driven approaches require careful data curation from lar...

Bibliographic Details
Main Authors: Thway, Maung; Low, Andre Kai Yuan; Khetan, Samyak; Dai, Haiwen; Recatala-Gomez, Jose; Chen, Andy Paul; Hippalgaonkar, Kedar
Other Authors: School of Materials Science and Engineering
Format: Article
Language: English
Published: 2024
Subjects: Engineering; GPT-3.5; Text parsing
Online Access: https://hdl.handle.net/10356/174885
Institution: Nanyang Technological University
id sg-ntu-dr.10356-174885
record_format dspace
spelling sg-ntu-dr.10356-174885
2024-04-19T15:59:52Z
Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides
Thway, Maung
Low, Andre Kai Yuan
Khetan, Samyak
Dai, Haiwen
Recatala-Gomez, Jose
Chen, Andy Paul
Hippalgaonkar, Kedar
School of Materials Science and Engineering
Institute of Materials Research and Engineering, A*STAR
Engineering
GPT-3.5
Text parsing
Agency for Science, Technology and Research (A*STAR)
National Research Foundation (NRF)
Published version
The authors acknowledge funding from AME Programmatic Funds by the Agency for Science, Technology and Research under Grant (No. A1898b0043). KH also acknowledges funding from the NRF Fellowship (NRF-NRFF13-2021-0011).
2024-04-15T06:04:07Z
2024-04-15T06:04:07Z
2024
Journal Article
Thway, M., Low, A. K. Y., Khetan, S., Dai, H., Recatala-Gomez, J., Chen, A. P. & Hippalgaonkar, K. (2024). Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides. Digital Discovery, 3(2), 328-336. https://dx.doi.org/10.1039/d3dd00202k
2635-098X
https://hdl.handle.net/10356/174885
10.1039/d3dd00202k
2-s2.0-85182443481
2
3
328
336
en
A1898b0043
NRF-NRFF13-2021-0011
Digital Discovery
© 2024 The Author(s). Published by the Royal Society of Chemistry. This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering
GPT-3.5
Text parsing
description Optimally doped single-phase compounds are necessary to advance state-of-the-art thermoelectric devices, which convert heat into electricity and vice versa, and making such compounds requires solid-state synthesis of bulk materials. To learn these recipes, data-driven approaches require careful data curation from large bodies of text, which may not be available for some materials, as well as a refined language-processing algorithm, which presents a high barrier to entry. We propose applying Large Language Models (LLMs) to parse solid-state synthesis recipes, encapsulating all essential synthesis information intuitively in terms of primary and secondary heating peaks. Using a domain-expert-curated dataset for a specific material (Gold Standard), we engineered a prompt set for GPT-3.5 to replicate the same dataset (Silver Standard), doing so with 73% overall accuracy. We then extract and infer synthesis conditions for other ternary chalcogenides with the same prompt set. From a database of 168 research papers, we successfully parsed 61 papers, which we then used to develop a classifier to predict phase purity. Our methodology demonstrates the generalizability of LLMs for text parsing, specifically for materials with sparse literature and unbalanced reporting (since usually only positive results are shown). Our work provides a roadmap for future efforts to combine LLMs with materials science research, with potentially transformative implications for the synthesis and characterization of novel materials.
author2 School of Materials Science and Engineering
format Article
author Thway, Maung
Low, Andre Kai Yuan
Khetan, Samyak
Dai, Haiwen
Recatala-Gomez, Jose
Chen, Andy Paul
Hippalgaonkar, Kedar
title Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides
publishDate 2024
url https://hdl.handle.net/10356/174885
_version_ 1800916178604916736