Machine learning-based SERS chemical space for two-way prediction of structures and spectra of untrained molecules

Bibliographic Details
Main Authors: Chen, Jaslyn Ru Ting, Tan, Emily Xi, Tang, Jingxiang, Leong, Shi Xuan, Hue, Sean Kai Xun, Pun, Chi Seng, Phang, In Yee, Ling, Xing Yi
Other Authors: School of Chemistry, Chemical Engineering and Biotechnology
Format: Article
Language: English
Published: 2025
Online Access: https://hdl.handle.net/10356/182548
Institution: Nanyang Technological University
Journal: Journal of the American Chemical Society
ISSN: 0002-7863
DOI: https://doi.org/10.1021/jacs.4c15804
Citation: Chen, J. R. T., Tan, E. X., Tang, J., Leong, S. X., Hue, S. K. X., Pun, C. S., Phang, I. Y. & Ling, X. Y. (2025). Machine learning-based SERS chemical space for two-way prediction of structures and spectra of untrained molecules. Journal of the American Chemical Society. https://dx.doi.org/10.1021/jacs.4c15804
Schools: School of Chemistry, Chemical Engineering and Biotechnology; School of Physical and Mathematical Sciences
Version: Submitted/Accepted version (application/pdf)
Funding: Agency for Science, Technology and Research (A*STAR); Nanyang Technological University; National Research Foundation (NRF). This research is supported by Singapore National Research Foundation Investigatorship (NRF-NRFI08-2022-0011), A*STAR AME Individual Research Grant (A20E5c0082), and Competitive Research Programme (NRF-CRP26-2021-0002). J.R.T.C. acknowledges scholarship support from Nanyang Technological University, Singapore.
Rights: © 2025 American Chemical Society. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at https://doi.org/10.1021/jacs.4c15804
Deposited in DR-NTU: 2025-02-17
Subjects: Chemistry; SERS; Chemical space; Untrained molecules
Abstract: Identifying unknown molecules beyond existing databases remains challenging in surface-enhanced Raman scattering (SERS) spectroscopy. Conventional SERS analysis relies on matching experimental and cataloged spectra, limiting identification to known molecules in databases. With a vast chemical space of >10^60 molecules, it is impractical to obtain the spectra of every molecule and rely solely on in silico techniques for spectral predictions. Here, we showcase an ML-based SERS chemical space that leverages key spectra-structure correlations to achieve two-way spectra-to-structure and structure-to-spectra predictions for untrained molecules with >90% average accuracy. Using a SERS chemical space comprising 38 linear molecules from four classes (alcohols, aldehydes, amines, and carboxylic acids), our experimental and in silico studies reveal underlying spectral features that enable the prediction of untrained molecules represented by two molecular descriptors (functional group and carbon chain length). For forward spectra-to-structure predictions, we devise a two-step “classification and regression” ML framework to sequentially predict the functional group and carbon chain length of untrained molecules with 100% accuracy and <1 carbon difference, respectively. In addition, using an eXtreme Gradient Boosting (XGBoost) regressor trained on the two molecular descriptors, we attain inverse structure-to-spectra prediction with a high average cosine similarity of 90.4% between the predicted and experimental spectra. Our ML-based SERS chemical space represents a shift in molecular identification from traditional spectral matching to predictive modeling of spectra-structure relationships. These insights could motivate the expansion of SERS chemical spaces and help meet present and future demands on SERS technology for accurate identification of unknowns across diverse fields.
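The abstract names two computational ingredients: a two-step “classification and regression” pipeline for spectra-to-structure prediction, and cosine similarity as the score for structure-to-spectra predictions. The Python sketch below only illustrates the general shape of such a pipeline under stated assumptions; it is not the authors' implementation. This record does not say which classifier and regressor the forward framework uses, so a nearest-centroid classifier (matching by cosine similarity) and a one-feature least-squares line are hypothetical stand-ins. The 4-bin "spectra", the centroids, and the scalar chain-length feature are made-up toy data. `cosine_similarity` is the textbook definition and may not match the paper's preprocessing (for example, baseline correction or normalization). The inverse XGBoost regressor is not reproduced here.

```python
import math

# Minimal stdlib sketch of a "classify, then regress" pipeline on toy data.
# Hypothetical stand-ins (NOT the models used in the paper): a nearest-centroid
# classifier for the functional group and a one-feature least-squares line for
# the carbon chain length.


def cosine_similarity(a, b):
    """Textbook cosine similarity between two spectra on the same wavenumber grid."""
    if len(a) != len(b):
        raise ValueError("spectra must share the same wavenumber grid")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare an all-zero spectrum")
    return dot / (norm_a * norm_b)


def classify_group(spectrum, centroids):
    """Step 1: pick the functional group whose mean spectrum is most similar."""
    return max(centroids, key=lambda group: cosine_similarity(spectrum, centroids[group]))


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_structure(spectrum, feature, centroids, regressors):
    """Step 2: apply the chain-length regressor of the class predicted in step 1."""
    group = classify_group(spectrum, centroids)
    slope, intercept = regressors[group]
    return group, slope * feature + intercept


# Toy, made-up training data: 4-bin "spectra" and a scalar chain-length feature.
CENTROIDS = {
    "alcohol": [1.0, 0.1, 0.0, 0.2],
    "amine": [0.1, 1.0, 0.2, 0.0],
}
REGRESSORS = {
    "alcohol": fit_line([1, 2, 3], [2, 4, 6]),
    "amine": fit_line([1, 2, 3], [3, 5, 7]),
}

group, chain_length = predict_structure([0.9, 0.2, 0.0, 0.1], 2.5, CENTROIDS, REGRESSORS)
print(group, chain_length)  # alcohol 5.0
```

Routing step 2 through the class predicted in step 1 mirrors the sequential ordering the abstract describes: each functional group gets its own chain-length model, so a misclassification in step 1 would carry over into step 2.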