Utilizing chemical domain knowledge and machine learning for nanoparticle and biochemical spectroscopic analysis

Bibliographic Details
Main Author: Tan, Emily Xi
Other Authors: Ling Xing Yi
Format: Thesis (Doctor of Philosophy)
Language: English
Published: Nanyang Technological University 2025
Online Access:https://hdl.handle.net/10356/182230
Institution: Nanyang Technological University
Description
Summary: Rapid and accurate chemical analysis is desirable in many scientific and technological fields but remains challenging. This thesis demonstrates the integration of domain knowledge-driven feature engineering and machine learning (ML) with UV-vis and surface-enhanced Raman scattering (SERS) spectroscopic analyses for high-throughput characterization of both nanomaterials and biochemicals. Traditional electron microscopy characterization of SERS-active nanoparticles is slow and tedious, while existing SERS methods often rely on static spectral matching that identifies only known molecules and struggles with unknown chemical mixtures. To address these challenges, this work introduces a twin-pillar strategy: using ML with UV-vis spectroscopy for rapid nanocharacterization, and applying ML-driven SERS to detect unknown biochemicals. Chapter 2 introduces an ML-based UV-vis method for characterizing gold nanospheres that achieves high accuracy over the widest size range through the use of basis spline regression. Chapter 3 extends this approach to more complex nanoshapes, such as nanocubes in mixtures, using feature engineering to achieve unprecedented size, purity, and shape predictions from multiplex extinction spectra with low error rates. Chapter 4 presents a hierarchical ML framework for SERS that identifies and quantifies unknown cerebrosides at various concentrations, signifying a paradigm shift from passive spectral analysis to active identification of unknown molecules. Chapter 5 develops a transfer learning framework that achieves precise SERS identification and quantification of unknown carnitine mixtures. Finally, we discuss the prospects of ML-driven spectroscopic analysis to harness high-dimensional multimodal data and identify new and “unseen” analytes amid various interferences.