ANALYSIS OF ANIMAL FAT RAMAN SPECTRUM USING MACHINE LEARNING METHODS IN DETERMINING THE CONCENTRATION, TYPE, AND HALAL CLASS OF THE MIXED COMPONENTS

Bibliographic Details
Main Author: Wahyu Wicaksono, Aria
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/81518
Institution: Institut Teknologi Bandung
id id-itb.:81518
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Raman spectroscopy is widely used to analyze chemical and biological compounds because it is non-destructive. A Raman spectrum acts as a kind of fingerprint from which information about a sample can be extracted. However, a measured Raman spectrum can contain background noise that makes direct analysis harder, so preprocessing is a very important chemometric stage before further analysis. Raman spectra can be analyzed computationally with statistical chemometric methods such as principal component analysis (PCA) and regression or least squares to discriminate between samples. PCA extracts information from the Raman spectrum and reduces its dimensionality. It can also reduce noise by discarding the principal components outside the major ones. From the major principal components of the Raman spectrum, regression, least squares, classification, or clustering methods can be used to make further predictions. A problem often faced by businesses and governments in relation to processed meat or fat food products is determining which meat or fat they contain. One commonly used technique is the polymerase chain reaction (PCR), which targets DNA from a sample. In this study, the Raman spectra of chicken, pork, duck, goat, and cow fat were used to analyze the fat content of a sample. Computational chemometric methods were used to determine the best model for predicting the concentration, type, and halal class of animal fat from its Raman spectrum. The fat of each animal was measured 16 times per region (top, middle, bottom) using Raman spectroscopy, across varying concentrations of pork fat mixed with non-pork fat and across the pure fat types. The Raman spectrum dataset was then randomly split 85%:15% into training data and test data.
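As an illustration of the PCA step and the random 85%:15% split described above, the following is a minimal NumPy sketch on synthetic spectra. The band positions, noise level, array sizes, and the choice of five retained components are assumptions made for the example; none of the data or code comes from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a Raman dataset: 200 spectra x 500 wavenumber bins,
# built from 3 hidden Gaussian "bands" plus white noise (NOT the thesis data).
x = np.linspace(0, 1, 500)
bands = np.stack([np.exp(-((x - c) / 0.03) ** 2) for c in (0.25, 0.5, 0.8)])
weights = rng.uniform(0, 1, size=(200, 3))
clean = weights @ bands
noisy = clean + rng.normal(scale=0.05, size=clean.shape)

# Random 85%:15% split into training and test spectra.
idx = rng.permutation(len(noisy))
n_train = int(round(0.85 * len(noisy)))
train, test = noisy[idx[:n_train]], noisy[idx[n_train:]]
clean_test = clean[idx[n_train:]]

# PCA via SVD of the mean-centred training spectra.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:5]  # keep five major components (assumed count)

# The scores are the reduced representation of each spectrum;
# projecting them back onto the components gives a denoised spectrum.
scores_test = (test - mean) @ components.T
denoised_test = scores_test @ components + mean

# Mean absolute deviation from the noise-free spectra, before and after.
err_before = np.abs(test - clean_test).mean()
err_after = np.abs(denoised_test - clean_test).mean()
print(f"error before: {err_before:.4f}, after: {err_after:.4f}")
```

Fitting the components on the training spectra only, and then applying them to the test spectra, keeps the test data out of the model, which is the usual reason for splitting before PCA.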
Preprocessing used asymmetric least squares baseline correction, Savitzky-Golay smoothing with a second-degree polynomial, and spike removal based on the modified Z-score. PCA was then applied to the preprocessed Raman spectra to obtain the major principal components. Using these principal components, regression models were built to predict animal fat concentrations, animal fat types (for spectra of samples consisting of 100% of one type of fat), and the halal class of animal fats. Because predicting the fat type and the halal class are classification problems, the output of the regression model has to be mapped to a class with argmax functions and binary functions (with a threshold of zero). The regression models used were a linear model, a decision tree, a random forest, and k-nearest neighbors (KNN). Classification models were also built on the principal components to predict the type of animal fat (for spectra of samples consisting of 100% of one type of fat) and the halal class of animal fat. The classifiers used were logistic regression, a decision tree, a random forest, KNN, and a support vector machine (SVM). Based on the Raman spectra of the five animals, KNN regression with five principal components was the best model for predicting the concentration of each animal fat in a sample, with a mean absolute error of 0.031 (3.1%) on the training data and 0.039 (3.9%) on the test data. However, the KNN regression model is not suitable for predicting the halal class of a sample, because the pork fat concentration must be exactly zero for a sample to be halal, whereas the average prediction error is 3.9%.
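The three preprocessing steps named above can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not the thesis code: the order of the steps, the parameters (`lam`, `p`, `threshold`, `window`), and the synthetic test spectrum are all hypothetical. In practice a library routine such as `scipy.signal.savgol_filter` would normally replace the hand-written smoother.

```python
import numpy as np

def modified_z(v):
    """Modified Z-score: deviation from the median scaled by the MAD."""
    med = np.median(v)
    mad = np.median(np.abs(v - med))
    return 0.6745 * (v - med) / mad

def remove_spikes(y, threshold=3.5, half_window=3):
    """Replace points whose first difference is an outlier by the mean
    of their unflagged neighbours (threshold and window are assumed)."""
    d = np.diff(y, prepend=y[0])
    spikes = np.abs(modified_z(d)) > threshold
    out = y.copy()
    for i in np.flatnonzero(spikes):
        lo, hi = max(0, i - half_window), min(len(y), i + half_window + 1)
        good = ~spikes[lo:hi]
        if good.any():
            out[i] = y[lo:hi][good].mean()
    return out

def als_baseline(y, lam=1e7, p=0.01, n_iter=10):
    """Asymmetric least squares baseline: a smooth curve that is penalised
    much more for lying above the data than below it."""
    n = len(y)
    D = np.diff(np.eye(n), 2, axis=0)      # second-difference operator
    P = lam * D.T @ D                      # smoothness penalty
    w = np.ones(n)
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + P, w * y)
        w = np.where(y > z, p, 1 - p)      # asymmetric reweighting
    return z

def savgol_smooth(y, window=11, degree=2):
    """Savitzky-Golay smoothing: local least-squares polynomial fit,
    evaluated at the centre of each (odd-length) window."""
    half = window // 2
    t = np.arange(-half, half + 1, dtype=float)
    A = np.vander(t, degree + 1, increasing=True)
    coeffs = np.linalg.pinv(A)[0]          # weights for the centre value
    padded = np.pad(y, half, mode="edge")
    return np.convolve(padded, coeffs[::-1], mode="valid")

# Synthetic spectrum: sloping baseline + one band + noise + one spike.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 300)
peak = np.exp(-((x - 0.5) / 0.08) ** 2)
raw = 2 + 1.5 * x + peak + rng.normal(scale=0.02, size=x.size)
raw[100] += 5.0                            # artificial spike

despiked = remove_spikes(raw)
corrected = despiked - als_baseline(despiked)
smoothed = savgol_smooth(corrected)
```

The dense solve in `als_baseline` is only reasonable for short spectra; for spectra with thousands of points a sparse solver would be the more common choice.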
The support vector machine (SVM) classification model with five principal components was the best model for predicting the type and the halal class of animal fat, with an F1 score of 1 for the fat type and 0.97 for the halal class. However, an F1 score of 1 may also indicate overfitting. This needs to be tested further by adding more variation in mixed animal fat concentrations, so that the generalization ability of the SVM model can be assessed.
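The contrast drawn above between mapping a regression output to the halal class and classifying directly can be illustrated with a small NumPy sketch. Everything here is an assumption made for the example: the data are synthetic, the five "PCA scores" come from a random hypothetical mixing matrix, and a simple KNN majority vote stands in for the SVM classifier, which is not reimplemented here.

```python
import numpy as np

rng = np.random.default_rng(1)
PORK = 1  # column index assigned to pork fat (arbitrary for this sketch)

def make_samples(n):
    """Synthetic fat fractions (n x 5); every other sample has no pork."""
    pork = np.where(np.arange(n) % 2 == 0, 0.0, rng.uniform(0.05, 0.9, n))
    frac = np.zeros((n, 5))
    frac[:, PORK] = pork
    frac[:, [0, 2, 3, 4]] = rng.dirichlet(np.ones(4), n) * (1 - pork)[:, None]
    return frac

mixing = rng.normal(size=(5, 5))  # hypothetical map: fractions -> 5 scores

def scores(frac):
    return frac @ mixing + rng.normal(scale=0.05, size=frac.shape)

# 340 training and 60 test samples, i.e. an 85%:15% split of 400.
y_train, y_test = make_samples(340), make_samples(60)
X_train, X_test = scores(y_train), scores(y_test)

def knn_indices(X_train, X_test, k=5):
    """Indices of the k nearest training points for each test point."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return np.argsort(d, axis=1)[:, :k]

nn = knn_indices(X_train, X_test)

# KNN regression: predicted fractions are the mean over the k neighbours.
y_pred = y_train[nn].mean(axis=1)
mae = np.abs(y_pred - y_test).mean()

# Argmax mapping: the fat with the largest predicted fraction.
type_pred = y_pred.argmax(axis=1)

halal_true = y_test[:, PORK] == 0
# Binary mapping with a threshold of zero on the predicted pork fraction.
halal_reg = y_pred[:, PORK] <= 0
# Direct classification: majority vote over the neighbours' halal labels.
halal_cls = (y_train[nn][:, :, PORK] == 0).mean(axis=1) > 0.5

def f1(true, pred):
    """F1 score with the halal class as the positive class."""
    tp = np.sum(true & pred)
    fp = np.sum(~true & pred)
    fn = np.sum(true & ~pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

print(f"MAE: {mae:.3f}")
print(f"F1 (regression + zero threshold): {f1(halal_true, halal_reg):.2f}")
print(f"F1 (direct classification):       {f1(halal_true, halal_cls):.2f}")
```

With the zero threshold, the regression route labels a sample halal only when every neighbour contains no pork at all, so any small prediction error flips the label; the classifier only needs a majority.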
spelling id-itb.:81518 2024-06-28T13:39:58Z chemometry, Raman, animal fats, PCA, KNN, SVM