ANALYSIS OF ANIMAL FAT RAMAN SPECTRUM USING MACHINE LEARNING METHODS IN DETERMINING THE CONCENTRATION, TYPE, AND HALAL CLASS OF THE MIXED COMPONENTS
Main Author: | Wahyu Wicaksono, Aria |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/81518 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:81518 |
---|---|
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Raman spectroscopy is widely used to analyze chemical and biological compounds
because it is non-destructive. A Raman spectrum acts as a kind of fingerprint from
which information about a sample can be extracted. However, a measured Raman
spectrum can contain background noise that makes direct analysis harder, so
preprocessing is a very important chemometric stage before any further analysis.
Raman spectra can be analyzed computationally using statistical chemometric methods
such as principal component analysis (PCA) and regression or least squares to
discriminate between samples. PCA extracts information from the Raman spectrum and
reduces its dimensionality. It also reduces noise by discarding the principal
components outside the major principal components. Regression, least-squares,
classification, or clustering methods can then be applied to the major principal
components to make further predictions.
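The dimensionality reduction and noise reduction described above can be sketched with a plain SVD-based PCA. This is a minimal illustration on synthetic two-peak spectra, not the thesis code; the function names, the number of components, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean spectrum and the leading principal axes of X (samples x channels)."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are principal axes, ordered by variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project spectra onto the retained principal axes (the PCA scores)."""
    return (X - mean) @ components.T

def pca_reconstruct(scores, mean, components):
    """Map scores back to spectra; discarded components are dropped as noise."""
    return scores @ components + mean

# Synthetic stand-in for Raman spectra: mixtures of two Gaussian peaks plus noise.
rng = np.random.default_rng(0)
shift = np.linspace(0.0, 1.0, 200)
peak_a = np.exp(-((shift - 0.3) / 0.02) ** 2)
peak_b = np.exp(-((shift - 0.7) / 0.03) ** 2)
weights = rng.uniform(0.0, 1.0, size=(60, 1))
clean = weights * peak_a + (1.0 - weights) * peak_b
noisy = clean + rng.normal(0.0, 0.05, size=clean.shape)

mean, components = pca_fit(noisy, n_components=2)
scores = pca_transform(noisy, mean, components)      # 200 channels -> 2 features
denoised = pca_reconstruct(scores, mean, components)

err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
print(scores.shape, err_noisy, err_denoised)
```

The scores are the low-dimensional features a downstream regressor or classifier would consume; the reconstruction is shown only to make the noise-reduction effect visible.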
A problem often faced by businesses and governments in relation to processed meat
or fat food products is determining the meat or fat content of those products.
One commonly used technique is the polymerase chain reaction (PCR), which
targets DNA from a sample. In this study, the Raman spectra of chicken, beef,
duck, goat, and cow fat were used to analyze the fat content of a sample.
Computational chemometric methods were used to determine the best model for
predicting the concentration, type, and halal class of animal fat from its
Raman spectrum.
Chicken, beef, duck, goat, and cow fat were each measured 16 times per region
(top, middle, bottom) using Raman spectroscopy, with varying concentrations of
pork fat mixed with non-pork fat and with varying pure fat types. The Raman
spectrum dataset was then split randomly into training and test data at a ratio
of 85%:15%. Preprocessing was carried out using asymmetric least squares
baseline correction, second-degree Savitzky-Golay smoothing, and modified Z-score
spike removal. Principal component analysis was then applied to the preprocessed
Raman spectra to obtain the major principal components.
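The three preprocessing steps named above can be sketched as follows. This is an illustration under stated assumptions rather than the thesis implementation: the order of the steps, the window lengths, the smoothness `lam`, the asymmetry `p`, and the Z-score threshold of 3.5 are all assumed, and the spike detector applies the modified Z-score to first differences, which is one common variant. Real spectra with sharp Raman bands would need these values tuned.

```python
import numpy as np
from scipy import sparse
from scipy.signal import savgol_filter
from scipy.sparse.linalg import spsolve

def remove_spikes(y, threshold=3.5, window=5):
    """Replace points whose first-difference modified Z-score exceeds threshold."""
    diff = np.diff(y, prepend=y[0])
    median = np.median(diff)
    mad = np.median(np.abs(diff - median))
    if mad == 0:
        return y.copy()
    z = 0.6745 * (diff - median) / mad
    spikes = np.abs(z) > threshold
    out = y.copy()
    for i in np.flatnonzero(spikes):
        lo, hi = max(0, i - window), min(len(y), i + window + 1)
        neighbours = [j for j in range(lo, hi) if not spikes[j]]
        if neighbours:
            # Fill the flagged point with the mean of its unflagged neighbours.
            out[i] = np.mean(y[neighbours])
    return out

def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline: a smooth curve hugging the lower envelope."""
    n = len(y)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)            # second-difference roughness penalty
    w = np.ones(n)
    for _ in range(n_iter):
        W = sparse.diags(w)
        z = spsolve((W + penalty).tocsc(), w * y)
        # Points above the baseline get a small weight, points below a large one.
        w = np.where(y > z, p, 1.0 - p)
    return z

def preprocess(y):
    """Despike, subtract the baseline, then smooth with a degree-2 Savitzky-Golay filter."""
    y = remove_spikes(y)
    y = y - als_baseline(y)
    return savgol_filter(y, window_length=11, polyorder=2)

# Synthetic spectrum: one broad band, a sloping background, noise, and one spike.
rng = np.random.default_rng(1)
x = np.arange(400)
raw = np.exp(-((x - 150) / 30.0) ** 2) + 2.0 + 0.01 * x
raw = raw + rng.normal(0.0, 0.01, size=x.size)
raw[300] += 5.0
clean = preprocess(raw)
```

After preprocessing, the band near channel 150 should remain while the sloping background and the spike at channel 300 are suppressed.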
Using the principal components from PCA, regression models were built to predict
animal fat concentrations, animal fat types (for Raman spectra of samples that are
100% one type of fat), and the halal class of animal fats. Because predicting the
fat type and the halal class are classification problems, the output of the
regression model must be transformed or mapped to a class with argmax functions
and binary functions (with a threshold of zero). The regression models used are
the linear model, decision tree, random forest, and k-nearest neighbor (KNN).
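One way to read the mapping from regression output to classes is sketched below. It assumes the regressor returns one predicted concentration per fat, that argmax is used for the fat type, and that the binary function marks a sample halal only when the predicted pork concentration does not exceed zero. The column layout (pork in column 0) and the example numbers are hypothetical.

```python
import numpy as np

def to_fat_type(pred):
    """Class index of the largest predicted concentration in each row."""
    return np.argmax(pred, axis=1)

def to_halal(pred, pork_col=0, threshold=0.0):
    """True (halal) when the predicted pork concentration is at or below threshold."""
    return pred[:, pork_col] <= threshold

# Hypothetical regression outputs: rows are samples, columns are five fats,
# with column 0 assumed to hold pork fat.
pred = np.array([
    [0.00, 0.97, 0.01, 0.01, 0.01],
    [0.92, 0.03, 0.02, 0.02, 0.01],
    [0.04, 0.90, 0.03, 0.02, 0.01],
])

types = to_fat_type(pred)   # dominant fat per sample
halal = to_halal(pred)      # the third row fails on a 0.04 pork prediction alone
print(types, halal)
```

The third row shows how sensitive a zero threshold is: a small positive pork prediction is enough to flip the class.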
Using the principal components from PCA, classification models were also built to
predict the type of animal fat (for Raman spectra of samples that are 100% one type
of fat) and the halal class of animal fat. The classifiers used are logistic
regression, decision tree, random forest, KNN, and support vector machine (SVM).
Based on the Raman spectra of the five animals, the k-nearest neighbor regression
model with five principal components is the best model for predicting the
concentration of each animal fat in a sample, with a mean absolute error of 0.031
(3.1%) on the training data and 0.039 (3.9%) on the test data. However, this model
is not suitable for predicting the halal class of a sample, because the pork fat
concentration must be zero for a sample to be halal, whereas the average
prediction error is 3.9%.
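The following sketch illustrates k-nearest neighbor regression on five features with a mean absolute error, and why a zero threshold on its output is fragile. The data are synthetic stand-ins for PCA scores and concentrations; `k=5`, the mixing matrix, and the assumption that column 0 is pork are all choices made for the example, not values from the thesis.

```python
import numpy as np

def knn_predict(X_train, Y_train, X_query, k=5):
    """Average the targets of the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return Y_train[nearest].mean(axis=1)

def mean_absolute_error(Y_true, Y_pred):
    """Mean absolute difference over all samples and all fat columns."""
    return np.mean(np.abs(Y_true - Y_pred))

rng = np.random.default_rng(0)
conc = rng.dirichlet(np.ones(5), size=200)     # five concentrations summing to 1
mixing = rng.normal(size=(5, 5))               # hypothetical map to five PCA scores
scores = conc @ mixing + rng.normal(0.0, 0.01, size=(200, 5))

idx = rng.permutation(200)
train, test = idx[:170], idx[170:]             # 85%:15% random split

pred = knn_predict(scores[train], conc[train], scores[test], k=5)
mae = mean_absolute_error(conc[test], pred)
print(f"test MAE: {mae:.3f}")
```

Because a KNN prediction is an average of neighbouring targets, the predicted pork concentration is exactly zero only when all k neighbours contain no pork, so thresholding it at zero is a poor halal test even when the mean absolute error is small.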
The support vector machine classification model with five principal components is
the best model for predicting the type and the halal class of animal fat, with an F1
score of 1 for the fat type and 0.97 for the halal class. However, an F1 score of 1
may also indicate overfitting. This should be tested further by adding
variations in mixed animal fat concentrations, so that the generalization ability
of the support vector machine model can be assessed.
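A five-component PCA followed by a support vector machine, scored with F1 on a held-out split, can be sketched with scikit-learn as below. The data are synthetic and well separated; the RBF kernel, the class layout, and the random seeds are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Two synthetic classes of 100-channel "spectra" with clearly different means.
rng = np.random.default_rng(0)
n_per_class, n_channels = 60, 100
centers = rng.normal(0.0, 1.0, size=(2, n_channels))
X = np.vstack([c + rng.normal(0.0, 0.5, size=(n_per_class, n_channels))
               for c in centers])
y = np.repeat([0, 1], n_per_class)

# 85%:15% random split, stratified so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0, stratify=y)

# PCA is fitted on the training data only, inside the pipeline.
model = make_pipeline(PCA(n_components=5), SVC(kernel="rbf"))
model.fit(X_train, y_train)

f1_train = f1_score(y_train, model.predict(X_train))
f1_test = f1_score(y_test, model.predict(X_test))
print(f1_train, f1_test)
```

On data this cleanly separated a very high F1 score is expected and says little about generalization, which is why testing on additional mixture concentrations is the more informative check.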
|
format |
Theses |
author |
Wahyu Wicaksono, Aria |
spelling |
id-itb.:81518 2024-06-28T13:39:58Z Wahyu Wicaksono, Aria Indonesia Theses chemometry, Raman, animal fats, PCA, KNN, SVM INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/81518 |