Machine learning for mixed data

People are always dealing with mixed data, whether in scientific research, industrial production or daily life. With the continuous development of computer technology and the performance of machine learning models, the requirements for processing mixed data are increasing day by day, and one of the...

Bibliographic Details
Main Author:	Zhu, Yixuan
Other Authors:	Mao Kezhi
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Electrical and electronic engineering
Online Access:	https://hdl.handle.net/10356/165048
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-165048
record_format	dspace
spelling	sg-ntu-dr.10356-165048 2023-07-04T15:04:02Z Machine learning for mixed data Zhu, Yixuan Mao Kezhi School of Electrical and Electronic Engineering EKZMao@ntu.edu.sg Engineering::Electrical and electronic engineering Master of Science (Signal Processing) 2023-03-10T08:08:49Z 2023 Thesis-Master by Coursework Zhu, Y. (2023). Machine learning for mixed data. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165048 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering
spellingShingle	Engineering::Electrical and electronic engineering Zhu, Yixuan Machine learning for mixed data
description	Mixed data, which combines numeric and categorical features, arises throughout scientific research, industrial production and daily life. As computer technology and machine learning models continue to improve, the demand for processing mixed data keeps growing, and classification is one of the typical requirements. In this dissertation, we normalize the numeric features of the mixed datasets and transform the categorical features into numeric form using three embedding methods: one-hot encoding, TF-IDF, and embedding based on neural networks. We then run classification experiments with five machine learning models: Logistic Regression (LR), K-Nearest Neighbors (KNN), Random Forest (RF), Gradient Boosting Decision Tree (GBDT) and XGBoost. We collect the performance metrics of each model at its best result, then compare the classification performance of the three embedding methods and the five models and discuss them in relation to each other. The outcome is a complete pipeline that applies the embedding transformation to categorical data and classifies mixed data.
author2	Mao Kezhi
author_facet	Mao Kezhi Zhu, Yixuan
format	Thesis-Master by Coursework
author	Zhu, Yixuan
author_sort	Zhu, Yixuan
title	Machine learning for mixed data
title_short	Machine learning for mixed data
title_full	Machine learning for mixed data
title_fullStr	Machine learning for mixed data
title_full_unstemmed	Machine learning for mixed data
title_sort	machine learning for mixed data
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/165048
_version_	1772826706752569344
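
The pipeline outlined in the description field (normalize the numeric features, embed the categorical features, then classify) can be illustrated with a minimal sketch. This is not the thesis author's implementation. It assumes min-max scaling for the normalization step (the record says only that "relevant mathematical tools" are used), shows one-hot encoding as the embedding, and uses a plain-Python K-Nearest Neighbors vote as the classifier. All function names and the toy data are hypothetical and chosen for illustration only.

```python
from collections import Counter
import math


def min_max_normalize(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def one_hot(column):
    """Return one indicator vector per value, using sorted category order."""
    categories = sorted(set(column))
    return [[1.0 if v == c else 0.0 for c in categories] for v in column]


def encode_mixed(numeric_cols, categorical_cols):
    """Join normalized numeric columns and one-hot columns into row vectors."""
    n_rows = len(numeric_cols[0]) if numeric_cols else len(categorical_cols[0])
    rows = [[] for _ in range(n_rows)]
    for col in numeric_cols:
        for row, v in zip(rows, min_max_normalize(col)):
            row.append(v)
    for col in categorical_cols:
        for row, vec in zip(rows, one_hot(col)):
            row.extend(vec)
    return rows


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy data: one numeric column and one categorical column.
ages = [20, 25, 30, 60, 65, 70]
colors = ["red", "red", "blue", "blue", "green", "green"]
labels = ["young", "young", "young", "old", "old", "old"]

# For brevity the whole table is encoded at once; a real experiment would
# fit the scaling range and category list on the training split only.
features = encode_mixed([ages], [colors])

# Hold out the last row as the query and classify it from the other five.
prediction = knn_predict(features[:-1], labels[:-1], features[-1], k=3)
print(prediction)
```

Each encoded row here has four entries: the scaled age followed by three indicator values for the categories blue, green and red. Swapping `one_hot` for another embedding, or `knn_predict` for another classifier, leaves the rest of the sketch unchanged, which is the kind of comparison the description refers to.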
