AUTOMATIC REMOVAL OF SPEECH ARTIFACT IN ELECTROENCEPHALOGRAM DATA USING MACHINE LEARNING

Bibliographic Details
Main Author: Lovenia, Holy
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/39286
Institution: Institut Teknologi Bandung
id id-itb.:39286
timestamp 2019-06-25T11:36:44Z
keywords automatic speech artifact removal, EEG, machine learning
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Electroencephalography (EEG) records the electrical activity originating from the brain. Unfortunately, EEG data are often contaminated by artifacts, defined as electrical activity that is not generated by the brain, which prevents the data from being processed further. This noise causes many problems and limitations in EEG data processing, especially in the case of speech artifacts, which often emerge in studies related to communication. In addition, no previous research has studied the characteristics of speech artifacts, which makes them very difficult to detect. Therefore, the present study aims to: 1) construct a speech artifact removal system, 2) find the best classification and clustering machine learning models for detection, 3) explore the prospects of using deep neural networks, and 4) identify the important features. Before the machine learning experiments, EEG-specific preprocessing and decomposition steps were applied to the signals. Each independent component was then labelled according to its correlation with lip EMG, and features were extracted. The model-building experiments consisted of several scenarios that focused on constructing a baseline model, handling imbalanced data, and feature selection/extraction techniques. Random Forest (F1-score on the test set: 0.97) with upsampling and the best parameter configuration was the best classification model, while Agglomerative clustering (purity on the test set: 0.63) with SMOTE, Select K Best feature selection, and the best parameter configuration performed best among the clustering models. The important features were determined from the feature importances of the best classification model. Feedforward Neural Networks (F1-score on the test set: 0.74) showed that speech artifact detection with deep neural networks is promising. The speech artifact removal system was built using the best models established by the machine learning experiments.
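The labelling step and the purity score mentioned in the description can be illustrated with a small sketch. This is not the thesis code. It assumes Pearson correlation, a hypothetical threshold of 0.5 (the record does not state one), and the common definition of purity as the share of samples that fall in the majority class of their cluster. All function names are illustrative.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def label_components(components, lip_emg, threshold=0.5):
    """Return 1 (speech artifact) or 0 (other) for each independent component.

    `threshold` is a hypothetical cut-off; the record does not state one.
    """
    return [int(abs(pearson(c, lip_emg)) >= threshold) for c in components]


def purity(cluster_ids, true_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)


if __name__ == "__main__":
    # Toy signals, invented for illustration only.
    lip_emg = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
    components = [
        [2 * v + 0.1 for v in lip_emg],            # scaled, shifted copy of the EMG
        [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0],  # uncorrelated with the EMG
    ]
    print(label_components(components, lip_emg))
    print(purity([0, 0, 1, 1, 1], [1, 1, 0, 0, 1]))
```

The absolute value of the correlation is used so that a component with inverted polarity, which is arbitrary after decomposition, is still flagged.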