IMBALANCED DATA HANDLING IN MULTI-LABEL ASPECT CATEGORIZATION USING OVERSAMPLING AND ENSEMBLE LEARNING

In sentiment analysis, aspect-based sentiment analysis (ABSA) provides more detailed information about user sentiment toward a product than document-level or sentence-level analysis. Aspect categorization is one of the ABSA tasks; it identifies which aspects a review text relates to. This task...

Bibliographic Details
Main Author:	Dicky Alnatara, Wildan
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/50101
Institution:	Institut Teknologi Bandung

id	id-itb.:50101
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	In sentiment analysis, aspect-based sentiment analysis (ABSA) provides more detailed information about user sentiment toward a product than document-level or sentence-level analysis. Aspect categorization is one of the ABSA tasks; it identifies which aspects a review text relates to. This task operates on multilabel data, in which aspect occurrences are usually unevenly distributed, that is, the data is imbalanced. This paper uses 9284 user review texts from the hotel domain. We employ three techniques to address imbalanced multilabel data: cross-coupling aggregation (COCOA), the multilabel synthetic minority oversampling technique (MLSMOTE), and the multilabel synthetic oversampling approach based on the local distribution of labels (MLSOL). A Convolutional Neural Network (CNN)-Classifier Chain (CC)-Extreme Gradient Boosting (XGBoost) model serves as the baseline and as the base architecture to which the three techniques are applied. COCOA and MLSMOTE are the best performers: MLSMOTE achieves a macro-F1 of 0.9276 and COCOA 0.9272, compared with 0.9261 for the baseline. The best COCOA configuration uses four parameters: binary relevance mode = smote-oversampling, multiclass mode = smote-oversampling, random state = 10, and binary relevance ratio = 0.5. The best MLSMOTE configuration uses two parameters: number of neighbors = 5 and random state = 42.
format	Final Project
author	Dicky Alnatara, Wildan
title	IMBALANCED DATA HANDLING IN MULTI-LABEL ASPECT CATEGORIZATION USING OVERSAMPLING AND ENSEMBLE LEARNING
url	https://digilib.itb.ac.id/gdl/view/50101
_version_	1822000560918757376
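
The description above reports results as macro-averaged F1 on imbalanced multilabel data. As a rough illustration of what those two notions mean, the following minimal sketch computes a per-label imbalance ratio (largest label count divided by each label's count, one common measure) and macro-F1 on a toy label matrix. The aspect names, the toy data, and the helper names are hypothetical and are not taken from the thesis; the thesis's own pipeline (CNN-CC-XGBoost with COCOA, MLSMOTE, or MLSOL) is not reproduced here.

```python
# Toy multilabel data (hypothetical): rows are reviews, columns are aspects.
ASPECTS = ["service", "cleanliness", "wifi"]  # illustrative names only

Y_true = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]
Y_pred = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
]


def label_counts(Y):
    """Number of samples carrying each label (Y is a list of 0/1 rows)."""
    return [sum(column) for column in zip(*Y)]


def imbalance_ratios(Y):
    """Largest label count divided by each label's count; 1.0 is the most frequent label."""
    counts = label_counts(Y)
    largest = max(counts)
    return [largest / c if c else float("inf") for c in counts]


def macro_f1(Y_true, Y_pred):
    """Unweighted mean of per-label F1, so rare labels weigh as much as frequent ones."""
    per_label = []
    for true_col, pred_col in zip(zip(*Y_true), zip(*Y_pred)):
        pairs = list(zip(true_col, pred_col))
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # Convention assumed here: F1 is 0.0 when a label has no positives at all.
        per_label.append(2 * tp / denom if denom else 0.0)
    return sum(per_label) / len(per_label)


if __name__ == "__main__":
    for name, ratio in zip(ASPECTS, imbalance_ratios(Y_true)):
        print(f"{name}: imbalance ratio {ratio:.1f}")
    print(f"macro-F1: {macro_f1(Y_true, Y_pred):.4f}")
```

Because macro-F1 averages over labels rather than over samples, a missed rare aspect lowers the score as much as a missed frequent one, which is why it is a natural metric for comparing techniques that target label imbalance.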

Similar Items