IDENTIFYING PLAUSIBILITY PHRASES IN INSTRUCTIONAL TEXTS USING BOOSTINGBERT AND ADABOOST.RT

The coherence of each word or phrase in instructional text is crucial because incorrect word choice can lead to different outcomes. This research aimed to develop models to identify word or phrase coherence in instructional texts for classification and regression tasks. This topic is similar to t...

Bibliographic Details
Main Author:	Sumerta Yoga, Gede
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/82499
Institution:	Institut Teknologi Bandung

id	id-itb.:82499
timestamp	2024-07-08T14:41:59Z
keywords	Word Coherence, DeBERTaV3, ensemble, boosting, AdaBoost.RT, BoostingBERT
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	The coherence of each word or phrase in an instructional text is crucial, because an incorrect word choice can lead to a different outcome. This research aimed to develop models that identify word or phrase coherence in instructional texts for both classification and regression tasks. The topic follows SemEval 2022 Task 7 and uses the same two tasks: classification and regression. Word coherence, or phrase plausibility, is tested by evaluating how well a word or phrase fits the surrounding context when it is substituted into the text. This method resembles masked language modeling (MLM), a technique used to train BERT. To increase the model's performance, ensemble learning is used, specifically boosting with DeBERTaV3, an advanced variant of BERT, as the weak learner. The model's performance is compared with the best models in SemEval 2022 Task 7, and its advantages and disadvantages are analyzed. The training phase of the boosting method runs iteratively and sequentially, with each iteration focusing on the incorrect predictions of the previous one. In this final project, two models were developed using two modifications of the AdaBoost algorithm: the BoostingBERT technique for the classification task and the AdaBoost.RT technique for the regression task. Both implementations use DeBERTaV3 as the weak learner. The work also covers data preparation and the handling of imbalanced data in the training dataset used by SemEval 2022. The developed models achieved fourth place in both the regression and classification tasks of SemEval 2022 Task 7. In the classification task, the model achieved an accuracy of 64.24%, demonstrating that it can classify the coherence of words or phrases with relatively high accuracy. In the regression task, the model achieved a Spearman's rank correlation of 0.765. However, the final model was quite large, reaching 9.8 GB for each task. Additionally, the model struggled to predict the neutral label in the classification task and low-score data in the regression task.
format	Final Project
author	Sumerta Yoga, Gede
title	IDENTIFYING PLAUSIBILITY PHRASES IN INSTRUCTIONAL TEXTS USING BOOSTINGBERT AND ADABOOST.RT
url	https://digilib.itb.ac.id/gdl/view/82499
_version_	1823656301238943744
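
The description mentions that boosting trains iteratively, with each round focusing on the examples the previous round predicted badly, and that AdaBoost.RT was used for the regression task. The following is a minimal sketch of the general AdaBoost.RT scheme as commonly described, not the thesis implementation. It assumes a tiny weighted regression stump as the weak learner in place of DeBERTaV3, one-dimensional inputs, and nonzero targets; the names fit_stump and adaboost_rt and the parameters phi, rounds, and power are hypothetical choices made for illustration.

```python
import math


def fit_stump(xs, ys, w):
    """Weighted regression stump on 1-D inputs (stand-in weak learner).

    Picks the threshold that minimises weighted squared error and
    predicts the weighted mean of the targets on each side.
    Assumes xs contains at least two distinct values.
    """
    best = None
    for thr in sorted(set(xs)):
        left = [(y, wi) for x, y, wi in zip(xs, ys, w) if x <= thr]
        right = [(y, wi) for x, y, wi in zip(xs, ys, w) if x > thr]
        if not left or not right:
            continue
        lm = sum(y * wi for y, wi in left) / sum(wi for _, wi in left)
        rm = sum(y * wi for y, wi in right) / sum(wi for _, wi in right)
        loss = (sum(wi * (y - lm) ** 2 for y, wi in left)
                + sum(wi * (y - rm) ** 2 for y, wi in right))
        if best is None or loss < best[0]:
            best = (loss, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def adaboost_rt(xs, ys, phi=0.1, rounds=5, power=2):
    """AdaBoost.RT sketch: returns (predict function, final distribution).

    phi is the relative-error threshold that separates "correct" from
    "incorrect" predictions; power is the exponent applied to the error
    rate. Targets must be nonzero because the error is relative.
    """
    m = len(xs)
    dist = [1.0 / m] * m            # start from a uniform distribution
    learners, alphas = [], []
    for _ in range(rounds):
        f = fit_stump(xs, ys, dist)
        # Absolute relative error of this round's learner on each example.
        are = [abs((f(x) - y) / y) for x, y in zip(xs, ys)]
        # Error rate: total weight of examples whose error exceeds phi.
        eps = sum(d for d, e in zip(dist, are) if e > phi)
        eps = min(max(eps, 1e-10), 1.0 - 1e-10)   # keep log(1/beta) finite
        beta = eps ** power
        # Shrink the weight of well-predicted examples, then renormalise,
        # so the next round concentrates on the badly predicted ones.
        dist = [d * (beta if e <= phi else 1.0) for d, e in zip(dist, are)]
        z = sum(dist)
        dist = [d / z for d in dist]
        learners.append(f)
        alphas.append(math.log(1.0 / beta))
    total = sum(alphas)

    def predict(x):
        # Final output: average of the learners weighted by log(1/beta).
        return sum(a * f(x) for a, f in zip(alphas, learners)) / total

    return predict, dist
```

Calling adaboost_rt on a handful of (input, score) pairs returns a prediction function together with the final example distribution, whose larger entries mark the examples that stayed hard to predict. In the setting the description outlines, each call to the weak learner would instead fine-tune a DeBERTaV3 model on the reweighted training data, which is considerably more expensive than the stump assumed here.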
