IDENTIFYING PLAUSIBILITY PHRASES IN INSTRUCTIONAL TEXTS USING BOOSTINGBERT AND ADABOOST.RT
Main Author: | Sumerta Yoga, Gede |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/82499 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:82499 |
---|---|
spelling |
id-itb.:82499 2024-07-08T14:41:59Z IDENTIFYING PLAUSIBILITY PHRASES IN INSTRUCTIONAL TEXTS USING BOOSTINGBERT AND ADABOOST.RT Sumerta Yoga, Gede Indonesia Final Project Word Coherence, DeBERTaV3, ensemble, boosting, AdaBoost.RT, BoostingBERT INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/82499 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
The coherence of each word or phrase in an instructional text is crucial, because
an incorrect word choice can lead to a different outcome. This research aimed to
develop models that identify word or phrase coherence in instructional texts for
classification and regression tasks. The topic follows SemEval 2022 Task 7, and the
same two tasks are used: classification and regression. Word coherence, or phrase
plausibility, is tested by evaluating how well a word or phrase fits when it is
substituted into the text, given the surrounding context. This setup is similar to
the masked language modeling (MLM) objective used to train BERT. To improve the
model’s performance, ensemble learning is used, specifically boosting with
DeBERTaV3, an advanced variant of BERT, as the weak learner. The model’s
performance is compared with the best systems in SemEval 2022 Task 7, and its
advantages and disadvantages are analyzed.
In boosting, training runs iteratively and sequentially, with each iteration
focusing on the incorrect predictions of the previous one. In this final project,
two models are developed using two modifications of the AdaBoost algorithm:
BoostingBERT for the classification task and AdaBoost.RT for the regression task.
Both implementations use DeBERTaV3 as the weak learner. Data preparation and
imbalanced-data handling are also applied to the SemEval 2022 training dataset.
Compared with the SemEval 2022 Task 7 results, the developed models ranked fourth
in both the classification and regression tasks. In the classification task, the
model achieved an accuracy of 64.24% on the three-class task (implausible, neutral,
plausible), showing that it can classify the coherence of words or phrases with
relatively high accuracy. In the regression task, the model achieved a Spearman’s
rank correlation of 0.765. However, the final models were large, reaching 9.8 GB
for each task. In addition, the models struggled to predict the neutral label in
the classification task and low-scoring instances in the regression task. |
format |
Final Project |
author |
Sumerta Yoga, Gede |
title |
IDENTIFYING PLAUSIBILITY PHRASES IN INSTRUCTIONAL TEXTS USING BOOSTINGBERT AND ADABOOST.RT |
url |
https://digilib.itb.ac.id/gdl/view/82499 |
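The abstract describes testing plausibility by substituting a word or phrase into its surrounding context. A minimal sketch of building one input per candidate filler, assuming a hypothetical `______` blank marker and a made-up instruction; the actual SemEval 2022 data format may differ, and `substitute` and `candidate_texts` are illustrative names, not the project's code:

```python
from typing import List

BLANK = "______"  # hypothetical blank marker; the real dataset format may differ


def substitute(text: str, filler: str) -> str:
    """Insert one candidate filler into the single blank of an instruction."""
    if text.count(BLANK) != 1:
        raise ValueError("expected exactly one blank in the text")
    return text.replace(BLANK, filler)


def candidate_texts(text: str, fillers: List[str]) -> List[str]:
    """Return one filled-in text per candidate; each would then be scored by a model."""
    return [substitute(text, f) for f in fillers]


if __name__ == "__main__":
    # Made-up instruction and fillers, for illustration only.
    instruction = "Preheat the oven, then ______ the dough for 20 minutes."
    for filled in candidate_texts(instruction, ["bake", "freeze", "roast"]):
        print(filled)
```

Each filled-in text, read in its context, is what a classifier or regressor would judge, much as an MLM judges a token placed at a masked position.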
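BoostingBERT, used for the classification task, builds on multi-class boosting. As an assumption about the exact variant, the sketch below uses the standard SAMME update: a weak learner with weighted error ε gets weight α = log((1 - ε)/ε) + log(K - 1), misclassified samples are multiplied by e^α, and the ensemble predicts by an α-weighted vote. A hypothetical one-feature decision stump stands in for the DeBERTaV3 weak learner; this is not the project's implementation:

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Classifier = Callable[[float], int]


def weighted_majority(idx: List[int], ys: Sequence[int], w: Sequence[float]) -> int:
    """Label with the largest total weight among the given sample indices."""
    score: Dict[int, float] = defaultdict(float)
    for i in idx:
        score[ys[i]] += w[i]
    return max(score, key=score.get)


def fit_stump(xs: Sequence[float], ys: Sequence[int], w: Sequence[float]) -> Classifier:
    """Hypothetical weak learner (stand-in for DeBERTaV3): one threshold, one label per side."""
    best_err, best = float("inf"), None
    for t in sorted(set(xs)):
        left = [i for i, x in enumerate(xs) if x <= t]
        right = [i for i, x in enumerate(xs) if x > t] or left
        lo, hi = weighted_majority(left, ys, w), weighted_majority(right, ys, w)
        err = sum(wi for x, y, wi in zip(xs, ys, w) if (lo if x <= t else hi) != y)
        if err < best_err:
            best_err, best = err, (t, lo, hi)
    t, lo, hi = best
    return lambda x: lo if x <= t else hi


def samme(xs: Sequence[float], ys: Sequence[int], n_classes: int, n_rounds: int = 10) -> Classifier:
    """Multi-class AdaBoost (SAMME) over the hypothetical stump learner."""
    m = len(xs)
    w = [1.0 / m] * m
    learners: List[Classifier] = []
    alphas: List[float] = []
    for _ in range(n_rounds):
        h = fit_stump(xs, ys, w)
        miss = [h(x) != y for x, y in zip(xs, ys)]
        err = sum(wi for wi, bad in zip(w, miss) if bad)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0) and division by zero
        alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
        if alpha <= 0:  # no better than random guessing among K classes
            break
        learners.append(h)
        alphas.append(alpha)
        # Up-weight misclassified samples so the next learner focuses on them.
        w = [wi * math.exp(alpha) if bad else wi for wi, bad in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    if not learners:
        raise ValueError("no weak learner beat random guessing")

    def predict(x: float) -> int:
        votes: Dict[int, float] = defaultdict(float)
        for a, h in zip(alphas, learners):
            votes[h(x)] += a
        return max(votes, key=votes.get)

    return predict
```

On toy data with three classes (standing in for implausible, neutral, plausible), no single stump can separate all three labels, but after a few rounds the weighted vote can.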
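For the regression task, the abstract names AdaBoost.RT. In that algorithm a prediction counts as correct when its absolute relative error is within a threshold φ; the weighted share of incorrect samples ε gives β = ε^n, correctly predicted samples are down-weighted by β, and the final output averages the weak learners with weights log(1/β). The sketch below follows that scheme under stated assumptions: a weighted least-squares line stands in for the DeBERTaV3 weak learner, inputs are single floats, targets are non-zero, and `phi`, `power`, and `n_rounds` are illustrative defaults rather than the project's settings:

```python
import math
from typing import Callable, List, Sequence

Regressor = Callable[[float], float]


def fit_weighted_line(xs: Sequence[float], ys: Sequence[float], w: Sequence[float]) -> Regressor:
    """Hypothetical weak learner (stand-in for DeBERTaV3): weighted least-squares line."""
    total = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / total
    my = sum(wi * y for wi, y in zip(w, ys)) / total
    var = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    cov = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = cov / var if var > 0 else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def adaboost_rt(xs: Sequence[float], ys: Sequence[float],
                n_rounds: int = 5, phi: float = 0.1, power: int = 2) -> Regressor:
    """AdaBoost.RT sketch: phi is the relative-error threshold, power is n in beta = eps**n."""
    m = len(xs)
    d = [1.0 / m] * m  # sample distribution D_t
    learners: List[Regressor] = []
    log_inv_betas: List[float] = []
    for _ in range(n_rounds):
        f = fit_weighted_line(xs, ys, d)
        # Absolute relative error per sample; assumes every target is non-zero.
        are = [abs((f(x) - y) / y) for x, y in zip(xs, ys)]
        eps = sum(di for di, e in zip(d, are) if e > phi)  # weight of "incorrect" samples
        eps = min(max(eps, 1e-10), 1.0 - 1e-10)  # keep log(1/beta) finite and positive
        beta = eps ** power
        # Samples predicted within the threshold are down-weighted by beta.
        d = [di * (beta if e <= phi else 1.0) for di, e in zip(d, are)]
        z = sum(d)
        d = [di / z for di in d]
        learners.append(f)
        log_inv_betas.append(math.log(1.0 / beta))

    def predict(x: float) -> float:
        # Weighted average of the weak learners, with weights log(1/beta_t).
        num = sum(a * g(x) for a, g in zip(log_inv_betas, learners))
        return num / sum(log_inv_betas)

    return predict
```

The relative-error threshold is what lets AdaBoost.RT reuse AdaBoost's correct/incorrect reweighting for continuous targets; the non-zero-target assumption matters because the error divides by the gold value.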
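The abstract mentions imbalanced-data handling without naming the method, and reports that the neutral label was hard to predict. One common option, shown here purely as an assumption, is inverse-frequency class weighting using the same formula as scikit-learn's `class_weight="balanced"`: weight_c = n_samples / (n_classes × count_c). The label counts below are made up, not the SemEval 2022 statistics:

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Inverse-frequency weights: rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


if __name__ == "__main__":
    # Hypothetical label distribution, for illustration only.
    labels = ["PLAUSIBLE"] * 6 + ["IMPLAUSIBLE"] * 4 + ["NEUTRAL"] * 2
    print(balanced_class_weights(labels))
```

Such weights would typically be passed to a weighted cross-entropy loss, so that errors on a rare class cost more during fine-tuning.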
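The reported metrics are accuracy for the classification task and Spearman's rank correlation for the regression task. Spearman's ρ is the Pearson correlation between the ranks of the predictions and the ranks of the gold scores, with tied values assigned their average rank. A dependency-free sketch of both metrics (the function names are illustrative; `scipy.stats.spearmanr` computes the same statistic):

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def spearman(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(pred), average_ranks(gold))


def accuracy(pred: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted labels that match the gold labels."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)
```

Because Spearman's ρ only compares rankings, a regressor can score well even if its outputs are shifted or rescaled relative to the gold plausibility scores, as long as their order is preserved.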