COMPLEXITY, WORD FEATURES, SENTENCE FEATURES, BERT, ROBERTA, XLNET, STACKING.
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/66605 |
Institution: | Institut Teknologi Bandung |
Summary: | The complexity of the words or phrases in a sentence is one indicator of the literacy
level of a text. Information about the literacy level of a text can in turn be used
to estimate the complexity of a corpus, and corpus complexity can affect how well an
artificial intelligence system understands the context of a text.
This final project aims to build a model that predicts the complexity value of a
word (subtask 1) or a phrase (subtask 2) that appears in a sentence.
In the earlier SemEval-2021 Task 1 competition, BERT and RoBERTa
were the two contextual pretrained embeddings that achieved the best
performance on both subtasks. This final project focuses on adding
word and sentence features to both the contextual pretrained embedding-based model
and the static embedding-based model, with the goal of improving on the results of that
competition.
In the experiments conducted, adding word and sentence features
improved both the performance of the individual models and the stacking results. The
best stacking model ranked first in subtask 1 with a Pearson
correlation of 0.7887 and second in subtask 2 with a Pearson correlation of
0.8590. Further analysis shows that the model tends to
predict higher complexity for rarely used words or phrases than for
frequently used ones. |
---|