COMPLEXITY, WORD FEATURES, SENTENCE FEATURES, BERT, ROBERTA, XLNET, STACKING.

The complexity of the words or phrases in a sentence is one indicator of the literacy level required to read a text. Information about that literacy level can be used to gauge the complexity of a corpus. The complexity of a corpus can certainly affect the performance of artificial intellig...

Bibliographic Details
Main Author: Stanley Yoga Setiawan, Stefanus
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/66605
Institution: Institut Teknologi Bandung
id id-itb.:66605
spelling id-itb.:66605 2022-06-29T08:56:34Z Lexical Complexity Prediction using Deep Learning with Sentence and Word Features
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description The complexity of the words or phrases in a sentence is one indicator of the literacy level required to read a text. Information about that literacy level can in turn be used to gauge the complexity of a corpus, and corpus complexity can affect how well an artificial intelligence system understands the context of a text. This final project aims to build a model that predicts the complexity value of a word (subtask 1) or a phrase (subtask 2) appearing in a sentence. In the SemEval 2021 Task 1 competition, BERT and RoBERTa were the two contextual pretrained embeddings that achieved the best performance on both subtasks. The research in this final project focuses on adding word and sentence features to both contextual pretrained embedding-based models and static embedding-based models in order to improve on the results of that competition. In the experiments conducted, adding word and sentence features improved the performance of the individual models and of the stacked ensemble. The best stacking model ranked first in subtask 1 with a Pearson correlation of 0.7887 and second in subtask 2 with a Pearson correlation of 0.8590. Further analysis shows that the model tends to assign higher complexity to rarely used words or phrases than to frequently used ones.
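The description reports results as Pearson scores and mentions stacking several models. As a rough illustration only, the sketch below computes a Pearson correlation and fits a simple linear meta-learner on the predictions of base models. It is not the author's implementation: the function names (pearson, fit_stacker, predict_stacker), the least-squares meta-learner, and the toy numbers are all assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_stacker(base_preds, targets):
    """Fit a linear meta-learner (one weight per base model plus an intercept).

    base_preds holds one row per example and one column per base model.
    The weights are the ordinary least-squares solution of the normal equations.
    """
    rows = [list(r) + [1.0] for r in base_preds]  # append intercept column
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, k))) / a[i][i]
    return w


def predict_stacker(w, base_preds):
    """Combine base-model predictions using the fitted weights and intercept."""
    return [sum(wi * p for wi, p in zip(w, row)) + w[-1] for row in base_preds]


# Toy, made-up predictions from two hypothetical base models (not real results).
toy_base_preds = [[0.1, 0.3], [0.4, 0.2], [0.6, 0.7], [0.9, 0.5]]
toy_targets = [0.2, 0.3, 0.65, 0.7]

weights = fit_stacker(toy_base_preds, toy_targets)
stacked = predict_stacker(weights, toy_base_preds)
score = pearson(stacked, toy_targets)
```

In practice a meta-learner is usually fitted on out-of-fold predictions rather than on the data the base models were trained on, so that the stacked score is not inflated.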
format Final Project
author Stanley Yoga Setiawan, Stefanus
title COMPLEXITY, WORD FEATURES, SENTENCE FEATURES, BERT, ROBERTA, XLNET, STACKING.
title_sort complexity, word features, sentence features, bert, roberta, xlnet, stacking.
url https://digilib.itb.ac.id/gdl/view/66605
_version_ 1822277671338377216