COMPLEXITY, WORD FEATURES, SENTENCE FEATURES, BERT, ROBERTA, XLNET, STACKING.

The complexity of the words or phrases in a sentence is one indicator of the literacy level required by a text. Information about a text's literacy level can be used to determine the complexity of a corpus. The complexity of a corpus can certainly affect the performance of artificial intellig...

Bibliographic Details
Main Author:	Stanley Yoga Setiawan, Stefanus
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/66605
Institution:	Institut Teknologi Bandung

id	id-itb.:66605
spelling	id-itb.:66605 2022-06-29T08:56:34Z COMPLEXITY, WORD FEATURES, SENTENCE FEATURES, BERT, ROBERTA, XLNET, STACKING. Stanley Yoga Setiawan, Stefanus Indonesia Final Project Lexical Complexity Prediction using Deep Learning with Sentence and Word Features INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/66605 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	The complexity of the words or phrases in a sentence is one indicator of the literacy level required by a text. Information about a text's literacy level can be used to determine the complexity of a corpus, and corpus complexity can in turn affect how well an artificial intelligence system understands the context of a text. This final project aims to build a model that predicts the complexity value of a word (subtask 1) or a phrase (subtask 2) appearing in a sentence. In the earlier SemEval 2021 Task 1 competition, BERT and RoBERTa were the two contextual pretrained embeddings that achieved the best performance on both subtasks. The research in this final project focuses on adding word and sentence features to both the contextual pretrained embedding-based model and the static embedding-based model in order to improve on the competition results. In the experiments conducted, word and sentence features improved the performance of the individual models and of the stacked results. The best stacking model ranked first in subtask 1 with a Pearson correlation of 0.7887 and second in subtask 2 with a Pearson correlation of 0.8590. Further analysis shows that the model tends to predict higher complexity for rarely used words or phrases than for frequently used ones.
format	Final Project
author	Stanley Yoga Setiawan, Stefanus
spellingShingle	Stanley Yoga Setiawan, Stefanus COMPLEXITY, WORD FEATURES, SENTENCE FEATURES, BERT, ROBERTA, XLNET, STACKING.
author_facet	Stanley Yoga Setiawan, Stefanus
author_sort	Stanley Yoga Setiawan, Stefanus
title	COMPLEXITY, WORD FEATURES, SENTENCE FEATURES, BERT, ROBERTA, XLNET, STACKING.
title_short	COMPLEXITY, WORD FEATURES, SENTENCE FEATURES, BERT, ROBERTA, XLNET, STACKING.
title_full	COMPLEXITY, WORD FEATURES, SENTENCE FEATURES, BERT, ROBERTA, XLNET, STACKING.
title_fullStr	COMPLEXITY, WORD FEATURES, SENTENCE FEATURES, BERT, ROBERTA, XLNET, STACKING.
title_full_unstemmed	COMPLEXITY, WORD FEATURES, SENTENCE FEATURES, BERT, ROBERTA, XLNET, STACKING.
title_sort	complexity, word features, sentence features, bert, roberta, xlnet, stacking.
url	https://digilib.itb.ac.id/gdl/view/66605
_version_	1822277671338377216
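
The description above reports results as Pearson correlations and combines several models by stacking. The sketch below illustrates both ideas in plain Python. It is not the thesis's implementation: the abstract does not say which meta-learner was used, so a least-squares blend of two base models stands in for it here, and the function names (pearson, fit_blend_weight, blend) and all numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_blend_weight(preds_a, preds_b, gold):
    """Least-squares weight w for the blend w * a + (1 - w) * b.

    Minimising sum((w*a + (1-w)*b - y)^2) over w gives the closed form
    below. Assumes the two base models do not give identical predictions.
    """
    num = sum((y - b) * (a - b) for a, b, y in zip(preds_a, preds_b, gold))
    den = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    return num / den


def blend(preds_a, preds_b, w):
    """Combine two lists of base-model predictions with weight w."""
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]


# Hypothetical held-out complexity scores in [0, 1]; NOT data from the thesis.
gold = [0.10, 0.25, 0.40, 0.55, 0.70]
model_a = [0.15, 0.20, 0.45, 0.50, 0.60]  # e.g. one base model's predictions
model_b = [0.05, 0.30, 0.35, 0.65, 0.75]  # e.g. another base model's predictions

w = fit_blend_weight(model_a, model_b, gold)
stacked = blend(model_a, model_b, w)

print("blend weight:", round(w, 4))
print("Pearson, model A:", round(pearson(model_a, gold), 4))
print("Pearson, model B:", round(pearson(model_b, gold), 4))
print("Pearson, stacked:", round(pearson(stacked, gold), 4))
```

In practice the blend weight would be fitted on held-out predictions and evaluated on a separate test set; fitting and scoring on the same examples, as in this toy run, overstates the benefit of stacking.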

Similar Items