DEVELOPMENT OF NATURAL LANGUAGE GENERATION AND PARAPHRASE FOR GENERATION OF COMPETENCY CLUSTER DYNAMICS REPORTS
Main Author: | Aris Setiawan, Alfan |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/87722 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:87722 |
---|---|
spelling |
id-itb.:87722 2025-02-03T07:48:07Z DEVELOPMENT OF NATURAL LANGUAGE GENERATION AND PARAPHRASE FOR GENERATION OF COMPETENCY CLUSTER DYNAMICS REPORTS Aris Setiawan, Alfan Indonesia Theses Natural Language Generation (NLG), Pre-Trained Language Model (PLM), Assessment Center, Template-Based, Text-to-Text, Data-to-Text INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/87722 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
An Assessment Center is an assessment method that requires multiple simulations,
multiple assessors, and an assessment aggregation process. Currently, the Competency
Cluster Dynamics Report, one of the components of the assessment report, is written
by the assessor, which adds to the time the assessor needs to complete the task.
With demand for Assessment Centers increasing and the number of assessors
limited, the productivity of Assessment Center implementation needs to be
raised. This study proposes Natural Language Generation (NLG) as an alternative
way of creating the Competency Cluster Dynamics Report. NLG is a sub-task of Natural
Language Processing (NLP) that produces natural language text from structured
data, which is the same data source the assessor uses when writing the report.
This study uses three approaches to build the NLG model. The template-based model
creates paragraphs directly from tabular data: a sentence frame is defined and
then filled with the relevant information. The Data-to-Text approach transforms
tabular data into a flat string format (linearization), which serves as the input
used to train a pre-trained language model. Finally, the text-to-text approach is
a paraphrase model: a pre-trained language model is trained on text input, namely
the output of the template-based model. In implementation, the template-based
model can be applied to the data directly, but building it takes quite a long
time because it is done manually. The data-to-text approach is relatively easier
to implement, whereas the text-to-text model requires a pipeline that first
converts tabular data into text before the model can work. Both of these
approaches require a lot of data, so augmentation is carried out to create
synthetic data with the aim of improving model performance.
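The sentence-frame filling, linearization, and pipeline steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names (`cluster`, `competency`, `level`), the English sentence frame, the `key: value | ...` flat format, and the sample rows do not come from the thesis, which does not state its actual templates or linearization scheme here. No language model is called; the sketch only shows how the inputs to each approach could be prepared.

```python
from typing import Dict, List

# Hypothetical row type: one competency rating from a tabular assessment result.
Row = Dict[str, str]


def fill_template(row: Row) -> str:
    """Template-based generation: fill a fixed sentence frame with table values."""
    # The frame wording is illustrative, not the thesis's actual template.
    frame = "In the {cluster} cluster, the {competency} competency is rated {level}."
    return frame.format(**row)


def linearize(row: Row) -> str:
    """Data-to-text input: flatten one table row into a single flat string."""
    return " | ".join(f"{key}: {value}" for key, value in row.items())


def build_paraphrase_input(rows: List[Row]) -> str:
    """Text-to-text pipeline: the template output becomes the paraphrase model's input."""
    return " ".join(fill_template(row) for row in rows)


# Illustrative data only.
rows = [
    {"cluster": "Thinking", "competency": "Analysis", "level": "above standard"},
    {"cluster": "Thinking", "competency": "Planning", "level": "at standard"},
]

if __name__ == "__main__":
    print(fill_template(rows[0]))
    print(linearize(rows[0]))
    print(build_paraphrase_input(rows))
```

In this sketch the string from `linearize` would be paired with a human-written report to train a data-to-text model, while the string from `build_paraphrase_input` would play the same role for the paraphrase model, which is why the text-to-text approach needs the extra template step in its pipeline.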
After quantitative and qualitative evaluation, the text-to-text model proved to
produce the best report output. In the qualitative evaluation by the assessor,
the reports produced by the text-to-text model were considered better than those
of the other approaches. For fluency (score 3.60), the model matched the original
human-written reports (score 3.60). For faithfulness (score 3.31) and coherence
(score 3.60), it exceeded the original human-written reports (faithfulness 3.23
and coherence 3.53). The template-based model (score 3.33), however, outperformed
both the human-written reports and the text-to-text model in the faithfulness
category. The Data-to-Text approach received the lowest evaluation scores of all
models. |
format |
Theses |
author |
Aris Setiawan, Alfan |
title |
DEVELOPMENT OF NATURAL LANGUAGE GENERATION AND PARAPHRASE FOR GENERATION OF COMPETENCY CLUSTER DYNAMICS REPORTS |
url |
https://digilib.itb.ac.id/gdl/view/87722 |
_version_ |
1823658250902437888 |