DEVELOPMENT OF NATURAL LANGUAGE GENERATION AND PARAPHRASE FOR GENERATION OF COMPETENCY CLUSTER DYNAMICS REPORTS

Bibliographic Details
Main Author: Aris Setiawan, Alfan
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/87722
Institution: Institut Teknologi Bandung
Description
Summary: An Assessment Center is an assessment method that relies on multiple simulations, multiple assessors, and an assessment aggregation process, all of which are mandatory in its implementation. At present, the Competency Cluster Dynamics Report, one of the components of the assessment report, is written by the assessor, which increases the time the assessor needs to complete the task. With demand for Assessment Centers increasing and the number of assessors limited, the productivity of Assessment Center implementation needs to be raised. This study proposes Natural Language Generation (NLG) as an alternative solution for creating the Competency Cluster Dynamics Report. NLG is a subtask of Natural Language Processing (NLP) that produces natural language text from structured data, which is the same data source the assessor uses when writing the report. The study builds the NLG model using three approaches. The template-based model creates paragraphs directly from tabular data by defining a sentence frame and filling it with the relevant information. The data-to-text approach transforms tabular data into a flat string (linearization), which serves as the input for training a pre-trained language model. Finally, the text-to-text approach is a paraphrase model: a pre-trained language model is trained on text input, using the output of the template-based model as that input. In implementation, the template-based model can be applied to the data directly, but building it takes considerable time because it is done manually. The data-to-text approach is relatively easier to implement, whereas the text-to-text model requires a pipeline that first converts the tabular data into text before the model can work.
Both the data-to-text and text-to-text approaches require a large amount of data, so augmentation was used to create synthetic data and help the models perform better. After quantitative and qualitative evaluation, the text-to-text model produced the best report output. In the qualitative evaluation by assessors, reports from the text-to-text model were judged better than those from the other approaches. For fluency, the model scored 3.60, equal to the original human-written reports (3.60); for faithfulness (3.31) and coherence (3.60), it exceeded the human-written reports (faithfulness 3.23, coherence 3.53). The template-based model (3.33) outperformed both the human-written reports and the text-to-text model in the faithfulness category. The data-to-text approach received the lowest evaluation scores of all models.
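
As a rough illustration of the three input paths described in the summary, the following minimal Python sketch fills a sentence frame from one table row, linearizes the same row into a flat string, and shows how the template output could serve as the input to a paraphrase model. The field names, the sentence frame, the " | " separator, and the function names are all hypothetical assumptions made for this example and are not taken from the thesis; the language models themselves are omitted.

```python
from typing import Dict

# Hypothetical record layout: the field names and values below are
# illustrative assumptions, not data from the thesis.
Record = Dict[str, str]

# Hypothetical sentence frame with named slots.
SENTENCE_FRAME = (
    "In the {cluster} cluster, the participant is strongest in {strongest} "
    "and needs further development in {weakest}."
)


def fill_template(record: Record, frame: str = SENTENCE_FRAME) -> str:
    """Template-based generation: fill the sentence frame from one table row."""
    return frame.format(**record)


def linearize(record: Record) -> str:
    """Data-to-text input: flatten one table row into a 'key: value' string."""
    return " | ".join(f"{key}: {value}" for key, value in record.items())


def paraphrase_input(record: Record) -> str:
    """Text-to-text pipeline, first stage: the template output becomes the
    text a paraphrase model would receive (the model itself is omitted)."""
    return fill_template(record)


row = {
    "cluster": "Thinking",
    "strongest": "Analytical Thinking",
    "weakest": "Innovation",
}
print(fill_template(row))
print(linearize(row))
```

In this sketch the template path and the linearization path start from the same row, which mirrors the summary's point that the text-to-text model needs an extra table-to-text step before it can run, while the data-to-text model consumes the flattened row directly.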