BUILDING AN INDONESIAN PRODUCT REVIEW QUESTION ANSWERING DATASET

Bibliographic Details
Main Author: Prisha W, Made
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/87585
Institution: Institut Teknologi Bandung
Description
Summary: A dataset is essential for training and testing a Question Answering (QA) system, which can be applied to product review data. Reviews significantly influence purchasing decisions, yet there is currently no publicly available Indonesian QA dataset for product reviews. This research aims to create a dataset for QA systems targeting that task. Product reviews were curated from Sociolla and Female Daily, while questions and answers were collected through crowdsourcing from 15 participants. The final dataset consists of 3,000 samples, with 2,400 used for training and 600 for testing. Several machine learning models, including Logistic Regression, Decision Tree, Support Vector Machine (SVM), Random Forest, and Generative Pre-trained Transformer (GPT), were used for answer prediction. Model performance was assessed using Exact Match (EM), F1 score, and Bilingual Evaluation Understudy (BLEU). Among the tested models, GPT achieved the highest performance, with EM, F1, and BLEU scores of 0.2667, 0.6694, and 0.0014, respectively, on the test set. The dataset was then structured for QA system development.
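The EM and F1 metrics named in the summary can be illustrated with a minimal Python sketch. It assumes the common SQuAD-style convention (lowercasing, punctuation stripping, whitespace tokenization, token-overlap F1); the record does not state which normalization or tokenization the thesis actually used, and the function names and example strings below are hypothetical.

```python
from collections import Counter
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and split on whitespace.

    Assumption: the thesis may normalize answers differently.
    """
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


def exact_match(prediction, reference):
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these definitions a prediction that contains the reference answer plus extra words scores 0 on EM but still earns partial F1 credit, which is one way an F1 score can sit well above EM, as with the reported 0.6694 versus 0.2667.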