BUILDING AN INDONESIAN PRODUCT REVIEW QUESTION ANSWERING DATASET
A dataset is essential for training and testing a Question Answering (QA) system, which can be applied to product review data. Reviews significantly influence purchasing decisions, yet there is currently no publicly available Indonesian QA dataset for product reviews. This research aims to cre...
Main Author: | Prisha W, Made |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/87585 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:87585 |
---|---|
spelling |
id-itb.:87585; 2025-01-31T10:54:34Z; BUILDING AN INDONESIAN PRODUCT REVIEW QUESTION ANSWERING DATASET; Prisha W, Made; Indonesia; Final Project; dataset, question answering, crowdsourcing, model, logistic regression, decision tree, support vector machine, random forest, generative pre-trained transformer, exact match, F1 score, bilingual evaluation understudy; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/87585; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
A dataset is essential for training and testing a Question Answering (QA) system, and such systems can be applied to product review data. Reviews significantly influence purchasing decisions, yet there is currently no publicly available Indonesian QA dataset for product reviews. This research aims to create a dataset for QA systems built specifically for that task. Product reviews were curated from Sociolla and Female Daily, while questions and answers were collected through crowdsourcing from 15 participants. The final dataset consists of 3,000 samples, with 2,400 used for training and 600 for testing. Several machine learning models, including Logistic Regression, Decision Tree, Support Vector Machine (SVM), Random Forest, and Generative Pre-trained Transformer (GPT), were used for answer prediction. Model performance was assessed using Exact Match (EM), F1 score, and Bilingual Evaluation Understudy (BLEU). Among the tested models, GPT achieved the highest performance, with EM, F1, and BLEU scores of 0.2667, 0.6694, and 0.0014, respectively, on the test set. The dataset was then structured for QA system development. |
format |
Final Project |
author |
Prisha W, Made |
title |
BUILDING AN INDONESIAN PRODUCT REVIEW QUESTION ANSWERING DATASET |
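The abstract reports Exact Match (EM) and F1 scores for answer prediction but does not say how they were computed. The sketch below shows one common convention, token-level scoring in the style of extractive QA benchmarks, and should be read as an assumption rather than the author's procedure. The normalization step (lowercasing, punctuation removal) and the example strings are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split on whitespace.

    This normalization is an assumption; the record does not describe one.
    """
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction and reference, invented for illustration only.
prediction = "cocok untuk kulit kering"
reference = "Cocok untuk kulit kering dan sensitif."
em = exact_match(prediction, reference)
f1 = token_f1(prediction, reference)
```

Under this convention a partially correct answer scores 0 on EM but can still earn substantial F1 credit, which is one plausible reading of why the reported F1 (0.6694) is much higher than the reported EM (0.2667).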