BUILDING AN INDONESIAN PRODUCT REVIEW QUESTION ANSWERING DATASET

Bibliographic Details
Main Author: Prisha W, Made
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/87585
Institution: Institut Teknologi Bandung
Record ID: id-itb.:87585
Timestamp: 2025-01-31T10:54:34Z
Keywords: dataset, question answering, crowdsourcing, model, logistic regression, decision tree, support vector machine, random forest, generative pre-trained transformer, exact match, F1 score, bilingual evaluation understudy
Building: Institut Teknologi Bandung Library
Collection: Digital ITB
Country: Indonesia
Continent: Asia

Description
A dataset is essential for training and testing Question Answering (QA) systems, which can be applied to product review data. Reviews significantly influence purchasing decisions, yet there is currently no publicly available Indonesian QA dataset for product reviews. This research aims to create a dataset for QA systems designed specifically for this task. Product reviews were curated from Sociolla and Female Daily, while questions and answers were collected through crowdsourcing from 15 participants. The final dataset consists of 3,000 samples, with 2,400 used for training and 600 for testing. Several machine learning models, including Logistic Regression, Decision Tree, Support Vector Machine (SVM), Random Forest, and Generative Pre-trained Transformer (GPT), were used for answer prediction. Model performance was assessed using Exact Match (EM), F1 score, and Bilingual Evaluation Understudy (BLEU). Among the tested models, GPT achieved the highest performance, with EM, F1, and BLEU scores of 0.2667, 0.6694, and 0.0014, respectively, on the test set. The dataset was then structured for QA system development.
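The description names Exact Match (EM) and F1 as evaluation metrics. As an illustration only, the sketch below computes them as they are commonly defined for QA (SQuAD-style token overlap, without the English article removal). The normalization (lowercasing, stripping ASCII punctuation, whitespace tokenization), the function names, and the sample answers are assumptions, not taken from the project; BLEU is left out.

```python
from collections import Counter
import string


def normalize(text: str) -> list[str]:
    # Assumption: lowercase, strip ASCII punctuation, split on whitespace.
    # The project's actual preprocessing is not described in this record.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    # 1.0 if the normalized token sequences are identical, else 0.0.
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    # Harmonic mean of token-level precision and recall over token multisets.
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_scores(predictions: list[str], references: list[str]) -> tuple[float, float]:
    # Average per-question EM and F1 over aligned prediction/reference pairs.
    if len(predictions) != len(references) or not references:
        raise ValueError("need equal-length, non-empty prediction/reference lists")
    n = len(references)
    em = sum(exact_match(p, r) for p, r in zip(predictions, references)) / n
    f1 = sum(token_f1(p, r) for p, r in zip(predictions, references)) / n
    return em, f1


if __name__ == "__main__":
    # Hypothetical Indonesian answers, invented for illustration only.
    preds = ["teksturnya ringan", "harganya mahal"]
    refs = ["Teksturnya ringan.", "harganya cukup terjangkau"]
    print(corpus_scores(preds, refs))
```

For the two invented pairs, EM is 0.5 (only the first pair matches after normalization) and F1 is about 0.7, the average of 1.0 and 0.4; the 0.4 comes from one shared token out of two predicted and three reference tokens. Averaging per-question scores follows common QA evaluation practice, although the record does not say how the project aggregated them.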