INDONESIAN QUESTION ANSWERING SYSTEM FOR FACTOID QUESTIONS FROM FACE BEAUTY PRODUCTS KNOWLEDGE GRAPH

Bibliographic Details
Main Author: Indah Rahajeng, Mahanti
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/58180
Institution: Institut Teknologi Bandung
Description
Summary: Question answering (QA) is a research field in natural language processing (NLP) concerned with finding the correct answer to a question posed in natural language. QA systems can be used to build chatbots or search engines. The QA system discussed here uses a knowledge graph as its data source and works by translating questions into SPARQL queries. Common processes in such QA systems are question analysis, phrase mapping, disambiguation, and query construction. The proposed system consists of four modules: an answer type classification module and an information extraction module, which together perform question analysis; a text similarity module, which performs phrase mapping and disambiguation; and a query construction module, which constructs and executes the query. Experiments are performed on the answer type classification and information extraction modules to find the best model for each. The answer type classification experiment compares seven models: SVM-tf-idf, SVM-fastText, SVM-IndoBERT, LSTM-fastText, LSTM-IndoBERT, fine-tuned IndoBERT, and fine-tuned IndoBERT auxiliary. The information extraction experiment compares five models: SVM-fastText, SVM-IndoBERT, LSTM-fastText, LSTM-IndoBERT, and fine-tuned IndoBERT. The best model is then used to build the QA system. The text similarity module uses lexical similarity with two metrics, Jaccard and Levenshtein. The query construction module uses query templates. In the experiments, the fine-tuned IndoBERT model performs best for answer type classification. For information extraction, the LSTM-IndoBERT and fine-tuned IndoBERT models perform equally well. The fine-tuned IndoBERT model obtains an accuracy of 1.00 for answer type classification and an F1-score of 0.98 for information extraction.
The QA system is built using the fine-tuned IndoBERT model for both answer type classification and information extraction, because this model performs well on both validation and test data. Overall, the QA system obtains an average F1-score, precision, and recall of 0.8499703, 0.8823529, and 0.8418301, respectively.
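
The phrase mapping and query construction steps described in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation. The way the Jaccard and Levenshtein scores are combined (taking the larger of the two), the product labels, the `ex:` prefix, the `ex:price` property, and the template itself are all hypothetical, chosen only to show how a phrase extracted from a question might be matched to a knowledge graph label and then inserted into a SPARQL query template.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-separated tokens."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a.lower(), b.lower()) / longest


def map_phrase(phrase: str, labels: list[str]) -> str:
    """Pick the label most similar to the phrase.

    Assumption: the two scores are combined by taking their maximum;
    the source does not say how the metrics are combined.
    """
    def score(label: str) -> float:
        return max(jaccard_similarity(phrase, label),
                   levenshtein_similarity(phrase, label))
    return max(labels, key=score)


# Hypothetical query template; the prefix URI and graph shape are illustrative.
QUERY_TEMPLATE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/>
SELECT ?answer WHERE {{
  ?product rdfs:label "{product}" .
  ?product {prop} ?answer .
}}"""


def build_query(product_label: str, prop: str) -> str:
    """Fill the template with a mapped product label and a property."""
    return QUERY_TEMPLATE.format(product=product_label, prop=prop)


if __name__ == "__main__":
    labels = ["Acme Gentle Face Wash", "Acme Night Cream"]  # hypothetical labels
    matched = map_phrase("acme face wsh", labels)
    print(build_query(matched, "ex:price"))
```

Using a lexical score tolerates small spelling differences between the question and the graph labels, while the template keeps query construction simple for factoid questions that follow a fixed pattern.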