INDONESIAN QUESTION ANSWERING SYSTEM FOR FACTOID QUESTIONS FROM FACE BEAUTY PRODUCTS KNOWLEDGE GRAPH
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/58180 |
Institution: | Institut Teknologi Bandung |
Summary: | Question answering (QA) is a research field in NLP that aims to find the correct answer to a
natural language question. QA systems can be used to build chatbots or even search engines. The QA
system discussed here uses a knowledge graph as its data source. The idea behind this QA system is to
translate questions into SPARQL queries. Common processes in QA systems are question analysis,
phrase mapping, disambiguation, and query construction.
The system consists of four modules: an answer type classification module and an information
extraction module, which together perform question analysis; a text similarity module, which performs
phrase mapping and disambiguation; and a query construction module, which constructs and executes the
query. Experiments are performed on the answer type classification and information extraction modules
to find the best model for each. The answer type classification experiment uses seven models, namely
SVM-tf-idf, SVM-fastText, SVM-IndoBERT, LSTM-fastText, LSTM-IndoBERT, fine-tuning IndoBERT, and
fine-tuning IndoBERT auxiliary. The information extraction experiment uses five models, namely
SVM-fastText, SVM-IndoBERT, LSTM-fastText, LSTM-IndoBERT, and fine-tuning IndoBERT. The best
model is used to build the QA system. The text similarity module uses lexical similarity with two
metrics, Jaccard and Levenshtein. The query construction module uses query templates.
Based on the experiments, the fine-tuned IndoBERT model performs best for answer type
classification. For information extraction, the LSTM-IndoBERT model and the fine-tuned IndoBERT
model perform equally well. The fine-tuned IndoBERT model obtains an accuracy of 1.00 for answer type
classification and an F1-score of 0.98 for information extraction. The QA system is built using the
fine-tuned IndoBERT model for both answer type classification and information extraction because this
model performs well on both validation data and test data. Overall, the QA system obtains average
F1-score, precision, and recall of 0.8499703, 0.8823529, and 0.8418301, respectively. |
---|
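
The phrase mapping and query construction steps described in the summary can be illustrated with a minimal Python sketch. It is not the author's implementation. The summary does not say how the Jaccard and Levenshtein metrics are combined, so ranking by Jaccard first and breaking ties with Levenshtein is an assumption. The `ex:` prefix, the `price` property, the product labels, and the query template itself are hypothetical names chosen for illustration.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-separated tokens."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance normalized to a similarity score in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a.lower(), b.lower()) / longest


def map_phrase(phrase: str, labels: list[str]) -> str:
    """Pick the knowledge graph label closest to the extracted phrase.

    Assumption: rank by Jaccard, then use Levenshtein as a tie-breaker.
    """
    return max(labels, key=lambda label: (jaccard_similarity(phrase, label),
                                          levenshtein_similarity(phrase, label)))


# Hypothetical query template; the real templates and vocabulary are not
# given in the summary. No escaping of the label is done in this sketch.
QUERY_TEMPLATE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/>
SELECT ?answer WHERE {{
  ?product rdfs:label "{product}" .
  ?product ex:{prop} ?answer .
}}"""


def build_query(product_label: str, prop: str) -> str:
    """Fill the template with a mapped product label and property name."""
    return QUERY_TEMPLATE.format(product=product_label, prop=prop)


if __name__ == "__main__":
    labels = ["Brightening Serum X", "Acne Cleanser"]  # hypothetical labels
    product = map_phrase("brightening serum", labels)
    print(build_query(product, "price"))
```

In this sketch the answer type predicted by the classifier would select which template to fill, and the phrases found by the information extraction module would be mapped to graph labels before substitution. Executing the resulting query against a SPARQL endpoint is left out.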