DEVELOPMENT OF QUESTION ANSWERING SYSTEM ON AL-QUR'AN TRANSLATION USING LARGE LANGUAGE MODEL

Bibliographic Details
Main Author: Restu Maulana, Diky
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/86386
Institution: Institut Teknologi Bandung
Description
Summary: This research develops a Question Answering (QA) system for the Al-Qur'an translation using a Large Language Model (LLM). The system is designed to make the Holy Qur'an easier to understand, especially for new converts to Islam. Indonesia has the largest Muslim population in the world, so there is a need for a system that can answer questions about the Islamic knowledge contained in the Holy Qur'an. The research process involved creating the IndoQRCD dataset, an Indonesian translation of the Qur'anic Reading Comprehension Dataset (QRCD). This dataset was used to fine-tune two pre-trained models, XLM-RoBERTa and IndoBERT. Test results on the exact match and F1 score metrics show that IndoBERT is better at producing the correct answer from the given context. The QA system, named Qur'an QA, is built on a Retrieval-Augmented Generation (RAG) architecture. A vector store serves as the knowledge base and also acts as the retriever, performing lookups with a similarity search algorithm. Qur'an QA supports two types of QA, extractive and generative: the extractive generator is the fine-tuned IndoBERT, while GPT-4 is the generator for generative QA. Qur'an QA accepts questions about Islam in Indonesian and answers them based on context supplied as quotations of Qur'anic verses. Test results on the answer relevancy and faithfulness metrics show that generative QA outperforms extractive QA at producing relevant answers and minimizing hallucinations.
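
Illustration: the summary reports exact match and F1 score for the fine-tuned extractive models but does not say how they were computed. The sketch below shows one common way these two metrics are defined for extractive QA, in the style of SQuAD evaluation. The normalization rules (lowercasing, replacing punctuation with spaces, whitespace tokenization) and the function names `normalize`, `exact_match`, and `f1_score` are assumptions made for illustration, not the thesis's actual evaluation code. The example strings are invented and are not quotations from any dataset.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, replace punctuation with spaces, split on whitespace.

    Assumed normalization for illustration; the thesis may use different rules.
    """
    text = text.lower()
    text = re.sub("[" + re.escape(string.punctuation) + "]", " ", text)
    return text.split()


def exact_match(prediction, reference):
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "sabar dan salat"  # hypothetical gold answer span
    print(exact_match("Sabar dan salat.", reference))
    print(f1_score("sabar", reference))
```

Exact match rewards only a fully correct span, while F1 gives partial credit when the predicted span overlaps the reference, which is why the two are usually reported together.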