DEVELOPMENT OF QUESTION ANSWERING SYSTEM ON AL-QUR'AN TRANSLATION USING LARGE LANGUAGE MODEL
This research aims to develop a Question Answering (QA) system for Al-Qur'an translation using a Large Language Model (LLM). The system is designed to make the Holy Qur'an easier to understand, especially for new converts to Islam. In the context of Indonesia, as a country with the...
Main Author: | Restu Maulana, Diky |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/86386 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:86386 |
---|---|
spelling |
id-itb.:86386; 2024-09-18T08:04:29Z; keywords: Question Answering System, Al-Qur’an, Large Language Model, Retrieval-Augmented Generation, IndoBERT, GPT, IndoQRCD, Qur’an QA |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
This research aims to develop a Question Answering (QA) system for Al-Qur'an
translation using a Large Language Model (LLM). The system is designed to make the
Holy Qur'an easier to understand, especially for new converts to Islam. Indonesia has
the largest Muslim population in the world, so there is a need for a system that can
answer questions about the Islamic knowledge contained in the Holy Qur'an.
The research process involved creating the IndoQRCD dataset, an Indonesian translation
of the Qur'anic Reading Comprehension Dataset (QRCD). This dataset was used to
fine-tune two pre-trained models, XLM-RoBERTa and IndoBERT. Evaluation with the exact
match and F1 score metrics shows that IndoBERT is better at producing the correct
answer from the given context.
The QA system is built on a Retrieval-Augmented Generation (RAG) architecture and is
named Qur'an QA. A vector store serves as the knowledge base and also acts as the
retriever, which searches it with a similarity search algorithm. Qur'an QA supports
two types of QA: extractive and generative. The extractive QA generator uses the
fine-tuned IndoBERT, while GPT-4 is used as the generator for generative QA.
Qur'an QA accepts questions about Islam in Indonesian and answers them based on
context supplied as quotations of Qur'anic verses. Evaluation with the answer
relevancy and faithfulness metrics shows that generative QA is better than extractive
QA at generating relevant answers and minimizing hallucinations. |
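The retrieval step described in the abstract (a vector store searched by similarity, whose results become the context for the generator) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the abstract does not name the embedding model or vector store library, so term-frequency vectors stand in for learned embeddings, and `ToyVectorStore`, `build_prompt`, and the passages are hypothetical placeholders rather than verse quotations.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a term-frequency vector (assumed stand-in for a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: holds passages and their vectors."""

    def __init__(self, passages: dict[str, str]) -> None:
        self._texts = passages
        self._vectors = {pid: embed(text) for pid, text in passages.items()}

    def text(self, pid: str) -> str:
        return self._texts[pid]

    def similarity_search(self, query: str, k: int = 2) -> list[tuple[str, float]]:
        """Return the k passage ids most similar to the query, best first."""
        q = embed(query)
        scored = [(pid, cosine(q, vec)) for pid, vec in self._vectors.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


def build_prompt(question: str, store: ToyVectorStore, k: int = 2) -> str:
    """Assemble retrieved passages and the question into a generator prompt."""
    hits = store.similarity_search(question, k)
    context = "\n".join(f"[{pid}] {store.text(pid)}" for pid, _ in hits)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."


# Placeholder passages for illustration only; not taken from any translation.
PASSAGES = {
    "p1": "patience and prayer bring help in hardship",
    "p2": "give charity to the poor and the needy",
    "p3": "fasting is prescribed during the month",
}
store = ToyVectorStore(PASSAGES)
print(build_prompt("who should receive charity", store, k=1))
```

In a full RAG pipeline the returned prompt would be passed to the generator (an extractive reader or a generative model); that call is omitted here because it depends on components the abstract does not specify.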
format |
Final Project |
author |
Restu Maulana, Diky |
title |
DEVELOPMENT OF QUESTION ANSWERING SYSTEM ON AL-QUR'AN TRANSLATION USING LARGE LANGUAGE MODEL |
url |
https://digilib.itb.ac.id/gdl/view/86386 |
_version_ |
1822999528170586112 |
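The exact match and F1 score metrics named in the description are commonly computed per question over normalized answer tokens. The sketch below follows the widely used SQuAD-style definition as an assumption; the record does not state which variant the thesis used. The English article-stripping step of the SQuAD script is left out because it does not apply to Indonesian text, and the function names are illustrative.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of token precision and recall."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score is then the mean of these per-question values, taking the best score over the references when a question has several reference answers.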