RELIGIOUS DOMAIN-INDONESIAN SIRAH NABAWIYAH QUESTION ANSWERING USING GENERATIVE-LLM

Bibliographic Details
Main Author: Razif Rizqullah, Muhammad
Format: Thesis
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/80969
Institution: Institut Teknologi Bandung
Description
Summary: Indonesia is the country with the largest Muslim population in the world. There are two main sources of information in Islam, the Holy Qur'an and the Book of Hadith; in addition, the Sirah Nabawiyah is another important work of literature. The Sirah Nabawiyah is a historical account of the Prophet's journey and biography in Islam that refers to these two main sources. In current Question Answering (QA) research, there have been studies on the Qur'an and the Hadith, but none have used the Sirah Nabawiyah, especially for the Indonesian language. We use Sirah Nabawiyah literature to build a novel QA dataset. Building a new dataset manually requires considerable human effort and cost, so a Generative LLM was used to assist in parts of the process. The result is the Question Answering Sirah Nabawiyah (QASiNa) dataset for reading comprehension (QASiNa-RC) and multiple-choice QA (QASiNa-MC), together with a Sirah Nabawiyah corpus (SiNaCorpus). QASiNa-RC was evaluated on the reading comprehension task using mBERT, XLM-RoBERTa, and IndoBERT. QASiNa-MC was evaluated on the multiple-choice QA task using open-source Generative LLMs, namely mGPT, XGLM, BLOOM, and BLOOMZ. In addition, GPT-3.5 and GPT-4 were used to test both datasets. On QASiNa-RC, XLM-RoBERTa was the best model with an Exact Match (EM) of 58.40%, while GPT-3.5 and GPT-4 produced excessive interpretations. On QASiNa-MC, BLOOMZ 1.7B was the best open-source model with an accuracy of 27.76%, which increased to 28.62% after corpus-tuning. GPT-3.5 and GPT-4 achieved better results, with accuracies of 56.60% and 72.40%, respectively.
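
As an illustration of the reading comprehension evaluation described above, the following is a minimal sketch, not the thesis's actual pipeline: it assumes QASiNa-RC follows a SQuAD-style (question, context, answer) format and uses a publicly available XLM-RoBERTa QA checkpoint as a stand-in for the fine-tuned model; the toy Indonesian example is hypothetical.

    from transformers import pipeline

    def normalize(text: str) -> str:
        # Lowercase and strip surrounding whitespace/punctuation before EM comparison.
        return text.lower().strip().strip(".,!?\"'")

    # Hypothetical checkpoint; the thesis evaluates models on QASiNa-RC itself.
    qa = pipeline("question-answering", model="deepset/xlm-roberta-base-squad2")

    # Toy entry standing in for a QASiNa-RC example (hypothetical content).
    examples = [
        {
            "question": "Di kota mana Nabi Muhammad dilahirkan?",
            "context": "Nabi Muhammad dilahirkan di kota Mekkah pada Tahun Gajah.",
            "answer": "Mekkah",
        },
    ]

    correct = 0
    for ex in examples:
        pred = qa(question=ex["question"], context=ex["context"])["answer"]
        correct += int(normalize(pred) == normalize(ex["answer"]))

    print(f"Exact Match: {100.0 * correct / len(examples):.2f}%")

The multiple-choice evaluation can be sketched in a similar way. The example below is likewise an assumption rather than the thesis's code: it scores each answer option with the public BLOOMZ 1.7B checkpoint by the average log-likelihood of the option tokens given an Indonesian prompt and picks the highest-scoring option; accuracy over a dataset would then be the fraction of questions answered correctly this way.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "bigscience/bloomz-1b7"  # public BLOOMZ 1.7B checkpoint
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    def option_score(question: str, option: str) -> float:
        # Average log-likelihood of the option tokens appended to the question prompt.
        prompt = f"Pertanyaan: {question}\nJawaban: "
        prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
        full_ids = tok(prompt + option, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(full_ids).logits
        # Shift by one position: logits at step t predict the token at step t+1.
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = full_ids[0, 1:]
        token_lp = log_probs[torch.arange(targets.shape[0]), targets]
        option_len = full_ids.shape[1] - prompt_len  # approximate token span of the option
        return token_lp[-option_len:].mean().item()

    # Toy multiple-choice item (hypothetical content, not from QASiNa-MC).
    question = "Siapa paman yang mengasuh Nabi Muhammad setelah kakeknya wafat?"
    options = ["Abu Thalib", "Abu Lahab", "Abu Bakar", "Umar bin Khattab"]
    print(max(options, key=lambda o: option_score(question, o)))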