RELIGIOUS DOMAIN-INDONESIAN SIRAH NABAWIYAH QUESTION ANSWERING USING GENERATIVE-LLM
Main Author:
Format: Theses
Language: Indonesian
Institution: Institut Teknologi Bandung
Online Access: https://digilib.itb.ac.id/gdl/view/80969

Summary: Indonesia is the country with the largest Muslim population in the world. There are two main sources of information in Islam, the Holy Qur'an and the Book of Hadith; in addition, the Sirah Nabawiyah is another important work of literature. The Sirah Nabawiyah is a historical account of the Prophet's journey and biography in Islam that draws on these two main sources. In current Question Answering (QA) research, there have been studies on the Qur'an and the Hadith, but none have used the Sirah Nabawiyah, especially for the Indonesian language.

We use the Sirah Nabawiyah literature to build a novel QA dataset. Manually building a new dataset requires substantial human effort and cost, so a Generative-LLM was used to assist with parts of the process. The result is the Question Answering Sirah Nabawiyah (QASiNa) dataset for reading comprehension (QASiNa-RC) and multiple-choice QA (QASiNa-MC), along with a Sirah Nabawiyah corpus (SiNaCorpus).
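
The abstract does not specify how the Generative-LLM assisted the annotation process. The sketch below only illustrates one plausible step, drafting extractive question-answer pairs from a corpus passage; the `draft_qa_pairs` helper, the prompt wording, the `gpt-3.5-turbo` model choice, and the JSON output format are assumptions for illustration, not the thesis's actual procedure.

```python
# Illustrative sketch only: one way a Generative-LLM could help draft QA pairs
# from a corpus passage. Prompt, model, and output format are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def draft_qa_pairs(passage: str, n: int = 3) -> list[dict]:
    """Ask the model for extractive question-answer pairs about one passage."""
    prompt = (
        "Read the following Sirah Nabawiyah passage in Indonesian:\n\n"
        f"{passage}\n\n"
        f"Write {n} question-answer pairs whose answers are spans copied "
        "verbatim from the passage. Reply only with a JSON list of objects "
        "with the keys 'question' and 'answer'."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    # Drafted pairs would still need manual review before entering the dataset.
    return json.loads(response.choices[0].message.content)
```
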
QASiNa-RC was tested on the reading comprehension task using mBERT, XLM-RoBERTa, and IndoBERT. QASiNa-MC was tested on the multiple-choice QA task using open-source Generative-LLMs, namely mGPT, XGLM, BLOOM, and BLOOMZ. In addition, GPT-3.5 and GPT-4 were used to test both datasets.
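
In the reading comprehension setting, the evaluated encoder models act as extractive QA systems: given a question and a context passage, they predict an answer span. A minimal sketch of such an inference step with the Hugging Face pipeline follows; the checkpoint name is a public SQuAD-style placeholder, and the example question and context are illustrative, not taken from QASiNa-RC.

```python
# Minimal sketch of extractive QA inference in the QASiNa-RC setting.
# The checkpoint is a placeholder; the thesis fine-tunes and evaluates
# mBERT, XLM-RoBERTa, and IndoBERT on its own data.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="deepset/xlm-roberta-base-squad2",  # placeholder public checkpoint
)

example = {
    "question": "Di kota manakah Nabi Muhammad SAW dilahirkan?",
    "context": "Nabi Muhammad SAW dilahirkan di kota Mekkah pada Tahun Gajah.",
}
prediction = qa(question=example["question"], context=example["context"])
print(prediction["answer"], round(prediction["score"], 3))
```
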
The evaluation of QASiNa-RC showed XLM-RoBERTa to be the best model, with an Exact Match (EM) score of 58.40%, while GPT-3.5 and GPT-4 tended to over-interpret the context. The evaluation of QASiNa-MC showed BLOOMZ 1.7B to be the best open-source model, with an accuracy of 27.76% that increased to 28.62% after corpus-tuning. GPT-3.5 and GPT-4 achieved better results, with accuracies of 56.60% and 72.40% respectively.
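
For reference, a minimal sketch of how the two reported metrics are typically computed: Exact Match (EM) compares a normalized predicted span against the gold answer, and multiple-choice accuracy counts correctly selected options. The SQuAD-style normalization steps here are an assumption, not necessarily the thesis's exact evaluation script.

```python
# Sketch of the two reported metrics: Exact Match (EM) for QASiNa-RC and
# accuracy for QASiNa-MC. Normalization follows common SQuAD-style practice
# (lowercase, strip punctuation, collapse whitespace) and is an assumption here.
import re
import string

def normalize(text: str) -> str:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return re.sub(r"\s+", " ", text).strip()

def exact_match(predictions: list[str], references: list[str]) -> float:
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)

def accuracy(chosen: list[str], gold: list[str]) -> float:
    return 100.0 * sum(c == g for c, g in zip(chosen, gold)) / len(gold)

# Example: exact_match(["kota Mekkah"], ["Kota Mekkah."]) -> 100.0
```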