RELIGIOUS DOMAIN-INDONESIAN SIRAH NABAWIYAH QUESTION ANSWERING USING GENERATIVE-LLM

Indonesia has the largest Muslim population in the world. Islam has two main sources of information, the Holy Qur'an and the Book of Hadith; the Sirah Nabawiyah is another important body of literature. The Sirah Nabawiyah is a historical literature on the prophetic...

Bibliographic Details
Main Author:	Razif Rizqullah, Muhammad
Format:	Theses
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/80969
Institution:	Institut Teknologi Bandung

id	id-itb.:80969
timestamp	2024-03-16T12:20:44Z
keywords	QASiNa, reading comprehension, multiple choices, Masked-LM, Generative-LLM
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	Indonesia has the largest Muslim population in the world. Islam has two main sources of information, the Holy Qur'an and the Book of Hadith; the Sirah Nabawiyah is another important body of literature. It is a historical literature on the prophetic journey and biography in Islam that draws on those two main sources. Existing Question Answering (QA) research has covered the Qur'an and the Hadith, but none has used the Sirah Nabawiyah, particularly for the Indonesian language. We use Sirah Nabawiyah literature to build a novel QA dataset. Because building a dataset manually requires considerable human effort and cost, a Generative-LLM was used to assist with parts of the process. The result is the Question Answering Sirah Nabawiyah (QASiNa) dataset, comprising a reading comprehension set (QASiNa-RC), a multiple-choice set (QASiNa-MC), and a Sirah Nabawiyah corpus (SiNaCorpus). QASiNa-RC was evaluated on the reading comprehension task using mBERT, XLM-RoBERTa, and IndoBERT. QASiNa-MC was evaluated on the multiple-choice QA task using open-source Generative-LLMs, namely mGPT, XGLM, BLOOM, and BLOOMZ. GPT-3.5 and GPT-4 were also tested on both datasets. On QASiNa-RC, XLM-RoBERTa was the best model with an exact match (EM) score of 58.40%, while GPT-3.5 and GPT-4 made excessive interpretations. On QASiNa-MC, BLOOMZ 1.7B was the best of the open-source models with an accuracy of 27.76%, rising to 28.62% after corpus-tuning. GPT-3.5 and GPT-4 achieved better results, with accuracies of 56.60% and 72.40%, respectively.
format	Theses
author	Razif Rizqullah, Muhammad
title	RELIGIOUS DOMAIN-INDONESIAN SIRAH NABAWIYAH QUESTION ANSWERING USING GENERATIVE-LLM
url	https://digilib.itb.ac.id/gdl/view/80969
_version_	1822997061715361792
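
The abstract in this record reports an exact match (EM) score for QASiNa-RC and an accuracy for QASiNa-MC. The following is a minimal sketch of how such scores are commonly computed, not the thesis's own evaluation script. The function names, the SQuAD-style normalization (lowercasing, punctuation stripping, whitespace collapsing), and the sample answers are all assumptions made for illustration; the record does not state which normalization the author applied.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    This normalization is an assumption; the record does not describe
    the one used in the thesis.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(predictions, references) -> float:
    """Percentage of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)


def accuracy(predicted_choices, gold_choices) -> float:
    """Percentage of multiple-choice questions where the chosen label matches the gold label."""
    if len(predicted_choices) != len(gold_choices):
        raise ValueError("predicted_choices and gold_choices must have the same length")
    if not predicted_choices:
        return 0.0
    hits = sum(
        p.strip().upper() == g.strip().upper()
        for p, g in zip(predicted_choices, gold_choices)
    )
    return 100.0 * hits / len(predicted_choices)


if __name__ == "__main__":
    # Hypothetical answers, not taken from the QASiNa dataset.
    print(exact_match(["Mekkah.", "Madinah"], ["mekkah", "Thaif"]))
    print(accuracy(["A", "b", "C", "D"], ["A", "B", "D", "D"]))
```

Under a strict metric like this, an answer that contains the reference span plus additional commentary counts as a miss, which is one way overly interpretive model output can lower an EM score.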
