PHYSICS PROBLEM GENERATION THROUGH PATTERN MATCHING AND LARGE LANGUAGE MODELS

Bibliographic Details
Main Author: Marchotridyo
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/82425
Institution: Institut Teknologi Bandung
Language: Indonesian
id id-itb.:82425
spelling id-itb.:82425 2024-07-08T11:30:45Z PHYSICS PROBLEM GENERATION THROUGH PATTERN MATCHING AND LARGE LANGUAGE MODELS Marchotridyo Indonesian Final Project question generation, data structure, pattern matching, language model, paraphrase INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/82425
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesian
description Question generation is a frequently researched area in AI for academic needs, aimed at creating natural-language questions that are semantically accurate and syntactically cohesive. Generated questions can be varied to help reduce cheating committed by students. This thesis investigates how to generate physics questions, which were chosen because previous research has not addressed them; additionally, generating physics questions involves generating not only numbers but also the question text. Two main processes are involved: generating the variables in a question (in the form of numbers) and paraphrasing the generated question. The generation process begins by creating a data structure to represent the content of a question, including its text, variables, answer, and explanation. Variables in a question are identified using regular-expression-based pattern matching and are filled with random values when the question is generated; random value assignment follows rules defined for those variables. Once generated, the questions are paraphrased using several large language models (LLMs): Pegasus and T5 as fine-tuned models, and ChatGPT-3.5 Turbo and Mistral 7B as directly prompted models. The paraphrasing performance of each model is compared using several paraphrase evaluation methods: n-gram-based automatic metrics (BLEU, METEOR, and ROUGE), a language-model-based automatic metric (ParaScore), and human evaluation. The results of this thesis indicate that the directly prompted LLMs, ChatGPT-3.5 Turbo and Mistral 7B, are highly effective at paraphrasing questions according to human evaluation.
The research also shows that n-gram-based automatic evaluation metrics such as BLEU, METEOR, and ROUGE are insufficient for evaluating the complexity of paraphrasing results, whereas the language-model-based automatic metric, ParaScore, aligns well with human evaluation results.
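The variable-filling step described in the abstract (regex-based pattern matching over a question template, then random value assignment under per-variable rules) can be sketched in Python. This is a minimal illustration under assumed conventions only: the thesis does not specify its placeholder syntax or data structure here, so the `{name:low-high}` pattern and all names below are hypothetical.

```python
import random
import re

# Hypothetical placeholder syntax: each variable in a question template is
# written as {name:low-high}, e.g. {m:1-10}. The named groups capture the
# variable's name and its allowed integer range.
VARIABLE_PATTERN = re.compile(r"\{(?P<name>\w+):(?P<low>\d+)-(?P<high>\d+)\}")

def generate_question(template: str, rng: random.Random) -> tuple[str, dict]:
    """Replace every placeholder with a random value from its declared
    range, and return the filled question text plus the chosen values."""
    values: dict[str, int] = {}

    def fill(match: re.Match) -> str:
        # Draw a random value obeying the rule (range) declared for this variable.
        value = rng.randint(int(match.group("low")), int(match.group("high")))
        values[match.group("name")] = value
        return str(value)

    return VARIABLE_PATTERN.sub(fill, template), values

template = "A ball of mass {m:1-10} kg is dropped from a height of {h:5-50} m."
question, values = generate_question(template, random.Random(0))
```

The recorded `values` dictionary is what would let a downstream step compute the answer and explanation from the same randomly drawn numbers.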
format Final Project
author Marchotridyo
spellingShingle Marchotridyo
PHYSICS PROBLEM GENERATION THROUGH PATTERN MATCHING AND LARGE LANGUAGE MODELS
author_facet Marchotridyo
author_sort Marchotridyo
title PHYSICS PROBLEM GENERATION THROUGH PATTERN MATCHING AND LARGE LANGUAGE MODELS
title_short PHYSICS PROBLEM GENERATION THROUGH PATTERN MATCHING AND LARGE LANGUAGE MODELS
title_full PHYSICS PROBLEM GENERATION THROUGH PATTERN MATCHING AND LARGE LANGUAGE MODELS
title_fullStr PHYSICS PROBLEM GENERATION THROUGH PATTERN MATCHING AND LARGE LANGUAGE MODELS
title_full_unstemmed PHYSICS PROBLEM GENERATION THROUGH PATTERN MATCHING AND LARGE LANGUAGE MODELS
title_sort physics problem generation through pattern matching and large language models
url https://digilib.itb.ac.id/gdl/view/82425
_version_ 1822282224721985536