Automatic question generation from natural language texts

Bibliographic Details
Main Author: Cao, Zhen
Other Authors: Andy Khong W H
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/164625
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-164625
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
spellingShingle Engineering::Electrical and electronic engineering
Cao, Zhen
Automatic question generation from natural language texts
description Question generation (QG) is the task of generating questions automatically from a variety of inputs, ranging from database records to raw text. QG systems are considered critical components of numerous applications, including information-seeking systems, multi-modal conversations, and intelligent tutoring and computer-assisted learning systems. Among these, generating questions from natural language texts has great practical significance. In education, forming good questions is crucial for improving academic performance and evaluating students' abilities. Meanwhile, texts with potential educational value, such as news articles and Wikipedia, are good reading resources for learning purposes. QG can generate questions for reading comprehension and serve as a component in intelligent tutoring systems. QG can also reduce the manpower and cost of creating large-scale data sets for question answering (QA) and reading comprehension. In dialog systems, QG is an essential capability for a chatbot when initiating a conversation or seeking specific information from users. Developing an automatic QG system is therefore important and timely. This thesis focuses on automatic QG from texts. The goal is to develop models that take text as input and output generated questions for downstream applications. Inspired by advances in deep learning, several neural QG models are proposed and studied in this thesis, each addressing a different challenge in QG. One of the main challenges faced by existing neural QG models is performance degradation caused by the one-to-many mapping inherent in QG: given a passage, a number of valid question variants can be generated.
First, the answer is included as direction information (DI) in an RNN-based sequence-to-sequence (seq2seq) model, allowing the model to consider what to ask and thereby generate pertinent questions. A dual-encoder model is developed to learn representations of the source text and the DI, respectively. This answer-aware QG model alleviates the one-to-many mapping problem. The one-to-many mapping in QG arises mainly from two aspects: what to ask and how to ask, where the former is associated with generating to-the-point questions and the latter with selecting content from the source to form specific questions. To further cope with the one-to-many mapping challenge, a controllable QG (CQG) model that employs an attentive seq2seq generative model with a copying mechanism is investigated. The proposed CQG incorporates query interest and auxiliary information as controllers to address the one-to-many mapping problem. Two embedding-strategy variants are designed for CQG to achieve good performance. Furthermore, multi-hop QG, which requires more complex reasoning over multiple pieces of information in the input texts, is studied. To capture the global context and facilitate reasoning, a novel framework is proposed that incorporates the semantic graph of the input document as auxiliary information. The model encodes the semantic graph of the input texts by graph learning. Thereafter, text-level and graph-level representations are fused to generate questions via a pre-trained language model.
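The dual-encoder, answer-aware design described above can be sketched in miniature as follows. This is a toy illustration, not the thesis's implementation: embedding lookups stand in for the RNN encoders, dot-product attention stands in for the learned attention, and all names (`encode`, `attention`, the toy token ids) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size

def encode(token_ids, emb):
    # stand-in for an RNN encoder: per-token embedding lookup
    return emb[token_ids]              # shape (T, d)

def attention(query, keys):
    # dot-product attention with a softmax over time steps
    scores = keys @ query              # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys              # (d,) context vector

vocab = 50
emb = rng.standard_normal((vocab, d))

passage = np.array([3, 7, 12, 9])      # toy passage token ids
answer = np.array([12, 9])             # answer span = direction information (DI)

H_text = encode(passage, emb)          # source-text encoder output
H_di = encode(answer, emb)             # DI encoder output

s = rng.standard_normal(d)             # current decoder state
ctx_text = attention(s, H_text)        # attend over the passage
ctx_di = attention(s, H_di)            # attend over the DI

# fuse both contexts before predicting the next question word,
# so generation is conditioned on what to ask (the answer)
fused = np.concatenate([ctx_text, ctx_di])   # (2d,)
```

The point of the sketch is the fusion step: conditioning the decoder on a separate answer (DI) context biases generation toward questions that the given answer actually answers, which is how the answer-aware model narrows the one-to-many mapping.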
author2 Andy Khong W H
author_facet Andy Khong W H
Cao, Zhen
format Thesis-Doctor of Philosophy
author Cao, Zhen
author_sort Cao, Zhen
title Automatic question generation from natural language texts
title_short Automatic question generation from natural language texts
title_full Automatic question generation from natural language texts
title_fullStr Automatic question generation from natural language texts
title_full_unstemmed Automatic question generation from natural language texts
title_sort automatic question generation from natural language texts
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/164625
_version_ 1759857197929463808
spelling sg-ntu-dr.10356-1646252023-03-06T07:30:04Z Automatic question generation from natural language texts Cao, Zhen Andy Khong W H School of Electrical and Electronic Engineering AndyKhong@ntu.edu.sg Engineering::Electrical and electronic engineering Doctor of Philosophy 2023-02-07T07:21:52Z 2023-02-07T07:21:52Z 2022 Thesis-Doctor of Philosophy Cao, Z. (2022). Automatic question generation from natural language texts. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/164625 https://hdl.handle.net/10356/164625 10.32657/10356/164625 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University