Automatic question generation from natural language texts

Bibliographic Details
Main Author: Cao, Zhen
Other Authors: Andy Khong W H
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/164625
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-164625
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
spellingShingle Engineering::Electrical and electronic engineering
Cao, Zhen
Automatic question generation from natural language texts
description Question generation (QG) is the task of generating questions automatically from a variety of inputs, ranging from database records to raw text. QG systems are considered critical components of numerous applications, including information-seeking systems, multi-modal conversations, and intelligent tutoring and computer-assisted learning systems. Among these, generating questions from natural language texts has great practical significance. In education, forming good questions is crucial for improving academic performance and evaluating students' abilities. Meanwhile, texts with potential educational value, such as news articles and Wikipedia, are good reading resources for learning purposes. QG can generate questions for reading comprehension and serve as a component in intelligent tutoring systems. QG can also reduce the manpower and cost of creating large-scale data sets for question answering (QA) and reading comprehension. In dialog systems, QG is an essential capability for a chatbot when initiating a conversation or seeking specific information from users. Developing an automatic QG system is therefore important and timely. This thesis focuses on automatic QG from texts. The goal is to develop models that take text as input and output generated questions for downstream applications. Inspired by advances in deep learning, several neural QG models are proposed and studied in this thesis, each addressing a different challenge in QG. One of the main challenges faced by existing neural QG models is performance degradation caused by the one-to-many mapping inherent in QG: given a passage, a number of valid question variants can be generated.
First, the answer is included as direction information (DI) in an RNN-based sequence-to-sequence (seq2seq) model, allowing the model to consider what to ask and thereby generate pertinent questions. A dual-encoder model is developed to learn representations of the source text and the DI, respectively. This answer-aware QG model alleviates the one-to-many mapping problem. The one-to-many mapping in QG arises mainly from two aspects: what to ask and how to ask, where the former is associated with generating to-the-point questions and the latter with selecting content from the source to form specific questions. To further cope with the one-to-many mapping challenge, a controllable QG (CQG) model that employs an attentive seq2seq generative model with a copying mechanism is investigated. The proposed CQG incorporates query interest and auxiliary information as controllers to address the one-to-many mapping problem. Two embedding-strategy variants are designed for CQG to achieve good performance. Furthermore, multi-hop QG, which requires more complex reasoning over multiple pieces of information in the input texts, is studied. To capture the global context and facilitate reasoning, a novel framework is proposed that incorporates the semantic graph of the input document as auxiliary information. The model encodes the semantic graph of the input texts by graph learning. Thereafter, text-level and graph-level representations are fused to generate questions via a pre-trained language model.
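The dual-encoder, answer-aware design described above can be sketched in miniature as follows. This is a toy illustration, not the thesis's implementation: embedding lookups stand in for the RNN encoders, dot-product attention stands in for the learned attention, and all names (`encode`, `attention`, the toy token ids) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size

def encode(token_ids, emb):
    # stand-in for an RNN encoder: per-token embedding lookup
    return emb[token_ids]              # shape (T, d)

def attention(query, keys):
    # dot-product attention with a softmax over time steps
    scores = keys @ query              # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys              # (d,) context vector

vocab = 50
emb = rng.standard_normal((vocab, d))

passage = np.array([3, 7, 12, 9])      # toy passage token ids
answer = np.array([12, 9])             # answer span = direction information (DI)

H_text = encode(passage, emb)          # source-text encoder output
H_di = encode(answer, emb)             # DI encoder output

s = rng.standard_normal(d)             # current decoder state
ctx_text = attention(s, H_text)        # attend over the passage
ctx_di = attention(s, H_di)            # attend over the DI

# fuse both contexts before predicting the next question word,
# so generation is conditioned on what to ask (the answer)
fused = np.concatenate([ctx_text, ctx_di])   # (2d,)
```

The point of the sketch is the fusion step: conditioning the decoder on a separate answer (DI) context biases generation toward questions that the given answer actually answers, which is how the answer-aware model narrows the one-to-many mapping.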
author2 Andy Khong W H
author_facet Andy Khong W H
Cao, Zhen
format Thesis-Doctor of Philosophy
author Cao, Zhen
author_sort Cao, Zhen
title Automatic question generation from natural language texts
title_short Automatic question generation from natural language texts
title_full Automatic question generation from natural language texts
title_fullStr Automatic question generation from natural language texts
title_full_unstemmed Automatic question generation from natural language texts
title_sort automatic question generation from natural language texts
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/164625
_version_ 1759857197929463808
spelling sg-ntu-dr.10356-1646252023-03-06T07:30:04Z Automatic question generation from natural language texts Cao, Zhen Andy Khong W H School of Electrical and Electronic Engineering AndyKhong@ntu.edu.sg Engineering::Electrical and electronic engineering Doctor of Philosophy 2023-02-07T07:21:52Z 2023-02-07T07:21:52Z 2022 Thesis-Doctor of Philosophy Cao, Z. (2022). Automatic question generation from natural language texts. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/164625 https://hdl.handle.net/10356/164625 10.32657/10356/164625 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University