Automatic question generation from natural language texts
Main Author:
Other Authors:
Format: Thesis (Doctor of Philosophy)
Language: English
Published: Nanyang Technological University, 2023
Subjects:
Online Access: https://hdl.handle.net/10356/164625
Institution: Nanyang Technological University
Summary: Question generation (QG) is defined as the task of generating questions automatically from a variety of inputs, ranging from database information to raw text. QG systems are considered a critical component of numerous applications, including but not limited to information-seeking systems, multi-modal conversations, and intelligent tutoring and computer-assisted learning systems. Among these, generating questions from natural language texts has great practical significance. In education, forming good questions is crucial for improving academic performance and evaluating students' abilities. Meanwhile, texts with potential educational value, such as news articles and Wikipedia, are good reading resources for learning purposes. QG can generate questions for reading comprehension and serve as a component of intelligent tutoring systems. Benefiting from QG, the manpower and cost of creating large-scale datasets for question answering (QA) and reading comprehension can be reduced. QG is also an essential capability for a chatbot in a dialog system when initiating a conversation or seeking specific information from users. Therefore, developing an automatic QG system is important and timely. This thesis focuses on automatic QG from texts. The goal is to develop models that take text as input and output generated questions for downstream applications. Inspired by advances in deep learning, several neural QG models are proposed and studied in this thesis, each addressing a different challenge in QG.
One of the main challenges faced by existing neural QG models is performance degradation caused by the one-to-many mapping inherent in QG: given a passage, many valid question variants can be generated. First, the answer is included as direction information (DI) in an RNN-based sequence-to-sequence (seq2seq) model, allowing the neural model to consider what to ask and thus generate pertinent questions. A dual-encoder model is developed to learn representations of the source input text and the DI separately. In this way, an answer-aware QG model is proposed to alleviate the one-to-many mapping problem.
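To make the dual-encoder idea concrete, here is a minimal PyTorch sketch of an answer-aware seq2seq model: one encoder reads the passage, a second encoder reads the answer (the DI), and their final states are fused to initialise the question decoder. All names (DualEncoderQG, the GRU encoders, the fusion layer, the dimensions) are illustrative assumptions rather than the thesis's exact architecture, and attention is omitted for brevity.

```python
# Minimal sketch of a dual-encoder, answer-aware seq2seq QG model.
# Hypothetical names and sizes; not the thesis's exact architecture.
import torch
import torch.nn as nn


class DualEncoderQG(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One encoder for the source passage, one for the answer
        # (the "direction information").
        self.passage_enc = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.answer_enc = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Fuse the two final states to initialise the question decoder.
        self.fuse = nn.Linear(2 * hid_dim, hid_dim)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, passage_ids, answer_ids, question_ids):
        _, h_passage = self.passage_enc(self.embed(passage_ids))
        _, h_answer = self.answer_enc(self.embed(answer_ids))
        h0 = torch.tanh(self.fuse(torch.cat([h_passage, h_answer], dim=-1)))
        # Teacher forcing: decode the gold question tokens.
        dec_out, _ = self.decoder(self.embed(question_ids), h0)
        return self.out(dec_out)  # per-step vocabulary logits


# Toy usage with dummy token ids: batch of 2 passage/answer/question triples.
model = DualEncoderQG(vocab_size=1000)
logits = model(torch.randint(0, 1000, (2, 20)),   # passage tokens
               torch.randint(0, 1000, (2, 5)),    # answer tokens
               torch.randint(0, 1000, (2, 12)))   # question tokens
print(logits.shape)  # torch.Size([2, 12, 1000])
```

In a full model, attention over the passage states and a copying mechanism (as in the CQG model described next) would typically replace this bare decoder.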
The one-to-many mapping in QG arises mainly from two aspects of the task, what to ask and how to ask, where the former is associated with generating to-the-point questions and the latter with selecting content from the source to form specific questions. To further address the one-to-many mapping challenge, a controllable QG (CQG) model that employs an attentive seq2seq generative model with a copying mechanism is investigated. The proposed CQG incorporates query interest and auxiliary information as controllers to address the one-to-many mapping problem in QG. Two variants of embedding strategies are designed for CQG to obtain good performance. Furthermore, multi-hop QG, which requires more complex reasoning over multiple pieces of information in the input texts, is studied. To capture the global context and facilitate reasoning, a novel framework is proposed that builds a semantic graph of the input document, where the graph can be regarded as auxiliary information. The model encodes the semantic graph of the input texts by graph learning. Thereafter, text-level and graph-level representations are fused to generate questions via a pre-trained language model.
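As a rough illustration of the graph-and-text fusion step, the following PyTorch sketch performs one round of message passing over a semantic graph, pools the node states into a graph-level vector, and fuses it with a text-level representation. The class name GraphTextFusion, the single-layer message passing, and the mean pooling are assumptions made for illustration; the thesis's framework conditions a pre-trained language model on such fused features, which is omitted here.

```python
# Minimal sketch of fusing a semantic-graph encoding with a text encoding
# for multi-hop QG. Hypothetical names; the pre-trained LM decoder that
# would consume the fused vector is omitted.
import torch
import torch.nn as nn


class GraphTextFusion(nn.Module):
    def __init__(self, node_dim=256, text_dim=256, out_dim=256):
        super().__init__()
        # One round of message passing: each node aggregates its neighbours.
        self.msg = nn.Linear(node_dim, node_dim)
        self.fuse = nn.Linear(node_dim + text_dim, out_dim)

    def forward(self, node_feats, adj, text_repr):
        # node_feats: (batch, nodes, node_dim); adj: (batch, nodes, nodes)
        neighbours = torch.bmm(adj, node_feats)    # aggregate neighbour states
        nodes = torch.relu(self.msg(neighbours))   # updated node states
        graph_repr = nodes.mean(dim=1)             # graph-level summary
        # Fuse graph-level and text-level representations into one vector
        # that a downstream decoder could condition on.
        return torch.tanh(self.fuse(torch.cat([graph_repr, text_repr], dim=-1)))


# Toy usage: 2 documents, each with a 6-node semantic graph.
fusion = GraphTextFusion()
fused = fusion(torch.randn(2, 6, 256),   # node features
               torch.rand(2, 6, 6),      # soft adjacency matrix
               torch.randn(2, 256))      # text-level representation
print(fused.shape)  # torch.Size([2, 256])
```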