Question classification via machine learning techniques

Bibliographic Details
Main Author: Ho, Mun Kit
Other Authors: Andy Khong W H
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2020
Online Access: https://hdl.handle.net/10356/145449
Institution: Nanyang Technological University
Description
Summary: Questions are indispensable tools in daily communication and in acquiring information and knowledge. Recent developments in technology and the internet have brought about many social sites where community members engage in knowledge-building discussions. These technologies have also been carried over to online-learning platforms, which have increasingly become scalable tools through which students across the globe interact and learn. Understanding the cognitive complexity and quality of questions in such learning settings provides additional insights for educators to monitor the achievement of learning outcomes and to intervene when required. This thesis therefore proposes automated solutions using machine learning methods to address this pedagogical need.

Questions in online-learning platforms are commonly found in assessments authored by instructors to assess learners' understanding of the subject. As online-learning platforms scale up, it becomes increasingly laborious to manually create assessments comprising questions of various difficulties for students. However, existing question classification models are limited in how well they model semantics. Labeling assessment questions by cognitive complexity involves not only detecting keywords that discriminate between complexity levels but also considering contextual semantic features. A neural network-based machine learning model with an attention mechanism that directs the creation of a question representation is proposed for this purpose. Experiments on university-level digital signal processing questions demonstrate improved performance over keyword-feature machine learning models in detecting patterns resembling Bloom's taxonomy learning outcome templates. In addition, the proposed classifier is integrated into a web-based quiz generation system to support retrieval practice among students with a desired mixture of questions at different complexity levels.

User-generated questions, on the other hand, have become increasingly popular on social media sites for inquiring about specific knowledge outside academic settings. Unlike assessment questions, these questions are authored casually, are error-prone, and are usually less sophisticated. To overcome noise such as misspellings, it is important to interpret the question progressively, filtering out the noise and picking out only the salient features. This is achieved via a hierarchical architecture with a new topic-weighted attention mechanism that provides context-aware attention over the question. The proposed approach performs well on the chosen evaluation metrics against baseline models without assistance from community features. Its efficacy is verified on the Stack Overflow questions dataset, where it is found to be effective at finding contextual information in the sub-divided texts to form an overall representation.

Studies on human-authored texts have found that including specific information in a piece of text improves comprehension. In education and on websites, this helps to increase the overall quality of the information being communicated. In the previous model, the attention scheme was data-driven and may not make use of granular entities when extracting features. Using entity embeddings from a named-entity recognizer, entity markers hint to the attention mechanism to focus feature extraction around the entities, enhancing its discrimination of very good versus bad questions. Results on the Stack Overflow question dataset indicate that the tag embeddings improve performance over the predecessor, especially when finer categories of tags are used instead of binary indicators. The entity tags were shown to work well with the proposed topic-weighted attention mechanism, creating a structural bias to focus on specificity-related features at these crucial locations.
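
Illustrative note: the summary refers to an attention mechanism that builds a question representation by weighting some words more than others. The sketch below shows generic dot-product attention pooling in plain Python. It is not the thesis's model: the function names, the toy two-dimensional vectors, and the context vector are all assumptions made for illustration, and the topic-weighted and entity-aware variants described above are not reproduced here.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(token_vectors, context_vector):
    """Pool token vectors into one vector using dot-product attention.

    token_vectors: one vector per token (hypothetical embeddings).
    context_vector: a vector the tokens are scored against (hypothetical;
    in a trained model this would be learned).
    """
    scores = [
        sum(t * c for t, c in zip(vec, context_vector))
        for vec in token_vectors
    ]
    weights = softmax(scores)
    dim = len(token_vectors[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, token_vectors))
        for d in range(dim)
    ]
    return pooled, weights


# Toy question with made-up 2-d embeddings (assumed values, not real data).
tokens = ["derive", "the", "transfer", "function"]
vectors = [[1.0, 0.0], [0.0, 0.1], [0.8, 0.3], [0.7, 0.4]]
context = [2.0, 0.0]

pooled, weights = attention_pool(vectors, context)
for token, weight in zip(tokens, weights):
    print(f"{token:>10s}  {weight:.3f}")
```

With these toy values the verb "derive" receives the largest weight and the function word "the" the smallest, which is the kind of keyword-plus-context emphasis the summary describes; a real model would learn both the embeddings and the context vector from labeled questions.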