Neural architectures for natural language understanding
Saved in:
Main Author: | Tay, Yi |
---|---|
Other Authors: | Hui Siu Cheung |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2019 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
Online Access: | https://hdl.handle.net/10356/83291 http://hdl.handle.net/10220/50090 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-83291 |
---|---|
record_format | dspace |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
description |
Empowering machines with the ability to read and reason lies at the heart of Artificial Intelligence (AI) research. Language is ubiquitous, serving as a key communication mechanism that is woven tightly into the fabric of society and humanity. The pervasiveness of textual content is evident in the billions of documents, social posts, and messages on the web. As such, the ability to make sense of, reason about, and understand textual content has immense potential to benefit a wide range of real-world applications such as search, question answering, recommender systems, and personal chat assistants.
This thesis tackles the problem of natural language understanding (NLU), and in particular the problem domains that fall under its umbrella, e.g., question answering, machine reading comprehension, natural language inference, and retrieval-based NLU. More specifically, we study machine learning models, in particular neural architectures, for solving a suite of NLU problems. The key goal is to enable machines to read and comprehend natural language.
We make several novel contributions in this thesis, mainly revolving around the design of neural architectures for NLU problems. The key contributions are listed as follows:
1) We propose two new state-of-the-art neural models for natural language inference: ComProp Alignment-Factorized Encoders (CAFE) and Co-Stack Residual Affinity Networks (CSRAN). In the single-model setting, CAFE and CSRAN achieve 88.5% and 88.7% accuracy, respectively, on the well-studied SNLI benchmark.
2) We propose Multi-Cast Attention Networks (MCAN) for retrieval-based NLU. On the Ubuntu Dialogue Corpus, MCAN outperforms the existing state-of-the-art models by 9%. MCAN also achieves the best reported scores of 0.838 MAP and 0.904 MRR on the well-studied TrecQA dataset.
3) We propose Densely Connected Attention Propagation (DecaProp), a new model designed for machine reading comprehension (MRC) on the web. DecaProp achieves state-of-the-art performance on reading tests over news and Wikipedia articles, with a 2.6%-14.2% absolute improvement in F1 score over the existing state of the art on four challenging MRC datasets.
4) We propose the Introspective Alignment Reader and Curriculum Pointer-Generator (IAL-CPG) model for reading and understanding long narratives. IAL-CPG achieves state-of-the-art performance on the NarrativeQA reading comprehension challenge. On metrics such as BLEU-4 and ROUGE-L, we achieve a 17% relative improvement over the prior state of the art and a 10-fold improvement in BLEU-4 score over BiDAF, a strong span-prediction-based model.
5) We propose Multi-Pointer Co-Attention Networks (MPCN) for recommendation with reviews. On the Amazon Reviews dataset, MPCN outperforms the existing state-of-the-art DeepCoNN and D-ATT models by up to 71% and 5% relative improvement, respectively.
6) Moreover, we propose two novel general-purpose units for sequence encoding in natural language understanding: Dilated Compositional Units (DCU) and Recurrently Controlled Recurrent Networks (RCRN). DCU achieves state-of-the-art results on the RACE dataset, improving over LSTM/GRU encoders by 6%. RCRN outperforms BiLSTMs and stacked BiLSTMs across 26 NLP/NLU datasets.
7) Finally, we propose two novel techniques for efficient training and inference of NLU models: HyperQA (Hyperbolic NLU) and Quaternion Attention/Quaternion Transformer models. HyperQA outperforms strong attention and recurrent baselines while being extremely lightweight (40K to 90K parameters). Quaternion Attention and Quaternion Transformers enable up to 75% parameter reduction while maintaining competitive performance (see the brief note following this description). |
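Note on the parameter-reduction figure in item 7 (a brief arithmetic sketch, assuming the standard Hamilton-product weight sharing used by quaternion layers): a real-valued dense layer mapping 4n inputs to 4m outputs stores 16nm weights, whereas its quaternion counterpart stores one 4-component quaternion weight per pair of quaternion units, i.e. 4nm real numbers, and reuses those components across the corresponding 4x4 real block. The ratio 4nm / 16nm = 1/4 is what yields the up-to-75% reduction quoted above.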
author2 | Hui Siu Cheung |
format | Theses and Dissertations |
author | Tay, Yi |
title | Neural architectures for natural language understanding |
publishDate | 2019 |
url | https://hdl.handle.net/10356/83291 http://hdl.handle.net/10220/50090 |
spelling | Tay, Y. (2019). Neural architectures for natural language understanding. Doctoral thesis, Nanyang Technological University, Singapore. Doctor of Philosophy, School of Computer Science and Engineering. 250 p., application/pdf. DOI: 10.32657/10356/83291. Accessioned 2019-10-07; available 2019-12-06; record last updated 2020-10-28. |