Incorporating external knowledge into machine learning algorithms for NLP applications
| Main Author: | |
|---|---|
| Other Authors: | |
| Format: | Thesis - Doctor of Philosophy |
| Language: | English |
| Published: | Nanyang Technological University, 2020 |
| Subjects: | |
| Online Access: | https://hdl.handle.net/10356/144577 |
| Institution: | Nanyang Technological University |
Summary:

Natural Language Processing (NLP) is a sub-field of Artificial Intelligence (AI) that mainly uses machine learning algorithms to process and analyze large amounts of text data. It gives machines the ability to read, understand, and derive meaning from human languages, and potentially to generate human language. The key issue in modern statistical NLP is text representation learning, which transforms unstructured text data into structured numerical representations. A good text representation should capture the lexical, syntactic, and semantic information that matters for a given NLP task, such as keywords and cue phrases, conceptual information, and long-distance dependencies.
The traditional Bag-of-Words (BoW) model represents text as a fixed-length vector over the vocabulary, where each dimension holds a numerical value such as a raw frequency or a tf-idf weight. However, BoW only looks at the surface forms of words and suffers from high dimensionality and sparsity. Deep neural networks have been shown to be more effective, since they can exploit word order and capture richer semantic features. Commonly adopted architectures include the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and the Transformer. However, deep neural networks normally require large amounts of training data, heavy computation, and sufficient CPU/GPU memory. A lack of high-quality training data can easily lead to under-fitting or over-fitting, especially for data-driven deep neural networks. Besides, hardware constraints and poor interpretability often become obstacles to applying deep neural networks in real-world NLP applications.
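As a concrete illustration of the BoW representation described above, the following minimal sketch builds tf-idf weighted vectors for a toy corpus using scikit-learn's TfidfVectorizer; the tooling and the example documents are assumptions for illustration and are not part of the thesis.

```python
# Minimal BoW / tf-idf sketch (illustrative only, not the thesis implementation).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "machine learning algorithms process large amounts of text data",
    "deep neural networks capture word order and semantic features",
]

vectorizer = TfidfVectorizer()              # each dimension corresponds to one vocabulary word
X = vectorizer.fit_transform(docs)          # sparse (n_docs, vocab_size) matrix of tf-idf weights

print(vectorizer.get_feature_names_out())   # the vocabulary: surface word forms only
print(X.toarray())                          # mostly zeros, showing the sparsity issue
```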
External knowledge has been shown to benefit machine learning algorithms by reducing their reliance on training data and providing additional useful information. For natural language, abundant publicly available knowledge bases such as WordNet, FrameNet, and Wikipedia can be utilized for various NLP tasks. However, different tasks require different knowledge, and different machine learning models have different architectures and operations. How to effectively incorporate useful external knowledge into machine learning algorithms remains an open research question.
This thesis focuses on incorporating existing knowledge from external knowledge bases into machine learning algorithms as prior knowledge for NLP applications. By utilizing external knowledge, we aim to obtain better text representations, reduce the model's reliance on training data, and improve model interpretability. We demonstrate the advantages of leveraging both data and knowledge in machine learning systems and provide general frameworks for incorporating external knowledge into different machine learning algorithms to improve their performance on various NLP tasks. Specifically,
1. For the BoW model, we show how to utilize conceptual knowledge from a probabilistic knowledge base (Probase) and construct a Bag-of-Concepts (BoC) representation, which provides more semantic and conceptual information about the text and better interpretability for document classification (a minimal illustrative sketch of the BoC idea follows this list).
2. For CNN, we demonstrate how to automatically generate convolutional filters from lexical knowledge bases such as WordNet and FrameNet to improve the model's ability to capture keywords and cue phrases for causal relation extraction.
3. For the Transformer, we propose a complementary knowledge-attention encoder that incorporates prior knowledge from lexical knowledge bases to better capture important linguistic clues. We also propose three effective ways of integrating knowledge-attention with the self-attention in the Transformer to maximize the utilization of both knowledge and data for relation extraction.
4. For neural networks using the attention mechanism, we show how to incorporate word-level sentiment intensity information from SentiWordNet into the attention mechanism for the sentiment analysis task (see the second sketch after this list). In addition, we propose two novel neural architectures, the Convolutional Transformer (ConvTransformer) and the Attentive Convolutional Transformer (ACT), which combine the advantages of CNN and Transformer for efficient text representation.
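To make the Bag-of-Concepts idea in item 1 more tangible, here is a minimal sketch under strong simplifying assumptions: a tiny hand-written word-to-concept table stands in for Probase, and plain concept counts stand in for the weighted representation; the table entries and function name are hypothetical, not the thesis implementation.

```python
from collections import Counter

# Toy word -> concept mapping standing in for a probabilistic knowledge base
# such as Probase (hypothetical entries; the real resource scores each concept).
WORD_TO_CONCEPTS = {
    "python": ["programming language", "snake"],
    "java":   ["programming language"],
    "apple":  ["company", "fruit"],
}

def bag_of_concepts(tokens):
    """Count the concepts triggered by the tokens instead of the tokens themselves."""
    counts = Counter()
    for tok in tokens:
        for concept in WORD_TO_CONCEPTS.get(tok.lower(), []):
            counts[concept] += 1
    return counts

print(bag_of_concepts("Apple hired Java and Python developers".split()))
# -> {'programming language': 2, 'company': 1, 'fruit': 1, 'snake': 1}
```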
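Item 4's core idea, injecting lexicon-derived sentiment intensity into attention, can be sketched generically as biasing attention scores with per-token intensity values before the softmax; the additive combination, the alpha hyper-parameter, and the toy numbers below are assumptions for illustration, not the formulation used in the thesis.

```python
import numpy as np

def lexicon_weighted_attention(scores, intensities, alpha=1.0):
    """Bias attention weights toward tokens with high sentiment intensity.

    scores:      (query_len, key_len) raw attention scores from the model
    intensities: (key_len,) per-token intensity, e.g. looked up in SentiWordNet
    alpha:       strength of the lexicon bias (illustrative hyper-parameter)
    """
    biased = scores + alpha * intensities                    # one possible way to combine
    weights = np.exp(biased - biased.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)     # softmax over the keys

scores = np.zeros((1, 4))                                    # uniform scores over 4 tokens
intensities = np.array([0.0, 0.9, 0.1, 0.0])                 # e.g. "great" has high intensity
print(lexicon_weighted_attention(scores, intensities))       # attention shifts toward token 2
```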