Label semantics embedding and hierarchical attentions for text representation learning

Text classification is one of the most widely used and important NLP (Natural Language Processing) tasks. It aims to deduce the most appropriate pre-defined label for a given document or sentence, with applications such as spam detection, topic classification, sentiment analysis, and so forth. One of the key steps of text...

Full description

Saved in:
Bibliographic Details
Main Author: Min, Fuzhou
Other Authors: Mao Kezhi
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/165286
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-165286
record_format dspace
spelling sg-ntu-dr.10356-1652862023-07-04T16:13:37Z Label semantics embedding and hierarchical attentions for text representation learning Min, Fuzhou Mao Kezhi School of Electrical and Electronic Engineering EKZMao@ntu.edu.sg Engineering::Electrical and electronic engineering Text classification is one of the most widely used and important NLP (Natural Language Processing) tasks. It aims to deduce the most appropriate pre-defined label for a given document or sentence, with applications such as spam detection, topic classification, and sentiment analysis. One of the key steps of text classification is text representation. With the rapid development of machine learning, neural network models such as Convolutional Neural Networks and Recurrent Neural Networks have been widely employed for text representation learning. In most existing text classification models, the labels of the classification task are represented as one-hot vectors, independent of the semantics of the text data itself. For example, in a sentiment analysis task, the labels “positive” and “negative” are encoded as [1,0] and [0,1], so the semantic information of the labels is not fully exploited. However, the semantics of labels are highly related to the text classification task, and the information contained in labels should therefore not be disregarded. In this thesis work, we propose a Label Embedding-based Hierarchical Attention Model (LE-HAM) that incorporates the semantic information of labels by jointly embedding the labels and words. Furthermore, a single attention mechanism does not achieve satisfactory results on data with weak signals. To address this, we introduce a model with a two-level attention framework built on the label semantics embedding.
This hierarchical attention structure targets text data with weak signals: it exploits the label information to first select the key sentences, then uses only these selected sentences, combined with the label information, to build the text representation. In this way, the majority of the noise can be removed. The main novelty of this method is its sentence-selection mechanism: the model can identify the key sentences even when the text is noisy, keywords can then be located more efficiently, and the accuracy of text classification on such “weak signal” datasets can be improved. Master of Engineering 2023-03-23T01:33:20Z 2023-03-23T01:33:20Z 2023 Thesis-Master by Research Min, F. (2023). Label semantics embedding and hierarchical attentions for text representation learning. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165286 https://hdl.handle.net/10356/165286 10.32657/10356/165286 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
spellingShingle Engineering::Electrical and electronic engineering
Min, Fuzhou
Label semantics embedding and hierarchical attentions for text representation learning
description Text classification is one of the most widely used and important NLP (Natural Language Processing) tasks. It aims to deduce the most appropriate pre-defined label for a given document or sentence, with applications such as spam detection, topic classification, and sentiment analysis. One of the key steps of text classification is text representation. With the rapid development of machine learning, neural network models such as Convolutional Neural Networks and Recurrent Neural Networks have been widely employed for text representation learning. In most existing text classification models, the labels of the classification task are represented as one-hot vectors, independent of the semantics of the text data itself. For example, in a sentiment analysis task, the labels “positive” and “negative” are encoded as [1,0] and [0,1], so the semantic information of the labels is not fully exploited. However, the semantics of labels are highly related to the text classification task, and the information contained in labels should therefore not be disregarded. In this thesis work, we propose a Label Embedding-based Hierarchical Attention Model (LE-HAM) that incorporates the semantic information of labels by jointly embedding the labels and words. Furthermore, a single attention mechanism does not achieve satisfactory results on data with weak signals. To address this, we introduce a model with a two-level attention framework built on the label semantics embedding. This hierarchical attention structure targets text data with weak signals: it exploits the label information to first select the key sentences, then uses only these selected sentences, combined with the label information, to build the text representation. In this way, the majority of the noise can be removed.
The main novelty of this method is its sentence-selection mechanism: the model can identify the key sentences even when the text is noisy, keywords can then be located more efficiently, and the accuracy of text classification on such “weak signal” datasets can be improved.
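The abstract describes a two-level, label-guided attention scheme: labels and words share an embedding space, label similarity first selects the key sentences, and attention over the words of those sentences builds the final text representation. The following NumPy sketch illustrates that general idea only; the dimensions, the mean-pooled sentence vectors, and the max-over-labels scoring are illustrative assumptions, not the thesis's actual LE-HAM architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: a d-dim embedding space shared by words and labels.
d, n_sentences, n_words, n_labels, k = 8, 6, 10, 2, 3

# Jointly embedded vectors (learned in practice; random here for the sketch).
label_emb = rng.normal(size=(n_labels, d))               # label embeddings
sent_words = rng.normal(size=(n_sentences, n_words, d))  # word embeddings per sentence
sent_repr = sent_words.mean(axis=1)                      # crude sentence vectors

# Level 1: label-guided sentence selection.
# Score each sentence by its strongest similarity to any label embedding.
sent_scores = (sent_repr @ label_emb.T).max(axis=1)      # (n_sentences,)
top_k = np.argsort(sent_scores)[-k:]                     # keep k key sentences

# Level 2: word-level attention inside the selected sentences only,
# again querying with the label embeddings.
selected = sent_words[top_k]                             # (k, n_words, d)
word_scores = (selected @ label_emb.T).max(axis=-1)      # (k, n_words)
word_attn = softmax(word_scores, axis=-1)                # rows sum to 1
sent_vecs = (word_attn[..., None] * selected).sum(axis=1)  # (k, d)

# Final text representation: pool the attended key-sentence vectors.
text_repr = sent_vecs.mean(axis=0)                       # (d,)
```

Because non-key sentences never enter the second attention level, noise from the discarded sentences cannot contribute to `text_repr`, which is the intuition behind the sentence-selection step.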
author2 Mao Kezhi
author_facet Mao Kezhi
Min, Fuzhou
format Thesis-Master by Research
author Min, Fuzhou
author_sort Min, Fuzhou
title Label semantics embedding and hierarchical attentions for text representation learning
title_short Label semantics embedding and hierarchical attentions for text representation learning
title_full Label semantics embedding and hierarchical attentions for text representation learning
title_fullStr Label semantics embedding and hierarchical attentions for text representation learning
title_full_unstemmed Label semantics embedding and hierarchical attentions for text representation learning
title_sort label semantics embedding and hierarchical attentions for text representation learning
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/165286
_version_ 1772828235871027200