Graph of words for document classification

This final year project (FYP) implements and experimentally evaluates a novel framework for large-scale classification of text documents. Under this framework, documents are first converted from sentences into graph-of-words, so the original classification problem is then considered as...

Bibliographic Details
Main Author: Quach, Tri Dung
Other Authors: Chen Lihui
Format: Final Year Project
Language: English
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/10356/75040
Institution: Nanyang Technological University
id sg-ntu-dr.10356-75040
record_format dspace
spelling sg-ntu-dr.10356-75040 2023-07-07T15:57:52Z Graph of words for document classification Quach, Tri Dung Chen Lihui School of Electrical and Electronic Engineering DRNTU::Engineering DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems Bachelor of Engineering 2018-05-28T01:35:08Z 2018-05-28T01:35:08Z 2018 Final Year Project (FYP) http://hdl.handle.net/10356/75040 en Nanyang Technological University 55 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering
DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
description This final year project (FYP) implements and experimentally evaluates a novel framework for large-scale classification of text documents. Under this framework, each document is first converted from sentences into a graph-of-words, so that the original classification problem becomes a graph classification problem to which the representation learning (RL) model subgraph2vec can be applied. However, as with many other RL-based methods, efficiency is a serious problem because NLP datasets generally have a very large vocabulary. The project therefore proposes a hash-embedding version of subgraph2vec that significantly reduces the memory required during training, making the system more efficient without harming the quality of the resulting representations. The approach is evaluated in terms of time, required memory, accuracy and F1 score on benchmark datasets from three domains (the first two are graph classification tasks and the last is document classification). In the experiments, the proposed approach outperforms other RL-based methods and achieves results comparable with the state-of-the-art method. Finally, the project introduces a semi-supervised version of the method and observes significant improvements on a sentiment analysis task.
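The two ideas named in the description can be illustrated with a minimal sketch. It is not the project's implementation: the sliding-window co-occurrence rule, the function and class names, and every parameter value are assumptions chosen for illustration, and subgraph2vec itself is not reproduced. The first function builds a graph-of-words from a token list; the class shows the general hash-embedding idea of combining a few rows from a small shared pool instead of storing one vector per vocabulary entry.

```python
import hashlib
import random
from collections import defaultdict


def graph_of_words(tokens, window=2):
    """Undirected co-occurrence graph (assumed construction rule).

    Two distinct words are linked each time they occur fewer than
    `window` positions apart; the edge weight counts those occurrences.
    """
    edges = defaultdict(int)
    for i, source in enumerate(tokens):
        for target in tokens[i + 1:i + window]:
            if source != target:
                edges[tuple(sorted((source, target)))] += 1
    return dict(edges)


def bucket(token, seed, size):
    """Deterministically hash (seed, token) into the range [0, size)."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


class HashEmbedding:
    """Simplified hash-embedding lookup (hypothetical names and sizes).

    Memory depends on num_buckets and num_weight_rows, not on the
    vocabulary size, which is the source of the memory saving.
    """

    def __init__(self, num_buckets=64, dim=8, num_hashes=2,
                 num_weight_rows=1024, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.num_hashes = num_hashes
        # Shared pool of component vectors, reused by all tokens.
        self.pool = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                     for _ in range(num_buckets)]
        # Per-row importance weights, one weight per hash function.
        self.weights = [[rng.gauss(0.0, 0.5) for _ in range(num_hashes)]
                        for _ in range(num_weight_rows)]

    def lookup(self, token):
        """Return the weighted sum of the token's hashed pool rows."""
        weight_row = self.weights[bucket(token, "w", len(self.weights))]
        vector = [0.0] * self.dim
        for k in range(self.num_hashes):
            row = self.pool[bucket(token, k, len(self.pool))]
            for i in range(self.dim):
                vector[i] += weight_row[k] * row[i]
        return vector
```

In a trainable model the pool and the importance weights would be learned parameters; here they are random values so that the lookup can be run on its own, for example on the node labels returned by graph_of_words.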
author2 Chen Lihui
author_facet Chen Lihui
Quach, Tri Dung
format Final Year Project
author Quach, Tri Dung
author_sort Quach, Tri Dung
title Graph of words for document classification
publishDate 2018
url http://hdl.handle.net/10356/75040
_version_ 1772825466065911808