Deep learning-based automatic document categorization and organization

Given the vast improvement in information technology today, document classification has become a major research area of Natural Language Processing. Previously, document classification was done by using Traditional Machine Learning algorithms to categorize online documents. However, Traditional Machi...

Bibliographic Details
Main Author: Foo, Shawn Nicholas Say Yan
Other Authors: Mao Kezhi
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
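Subjects: Engineering::Computer science and engineering::Computing methodologies::Document and text processing; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence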
Online Access: https://hdl.handle.net/10356/149304
Institution: Nanyang Technological University
id sg-ntu-dr.10356-149304
record_format dspace
spelling sg-ntu-dr.10356-149304
2023-07-07T18:27:37Z
Deep learning-based automatic document categorization and organization
Foo, Shawn Nicholas Say Yan
Mao Kezhi
School of Electrical and Electronic Engineering
EKZMao@ntu.edu.sg
Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Bachelor of Engineering (Electrical and Electronic Engineering)
2021-05-29T13:13:36Z
2021
Final Year Project (FYP)
Foo, S. N. S. Y. (2021). Deep learning-based automatic document categorization and organization. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/149304
https://hdl.handle.net/10356/149304
en
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description Given the rapid advances in information technology, document classification has become a major research area in Natural Language Processing. Previously, document classification relied on Traditional Machine Learning algorithms to categorize online documents. However, Traditional Machine Learning algorithms have been shown to be unable to cope with the massive amount of online information generated daily, whereas the performance of Deep Learning algorithms improves as more data becomes available. Therefore, we introduce Deep Learning models to perform the document classification task, making use of the large volume of data generated daily. This project aims to build an AI system that performs document classification using Deep Learning-based methods. In this work, five Deep Learning-based models are compared and evaluated. In the coarse-grained classification task, the models classify news articles into five entry-level categories: Economy, Fuel Price, Illegal Fishing, Weather and Climate, and Others. A fine-grained classification task was also conducted, in which news articles in the Fuel Price category are further classified into two subcategories: Price Increase and Price Decrease. The model that combines a TF-IDF word representation with a Feedforward Artificial Neural Network outperformed all the other models, with classification accuracies of 98% and 88.25% on the coarse-grained and fine-grained tasks, respectively. News classification allows us to detect the occurrence of certain events. In particular, the news classification performed in this project contributes to detecting piracy in the Straits of Malacca. The project has identified the Deep Learning-based model best suited to classifying news articles, and this model can be used to analyze trends in piracy in the Straits of Malacca.
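The best-performing combination named in the description (TF-IDF features fed to a feedforward neural network) can be illustrated with a minimal pure-Python sketch. Everything beyond the five category names is an assumption made for illustration: the toy sentences, the smoothed IDF formula, the hidden-layer size, the learning rate, and the helper names (`fit_idf`, `tfidf_vector`, `FeedforwardNet`) are hypothetical and do not come from the project, whose actual architecture, framework, and data are not given in this record.

```python
import math
import random
from collections import Counter

# Category names are taken from the record; all data below is invented.
CATEGORIES = ["Economy", "Fuel Price", "Illegal Fishing",
              "Weather and Climate", "Others"]

def tokenize(text):
    return text.lower().split()

def fit_idf(docs):
    """Learn a vocabulary index and smoothed IDF weights from tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = {term: i for i, term in enumerate(sorted(df))}
    # Assumed smoothing: idf = ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def tfidf_vector(doc, vocab, idf):
    """L2-normalised TF-IDF vector; terms outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(doc).items():
        if term in vocab:
            vec[vocab[term]] = count * idf[term]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class FeedforwardNet:
    """One ReLU hidden layer followed by a softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = softmax([sum(w * hi for w, hi in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, p

    def train_step(self, x, y, lr=0.1):
        """One SGD step on the cross-entropy loss for a single example."""
        h, p = self.forward(x)
        dz2 = [pi - (1.0 if k == y else 0.0) for k, pi in enumerate(p)]
        # Back-propagate to the hidden layer before w2 is modified.
        dh = [sum(dz2[k] * self.w2[k][j] for k in range(len(dz2)))
              if h[j] > 0 else 0.0 for j in range(len(h))]
        for k, g in enumerate(dz2):
            for j in range(len(h)):
                self.w2[k][j] -= lr * g * h[j]
            self.b2[k] -= lr * g
        for j, g in enumerate(dh):
            for i in range(len(x)):
                self.w1[j][i] -= lr * g * x[i]
            self.b1[j] -= lr * g
        return -math.log(max(p[y], 1e-12))

    def predict(self, x):
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)

# Invented toy corpus: one sentence per coarse-grained category.
train = [
    ("central bank raises interest rates as economy grows", 0),
    ("petrol and diesel price increase at the pump", 1),
    ("patrol detains vessel for illegal fishing", 2),
    ("heavy rain and storm warning issued", 3),
    ("football team wins the cup final", 4),
]
docs = [tokenize(text) for text, _ in train]
vocab, idf = fit_idf(docs)
X = [tfidf_vector(d, vocab, idf) for d in docs]

net = FeedforwardNet(len(vocab), 8, len(CATEGORIES))
for _ in range(200):
    for x, (_, label) in zip(X, train):
        net.train_step(x, label)

query = tfidf_vector(tokenize("diesel price increase"), vocab, idf)
print(CATEGORIES[net.predict(query)])
```

A fine-grained stage like the one described could reuse the same two pieces: articles predicted as Fuel Price would be passed to a second network with two outputs (Price Increase, Price Decrease) trained only on that category.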
author2 Mao Kezhi
author_facet Mao Kezhi
Foo, Shawn Nicholas Say Yan
format Final Year Project
author Foo, Shawn Nicholas Say Yan
author_sort Foo, Shawn Nicholas Say Yan
title Deep learning-based automatic document categorization and organization
title_sort deep learning-based automatic document categorization and organization
publisher Nanyang Technological University
publishDate 2021
url https://hdl.handle.net/10356/149304
_version_ 1772827517178085376