Deep learning-based automatic document categorization and organization
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/149304 |
Institution: | Nanyang Technological University |
Summary: | Given the rapid advances in information technology, document classification has become a major research area in Natural Language Processing. Previously, online documents were categorized using traditional Machine Learning algorithms. However, traditional Machine Learning algorithms have been shown to be unable to cope with the massive amount of online information generated daily. The performance of Deep Learning algorithms, on the other hand, improves as the amount of data grows. Therefore, we introduce Deep Learning models to perform the document classification task, making use of the large amount of data generated daily.
This project aims to build an AI system that performs document classification using Deep Learning-based methods. In this work, five Deep Learning-based models are compared and evaluated. In the coarse-grained classification task, the models classify news articles into five entry-level categories: Economy, Fuel Price, Illegal Fishing, Weather and Climate, and Others. A fine-grained classification task was also conducted, in which news articles in the Fuel Price category are further classified into two subcategories: Price Increase and Price Decrease. The model that combines a TF-IDF word representation with a Feedforward Artificial Neural Network outperformed all the other models, with classification accuracies of 98% and 88.25% on the coarse-grained and fine-grained classification tasks, respectively.
News classification allows us to detect the occurrence of certain events. In particular, the news classification performed in this project contributes to detecting piracy in the Straits of Malacca. The project has successfully identified the Deep Learning-based model best suited to document classification of news articles, and this model can be used to analyze trends in piracy in the Straits of Malacca. |
---|
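
The summary names a TF-IDF word representation combined with a Feedforward Artificial Neural Network as the best-performing approach, but the record gives no implementation details. The following is a minimal sketch of that general technique in plain Python, not the author's implementation. The toy corpus, the smoothed IDF formula, the single ReLU hidden layer, and every hyperparameter (`n_hidden`, `lr`, the epoch count) are assumptions chosen for illustration. Only two of the five category names from the summary are used.

```python
import math
import random
from collections import Counter

# Toy corpus: illustrative only, NOT the project's dataset.
DOCS = [
    ("fuel price rises at the pump", "Fuel Price"),
    ("petrol price cut announced", "Fuel Price"),
    ("heavy rain and storm warning", "Weather and Climate"),
    ("storm brings heavy rain to coast", "Weather and Climate"),
]
LABELS = sorted({label for _, label in DOCS})

def build_vocab_idf(texts):
    """Return (vocab, idf) with smoothed idf = ln((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n = len(texts)
    df = Counter(tok for t in texts for tok in set(t.split()))
    vocab = sorted(df)
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}
    return vocab, idf

def tfidf(text, vocab, idf):
    """L2-normalised TF-IDF vector; tokens outside the vocabulary are ignored."""
    counts = Counter(text.split())
    vec = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class FeedForward:
    """One ReLU hidden layer followed by a softmax output (assumed architecture)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = softmax([sum(w * hi for w, hi in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, p

    def train_step(self, x, y, lr=0.5):
        """One SGD step on the cross-entropy loss; returns the loss before the update."""
        h, p = self.forward(x)
        dz2 = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Gradient at the hidden layer, computed before w2 is modified; ReLU gates it.
        dh = [sum(dz2[k] * self.w2[k][j] for k in range(len(dz2))) if h[j] > 0 else 0.0
              for j in range(len(h))]
        for k, g in enumerate(dz2):
            for j in range(len(h)):
                self.w2[k][j] -= lr * g * h[j]
            self.b2[k] -= lr * g
        for j, g in enumerate(dh):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g
        return -math.log(max(p[y], 1e-12))

vocab, idf = build_vocab_idf([text for text, _ in DOCS])
X = [tfidf(text, vocab, idf) for text, _ in DOCS]
Y = [LABELS.index(label) for _, label in DOCS]

net = FeedForward(n_in=len(vocab), n_hidden=8, n_out=len(LABELS))
for _ in range(200):  # epoch count is an arbitrary choice for the toy data
    for x, y in zip(X, Y):
        net.train_step(x, y)

def predict(text):
    """Return the label with the highest predicted probability."""
    _, p = net.forward(tfidf(text, vocab, idf))
    return LABELS[p.index(max(p))]

print(predict("storm warning for the coast"))
```

In this sketch, `predict` always returns one of the labels seen during training. A fine-grained stage such as Price Increase versus Price Decrease could, under the same assumptions, be a second network of the same form trained only on articles assigned to Fuel Price.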