Multi-level classification of long text based on convolutional neural network



Bibliographic Details
Main Author: Xiao, Siwei
Other Authors: Mao Kezhi
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2021
Online Access: https://hdl.handle.net/10356/152902
Institution: Nanyang Technological University
Description
Summary: With the rise of the mobile Internet, new media has developed at an unprecedented pace, and the volume of news available online has grown rapidly. Quickly extracting the required classification information from massive amounts of text, so that decision-makers can analyze it, has become a prerequisite for, and an important part of, any further use of news text. For example, a maritime safety agency may need to make predictions based on recent piracy-related topics, such as unemployment, oil prices, and weather conditions, to determine whether security measures need to be strengthened. Against this background, this dissertation proposes a hierarchical combination classifier that mixes long and short text to accurately classify long texts on various topics and backgrounds. Here, long text is defined as an article that exceeds 512 words. Automated text classification is considered a vital method for managing and processing the vast and continuously growing quantity of digital documents. More generally, text classification plays an important role in information extraction and summarization, text retrieval, and question answering. This dissertation illustrates the text classification process using deep learning techniques. First, crawler technology is used during text acquisition to collect articles on related topics in batches. Second, a CNN is used as the multi-level classifier, and an RNN is used as a first-level classifier for comparison with the CNN. Third, based on the characteristics of the text structure, text enhancement techniques and optimized data processing are used to improve experimental accuracy. The proposed method achieves better results for multi-topic, multi-level text classification and provides a reference approach for the multi-class, multi-level classification of raw web text.
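The summary names the main ingredients (articles longer than 512 words, a CNN used as a multi-level classifier) without showing how they might fit together. Below is a minimal NumPy sketch of one possible arrangement, assuming that a long article is split into chunks of at most 512 words, each chunk is scored by a Kim-style text CNN (word embeddings, 1D convolutions of several widths, ReLU, max-over-time pooling, linear output), chunk logits are averaged, and the first-level prediction is routed to a per-class second-level CNN. The chunk averaging, the routing scheme, and every name here (`split_long_text`, `TinyTextCNN`, `classify_long_text`, the example labels) are hypothetical illustrations, not the dissertation's implementation. The weights are random and untrained, so only the data flow is meaningful.

```python
import numpy as np

MAX_WORDS = 512  # long-text threshold stated in the summary


def split_long_text(text, max_words=MAX_WORDS):
    """Split an article into consecutive chunks of at most max_words words."""
    words = text.split()
    return [words[i:i + max_words] for i in range(0, len(words), max_words)] or [[]]


class TinyTextCNN:
    """Hypothetical Kim-style CNN: embed -> conv (several widths) -> ReLU -> max pool -> linear."""

    def __init__(self, vocab, num_classes, emb_dim=16, widths=(2, 3, 4), filters=8, seed=0):
        rng = np.random.default_rng(seed)
        self.vocab = {w: i + 1 for i, w in enumerate(vocab)}  # id 0 is reserved for unknown words
        self.emb = rng.normal(0, 0.1, (len(vocab) + 1, emb_dim))
        # One (width, kernel, bias) triple per filter width; kernel acts on flattened windows.
        self.convs = [(w, rng.normal(0, 0.1, (filters, w * emb_dim)), np.zeros(filters))
                      for w in widths]
        self.out_w = rng.normal(0, 0.1, (filters * len(widths), num_classes))
        self.out_b = np.zeros(num_classes)

    def logits(self, words):
        ids = np.array([self.vocab.get(w, 0) for w in words], dtype=np.intp)
        x = self.emb[ids]  # (seq_len, emb_dim)
        max_w = max(w for w, _, _ in self.convs)
        if len(x) < max_w:  # zero-pad short inputs so every filter width fits
            x = np.vstack([x, np.zeros((max_w - len(x), x.shape[1]))])
        pooled = []
        for w, kernel, bias in self.convs:
            windows = np.stack([x[i:i + w].ravel() for i in range(len(x) - w + 1)])
            feats = np.maximum(windows @ kernel.T + bias, 0.0)  # ReLU, (n_windows, filters)
            pooled.append(feats.max(axis=0))  # max-over-time pooling
        return np.concatenate(pooled) @ self.out_w + self.out_b


def classify_long_text(text, level1, level2_by_class, level1_labels, level2_labels):
    """Two-level prediction; averaging chunk logits is an assumed aggregation rule."""
    chunks = split_long_text(text)
    l1 = np.mean([level1.logits(c) for c in chunks], axis=0)
    top = level1_labels[int(np.argmax(l1))]
    child = level2_by_class[top]  # second-level model chosen by the first-level label
    l2 = np.mean([child.logits(c) for c in chunks], axis=0)
    return top, level2_labels[top][int(np.argmax(l2))]
```

A usage example with made-up labels: build `TinyTextCNN(vocab, 2)` as the first level, a dictionary mapping each first-level label to its own second-level `TinyTextCNN`, and call `classify_long_text(article, ...)`. Averaging is only one option; max-pooling over chunks or a second network over chunk features would also be plausible.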
The references cited cover the major theoretical issues and guide the researcher to interesting research directions.