Online news analytics based on AI techniques

Bibliographic Details
Main Author: Wei, Zhifeng
Other Authors: Mao Kezhi
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2022
Online Access: https://hdl.handle.net/10356/156153
Institution: Nanyang Technological University
Description
Summary: Text classification is an important technique in Natural Language Processing (NLP). It allows texts of interest to be extracted efficiently from large collections, which can greatly improve working efficiency and make everyday life easier. This dissertation focuses on classifying news about natural disasters. First, a news data set of nearly 2000 articles is collected. Then, text representation methods such as Bag of Words (BoW), term frequency–inverse document frequency (TF-IDF), and Latent Dirichlet Allocation (LDA) are tested with different classifiers, and their classification performance is compared. Next, deep learning neural networks such as CNN, LSTM, and Transformer are applied to the same data set, and their classification performance is compared. In addition, randomly initialized word embeddings, Word2vec, GloVe, and the BERT pre-trained model are analyzed and compared on this data set. The experiments use Python 3 and the PyTorch deep learning framework. Accuracy, precision, recall, and F1 score are used as evaluation criteria. The results show that the Transformer model and the BERT pre-trained model perform slightly better than the other models on the data set collected in this dissertation.
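
As an illustration of two ideas named in the summary, the following minimal Python sketch builds Bag of Words and TF-IDF vectors for a toy corpus and computes accuracy, precision, recall, and F1 score for a set of predictions. It is not the dissertation's code: the three example headlines, the labels, the function names, whitespace tokenization, the unsmoothed weighting idf = log(N / df), and the binary (one positive class) setting are all assumptions made for the example. Libraries often use smoothed IDF variants instead.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def tf_idf(vectors):
    """Weight raw term counts by idf = log(N / df) (assumed, unsmoothed variant)."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    return [
        [vec[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
        for vec in vectors
    ]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy headlines (not from the dissertation's data set).
docs = [
    "flood hits coastal town",
    "earthquake hits city",
    "city council votes on budget",
]
vocab, bow = bag_of_words(docs)
weights = tf_idf(bow)

# Hypothetical labels: 1 = natural-disaster news, 0 = other news.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
scores = binary_metrics(y_true, y_pred)
```

In this sketch a term that appears in every document receives a weight of zero, while a term unique to one document receives the largest weight, which is the intuition behind preferring TF-IDF to raw counts.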