Online news analytics based on AI techniques
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/156153 |
Institution: | Nanyang Technological University |
Summary: | Text classification is an important technique in Natural Language
Processing (NLP). It makes it possible to efficiently extract the texts of
interest from large collections, which can greatly improve working efficiency
and facilitate daily life.
This dissertation focuses on the task of classifying news about natural disasters.
First, a news data set of nearly 2000 articles is collected. Then, different
text representation methods such as bag of words (BOW), term frequency–inverse
document frequency (TF-IDF) and Latent Dirichlet Allocation (LDA) are tested
with different classifiers, and their classification performance is compared. After
that, several deep learning models such as CNN, LSTM and Transformer
are used to classify the collected data set,
and the classification performance of these models is compared. In addition,
the performance of randomly initialized word embeddings, Word2vec,
GloVe, and pre-trained BERT models on this data set is analyzed and compared.
This dissertation uses Python 3 and the PyTorch deep learning framework for the
experiments. Accuracy, precision, recall and F1 score are used as evaluation
criteria. The results show that the Transformer model and
the pre-trained BERT model perform slightly better than the other models on the
data set collected in this dissertation. |
---|
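
The summary names accuracy, precision, recall and F1 score as the evaluation criteria. As a minimal sketch of how these four quantities relate, the snippet below computes them for one class treated as positive. The function name `classification_metrics`, the labels and the predictions are all hypothetical and invented for illustration. The record does not say how the dissertation averages scores across classes, so this is not a reproduction of its evaluation code.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy over all classes, plus precision/recall/F1 for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Fall back to 0.0 when a denominator is zero (one common convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels and predictions, not taken from the dissertation's data set.
y_true = ["flood", "flood", "quake", "quake", "flood", "quake"]
y_pred = ["flood", "quake", "quake", "quake", "flood", "flood"]
metrics = classification_metrics(y_true, y_pred, positive="flood")
print(metrics)
```

For a multi-class data set, the per-class values could be averaged (for example, macro-averaging over all classes). Which averaging the dissertation uses is not stated in this record.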