Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques
Piracy is one of the factors threatening the security of coastal countries. During the COVID-19 pandemic, piracy incidents have become more frequent than usual, posing a challenge to the safety of residents and to social stability. At the same time, published news reports on open resources for p...
Main Author: | Ma, Shuting |
---|---|
Other Authors: | Mao Kezhi |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | Engineering::Electrical and electronic engineering |
Online Access: | https://hdl.handle.net/10356/157271 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-157271 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-157271 2023-07-04T17:47:34Z Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques Ma, Shuting Mao Kezhi School of Electrical and Electronic Engineering EKZMao@ntu.edu.sg Engineering::Electrical and electronic engineering Piracy is one of the factors threatening the security of coastal countries. During the COVID-19 pandemic, piracy incidents have become more frequent than usual, posing a challenge to the safety of residents and to social stability. At the same time, news reports on piracy incidents published in open sources are a valuable resource for piracy research. With the maturing of artificial intelligence technology and the continuous development of Natural Language Processing (NLP), how to make reasonable use of these open-source text materials for analysis has become an important research direction. This project first introduces the possible applications of NLP to piracy news materials. Relevant piracy news materials were collected from open sources, then labeled and cleaned to form a new dataset on this topic. Four mainstream text classification models (TextCNN, Bi-LSTM, Transformer, and BERT) are introduced theoretically and tested in practice, and BERT is finally selected as the base model. To address the imbalanced data classification problem, this project proposes and explores a variety of methods that combine deep learning and machine learning. On the one hand, data resampling is applied to improve the balance of the dataset. On the other hand, with BERT chosen as the classifier, a Costive-SVM is constructed in a fully connected layer with Triplet Loss to separate the positive and negative samples. After fine-tuning, model performance improves, and the over-fitting problem encountered during optimization is also resolved. Finally, the F1 score improved from 0.46 to 0.87. 
Master of Science (Computer Control and Automation) 2022-05-11T13:48:19Z 2022-05-11T13:48:19Z 2022 Thesis-Master by Coursework Ma, S. (2022). Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157271 https://hdl.handle.net/10356/157271 en application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Electrical and electronic engineering |
spellingShingle |
Engineering::Electrical and electronic engineering Ma, Shuting Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques |
description |
Piracy is one of the factors threatening the security of coastal countries. During the COVID-19 pandemic, piracy incidents have become more frequent than usual, posing a challenge to the safety of residents and to social stability.
At the same time, news reports on piracy incidents published in open sources are a valuable resource for piracy research. With the maturing of artificial intelligence technology and the continuous development of Natural Language Processing (NLP), how to make reasonable use of these open-source text materials for analysis has become an important research direction.
This project first introduces the possible applications of NLP to piracy news materials. Relevant piracy news materials were collected from open sources, then labeled and cleaned to form a new dataset on this topic. Four mainstream text classification models (TextCNN, Bi-LSTM, Transformer, and BERT) are introduced theoretically and tested in practice, and BERT is finally selected as the base model.
To address the imbalanced data classification problem, this project proposes and explores a variety of methods that combine deep learning and machine learning. On the one hand, data resampling is applied to improve the balance of the dataset. On the other hand, with BERT chosen as the classifier, a Costive-SVM is constructed in a fully connected layer with Triplet Loss to separate the positive and negative samples. After fine-tuning, model performance improves, and the over-fitting problem encountered during optimization is also resolved. Finally, the F1 score improved from 0.46 to 0.87. |
author2 |
Mao Kezhi |
author_facet |
Mao Kezhi Ma, Shuting |
format |
Thesis-Master by Coursework |
author |
Ma, Shuting |
author_sort |
Ma, Shuting |
title |
Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques |
title_short |
Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques |
title_full |
Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques |
title_fullStr |
Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques |
title_full_unstemmed |
Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques |
title_sort |
detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques |
publisher |
Nanyang Technological University |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/157271 |
_version_ |
1772828293543755776 |