Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques

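One of the factors threatening the security of coastal countries is piracy. During the COVID-19 pandemic, piracy incidents became more frequent than usual, challenging both residents' safety and social stability. At the same time, news reports on piracy incidents published in open sources are a valuable resource for piracy research. With the maturing of artificial intelligence and the continued development of natural language processing (NLP), making sensible use of such open-source text for analysis has become an important research direction. This project first introduces possible applications of NLP to piracy news material. Relevant piracy news articles were collected from open sources, then labelled and cleaned to form a new dataset on this topic. Four mainstream text classification models (textCNN, Bi-LSTM, Transformer and BERT) are introduced theoretically and tested in practice, and BERT is finally selected as the base model. To address the imbalanced classification problem, the project proposes and explores several methods that combine deep learning and machine learning. On the one hand, data resampling is used to improve the balance of the dataset. On the other hand, with BERT as the classifier, a cost-sensitive SVM is constructed in the fully connected layer together with a triplet loss to separate positive and negative samples. After fine-tuning, model performance improved and the over-fitting observed during optimization was resolved. The F1 score improved from 0.46 to 0.87.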
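The abstract names two generic techniques for the class-imbalance problem. The first is resampling the training data. The second is a triplet loss that pulls same-label embeddings together and pushes opposite-label ones apart. The sketch below illustrates textbook forms of both in plain Python. It is not the thesis's implementation: this record does not state the resampling strategy, distance function or margin used. The function names, the choice of random oversampling, the Euclidean distance and the default margin of 1.0 are all illustrative assumptions.

```python
import math
import random


def random_oversample(texts, labels, seed=0):
    """Illustrative random oversampling (assumed strategy): duplicate
    examples of smaller classes at random until every class has as many
    examples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Keep every original example, then top up with random duplicates.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-form triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is (margin assumed)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

For example, oversampling four texts labelled `[0, 0, 0, 1]` yields six texts with three per label. `triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])` is zero because the negative is already more than one margin farther away than the positive. Swapping the positive and negative gives a loss of 5.0.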


Bibliographic Details
Main Author: Ma, Shuting
Other Authors: Mao Kezhi
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2022
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/157271
Institution: Nanyang Technological University
School: School of Electrical and Electronic Engineering
Contact: Mao Kezhi (EKZMao@ntu.edu.sg)
Degree: Master of Science (Computer Control and Automation)
Date deposited: 2022-05-11
Citation: Ma, S. (2022). Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157271
File format: application/pdf
Content provider: NTU Library
Collection: DR-NTU
Country: Singapore