Automatic document categorization

With the increasing popularity of social media networks in recent years, concerns have been raised about exposure to cyber bullying. Such harmful content has a severe negative impact on the mental health of the people exposed to it, especially teenagers. Therefore, it is essentia...

Bibliographic Details
Main Author: Zhou, Anna
Other Authors: Mao Kezhi
Format: Final Year Project
Language: English
Published: 2016
Subjects:
Online Access: http://hdl.handle.net/10356/67886
Institution: Nanyang Technological University
id sg-ntu-dr.10356-67886
record_format dspace
spelling sg-ntu-dr.10356-67886 2023-07-07T16:15:39Z Automatic document categorization Zhou, Anna Mao Kezhi School of Electrical and Electronic Engineering DRNTU::Engineering Bachelor of Engineering 2016-05-23T06:19:58Z 2016-05-23T06:19:58Z 2016 Final Year Project (FYP) http://hdl.handle.net/10356/67886 en Nanyang Technological University 68 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering
description With the increasing popularity of social media networks in recent years, concerns have been raised about exposure to cyber bullying. Such harmful content has a severe negative impact on the mental health of the people exposed to it, especially teenagers. It is therefore essential to find an effective method of cyber bullying detection. In this paper, we propose two different models for text representation and feature extraction. An introduction to the topic and some related work are presented first for a better understanding of the topic. The concepts of the two text representation models, the Embedding Enhanced Bag-of-Words model and the Bullying-Word-Filter model, are then introduced. In the experiment part, we apply these two models to manually labeled tweets and test them, and the prediction scores are reported. In the second part, using the classifiers trained in the first part, we carry out a case study concentrating on cyber bullying cases in Singapore. The paper shows that our proposed models outperform many existing models and work efficiently in cyber bullying detection. More work remains to be done in the future.
author2 Mao Kezhi
author_facet Mao Kezhi
Zhou, Anna
format Final Year Project
author Zhou, Anna
author_sort Zhou, Anna
title Automatic document categorization
publishDate 2016
url http://hdl.handle.net/10356/67886
_version_ 1772827800341839872