Document classification of Filipino online scam incident text using data mining techniques

Bibliographic Details
Main Author: Palad, Eddie Bouy B.
Format: text
Language: English
Published: Animo Repository, 2018
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/5522
Institution: De La Salle University
Identifier: oai:animorepository.dlsu.edu.ph:etd_masteral-12360
Series: Master's Theses
Library: De La Salle University Library
Collection: DLSU Institutional Repository
Subjects: Computer fraud--Philippines; Computer crimes--Philippines; Fraud--Philippines
Description: The increasing number of online transactions and other internet activities has given rise to a proliferation of online scams. The Philippine National Police Anti-Cybercrime Group (PNP-ACG) reported that complaints rose from a double-digit figure in 2013 to a triple-digit figure in 2017. The challenge of addressing this problem in the Philippines is shared by other developing countries in Southeast Asia and elsewhere in the world. Since the PNP-ACG was established in 2013, cybercrime data have continued to accumulate but have received little attention in research. Previous studies highlight the importance of taking advantage of data analytics. However, the absence of empirical studies on cybercrime analytics in the country suggests that data analytics is not being exploited to support cybercrime investigations. This study uses the Weka text mining tool to draw insights by classifying an online scam dataset. Weka was chosen for three reasons. It is a Java-based tool from the University of Waikato, New Zealand. It is free and open-source software released under the GNU General Public License. It also supports the text mining tasks performed in this study, such as pre-processing and classification. The dataset consists of online scam text and narratives from online scam victims: 82 documents containing a total of 14,098 words, or attributes, mostly in Filipino. J48 decision tree, Naïve Bayes, and Sequential Minimal Optimization (SMO) classifiers were used to build classification models, and the three were compared in terms of performance and prediction accuracy. The results show that J48 achieves the highest accuracy and the lowest error rate, followed by Naïve Bayes and then SMO. In addition, responses during validation reveal that police investigators prefer J48 over the other classifiers because it is easy for them to understand and apply in cybercrime investigations.
This demonstrates how predictive text mining can help the PNP-ACG analyze and identify online scam behavior. It also highlights the importance of employing data mining tools in the legal and criminal investigation domains in the Philippines. Future work could use different and more inclusive cybercrime datasets and other classification techniques in Weka or other data mining tools. It could also cover other data mining tasks such as crime prevention and prediction, clustering, and finding leads, trends, and patterns of criminal activity.
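The abstract describes a pipeline with two steps. First, each document is turned into word attributes. Second, a classifier is scored by accuracy and error rate. The sketch below illustrates both steps with a multinomial Naïve Bayes classifier written from scratch in Python. It is not Weka's `NaiveBayes` class and not the author's configuration. Several elements are hypothetical and chosen only for illustration:

- the whitespace tokenizer;
- the add-one smoothing;
- the category names `prize_scam` and `online_selling_scam`;
- the four training sentences.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical pre-processing: lowercase and split on whitespace.
    # The study's actual Weka pre-processing settings are not given here.
    return text.lower().split()


def build_vocabulary(docs):
    # Each distinct word becomes one attribute (bag-of-words representation).
    return sorted({word for doc in docs for word in tokenize(doc)})


def train_naive_bayes(docs, labels):
    # Count documents per class and word occurrences per class.
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    return class_counts, word_counts, build_vocabulary(docs)


def predict(model, doc):
    # Pick the class with the highest log prior + log likelihood,
    # using Laplace (add-one) smoothing over the vocabulary.
    class_counts, word_counts, vocab = model
    vocab_set = set(vocab)
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(doc):
            if word in vocab_set:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy_and_error(y_true, y_pred):
    # Accuracy = correctly classified / total; error rate = 1 - accuracy.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    return accuracy, 1 - accuracy


# Hypothetical toy data; not the study's dataset or categories.
train_docs = [
    "send money now to claim prize",
    "pay first then item ships",
    "prize claim fee required",
    "item never shipped after payment",
]
train_labels = ["prize_scam", "online_selling_scam", "prize_scam", "online_selling_scam"]

model = train_naive_bayes(train_docs, train_labels)
test_docs = ["claim your prize now", "item never shipped"]
predictions = [predict(model, d) for d in test_docs]
print(predictions)  # ['prize_scam', 'online_selling_scam']
print(accuracy_and_error(["prize_scam", "online_selling_scam"], predictions))  # (1.0, 0.0)
```

Accuracy here is the fraction of correctly classified documents, and the error rate is its complement. This mirrors the correctly and incorrectly classified instance figures that Weka reports. To compare classifiers as the study did, `predict` would be swapped for a decision tree (J48 is Weka's C4.5 implementation) or a support vector classifier trained with SMO, evaluated on the same documents.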