Document classification of Filipino online scam incident text using data mining techniques

The increasing number of online transactions and other internet activities gives rise to the proliferation of online scams. The Philippine National Police - Anti Cybercrime Group (PNP-ACG) reported an increasing number of complaints, from a double-digit figure in 2013 to a triple-digit figure in 2017....

Bibliographic Details
Main Authors: Palad, Eddie Bouy B.; Tangkeko, Marivic S.; Magpantay, Lissa Andrea K.; Sipin, Glenn L.
Format: text
Published: Animo Repository 2019
Subjects: Computer crimes--Philippines; Computer fraud--Philippines; Text data mining; Computer Sciences
Online Access: https://animorepository.dlsu.edu.ph/faculty_research/2696
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:faculty_research-3695
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:faculty_research-3695 2021-10-27T07:35:42Z 2019-09-01T07:00:00Z text https://animorepository.dlsu.edu.ph/faculty_research/2696 Faculty Research Work Animo Repository
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
topic Computer crimes--Philippines
Computer fraud--Philippines
Text data mining
Computer Sciences
description The increasing number of online transactions and other internet activities gives rise to the proliferation of online scams. The Philippine National Police - Anti Cybercrime Group (PNP-ACG) reported an increasing number of complaints, from a double-digit figure in 2013 to a triple-digit figure in 2017. The challenge of addressing this problem in the Philippines is shared by other developing countries in Southeast Asia and other parts of the world. Since the PNP-ACG was established in 2013, cybercrime data have continued to accumulate but have received little attention in research. Previous studies highlight the importance of taking advantage of data mining. However, the absence of empirical studies on cybercrime analytics in the country suggests that data mining is underused in facilitating cybercrime investigations. This study uses the Weka text mining tool to draw insights by classifying an online scam dataset. The dataset consisted of unstructured online scam text containing a total of 14,098 words, mainly Filipino. J48 Decision Tree, Naïve Bayes, and Sequential Minimal Optimization (SMO) were used to build classification models. The three classifiers were compared in terms of performance and prediction accuracy. The results show that J48 achieves the highest accuracy and the lowest error rate, followed by Naïve Bayes and then the SMO classifier. In addition, responses during validation reveal that J48 is preferred over the other classifiers because it is easy to understand and apply in cybercrime investigations. This demonstrates how text mining can assist the PNP-ACG in analyzing online scam criminal data and highlights the importance of employing data mining tools in the legal and criminal investigation domains in the Philippines. Further work can use different and more inclusive cybercrime datasets, other classification techniques in Weka or any other data mining tool, and other data mining tasks such as crime prevention and prediction, clustering, and finding leads, trends, and patterns of criminal activities. © 2019 IEEE.
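The description names Naïve Bayes as one of the three classifiers and reports results as accuracy and error rate. As a rough illustration of what that involves, the following is a minimal sketch in plain Python, not the Weka workflow used in the study. The incident texts, the two category labels (`online_selling`, `phishing`), and all function and class names are hypothetical and invented for this example; the record does not list the study's actual categories or preprocessing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesText:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy incident texts (invented, not taken from the PNP-ACG data).
train_docs = [
    "nagbayad ako pero hindi dumating ang item",
    "seller blocked me after payment sa gcash",
    "nag order ako online hindi pinadala ang item",
    "nakatanggap ako ng email link humingi ng password",
    "fake bank link asked for my password and otp",
    "email mula sa bank humingi ng otp",
]
train_labels = ["online_selling"] * 3 + ["phishing"] * 3

test_docs = [
    "hindi dumating ang item after payment",
    "link humingi ng otp at password",
]
test_labels = ["online_selling", "phishing"]

model = NaiveBayesText().fit(train_docs, train_labels)
acc = accuracy(model, test_docs, test_labels)
error_rate = 1.0 - acc  # error rate is the complement of accuracy
print(f"accuracy={acc:.2f} error_rate={error_rate:.2f}")
```

A comparison like the one described would repeat the same evaluation for each classifier on the same held-out data, ideally with cross-validation rather than a single small test split as in this toy example.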
format text
author Palad, Eddie Bouy B.
Tangkeko, Marivic S.
Magpantay, Lissa Andrea K.
Sipin, Glenn L.
author_sort Palad, Eddie Bouy B.
title Document classification of Filipino online scam incident text using data mining techniques
publisher Animo Repository
publishDate 2019
url https://animorepository.dlsu.edu.ph/faculty_research/2696