Hybridized feature set for accurate Arabic dark web pages classification

Bibliographic Details
Main Authors: Sabbah, T., Selamat, A.
Format: Conference or Workshop Item
Published: Springer Verlag 2015
Subjects: T58.5-58.64 Information technology
Online Access: http://eprints.utm.my/id/eprint/59310/
http://dx.doi.org/10.1007/978-3-319-22689-7_13
Institution: Universiti Teknologi Malaysia
Citation: Sabbah, T. and Selamat, A. (2015) Hybridized feature set for accurate Arabic dark web pages classification. In: 14th International Conference on New Trends in Intelligent Software Methodology, Tools, and Techniques (SoMeT 2015), 15-17 Sep 2015, Naples. DOI: 10.1007/978-3-319-22689-7_13
Status: Peer reviewed

Abstract:
Security informatics and computational intelligence are gaining importance in detecting terrorist activities, as extremist groups misuse many available Internet services to incite violence and hatred. However, the inadequate performance of statistics-based computational intelligence methods reduces the efficiency of intelligent techniques in supporting counterterrorism efforts and limits opportunities for the early detection of potential terrorist activities. In this paper, we propose a feature set hybridization method, based on feature selection and feature extraction methods, for accurate content classification of Arabic dark web pages. The proposed method hybridizes feature sets so that the generated feature set contains fewer features while achieving higher classification performance. A selected dataset from the Dark Web Forum Portal (DWFP) is used to test the proposed method, which combines Term Frequency-Inverse Document Frequency (TFIDF) as the feature selection method with Random Projection (RP) and Principal Component Analysis (PCA) as feature extraction methods. Classification results using the Support Vector Machine (SVM) classifier show that the hybridization of TFIDF and PCA achieves high classification performance, reaching 99% in both F1 score and accuracy.
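
The abstract does not say how the selected and extracted feature sets are combined. The following is a minimal sketch under stated assumptions, not the authors' implementation: "hybridization" is assumed to mean concatenating the k terms with the highest summed TF-IDF weight (an assumed ranking criterion) with a Gaussian random projection of the full TF-IDF matrix. It uses only the Python standard library. The function names (tfidf_matrix, select_top_terms, random_projection, hybridize) and the toy corpus are hypothetical. Whitespace tokenization stands in for the Arabic preprocessing a real pipeline would need, and the SVM classification step is omitted.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: "hybridization" is ASSUMED to mean concatenating
# TF-IDF-selected term columns with randomly projected features. This is
# not the authors' implementation; PCA could replace random_projection.


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j]
    in docs[i], using raw term counts and idf = log(N / df)."""
    # Whitespace tokenization is a placeholder for real Arabic preprocessing.
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log(len(docs) / df[t]) for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def select_top_terms(vocab, rows, k):
    """Feature selection: keep the k terms with the largest summed TF-IDF
    weight across all documents (an assumed ranking criterion)."""
    scores = [sum(row[j] for row in rows) for j in range(len(vocab))]
    # Highest score first; ties broken alphabetically for determinism.
    top = sorted(range(len(vocab)), key=lambda j: (-scores[j], vocab[j]))[:k]
    return [vocab[j] for j in top], [[row[j] for j in top] for row in rows]


def random_projection(rows, n_components, seed=0):
    """Feature extraction: multiply each row by a Gaussian random matrix."""
    rng = random.Random(seed)
    dim = len(rows[0])
    # Entries ~ N(0, 1/n_components), so squared norms are preserved in
    # expectation (the Johnson-Lindenstrauss construction).
    scale = 1.0 / math.sqrt(n_components)
    proj = [[rng.gauss(0.0, scale) for _ in range(n_components)]
            for _ in range(dim)]
    return [[sum(row[i] * proj[i][c] for i in range(dim))
             for c in range(n_components)] for row in rows]


def hybridize(docs, k, n_components, seed=0):
    """Return the selected terms and, per document, the k selected TF-IDF
    weights followed by n_components projected features."""
    vocab, rows = tfidf_matrix(docs)
    terms, selected = select_top_terms(vocab, rows, k)
    projected = random_projection(rows, n_components, seed)
    return terms, [s + p for s, p in zip(selected, projected)]


if __name__ == "__main__":
    docs = ["a b b c", "a c d", "d e e e"]
    terms, features = hybridize(docs, k=2, n_components=3)
    print(terms, len(features[0]))  # ['e', 'b'] 5
```

On the toy corpus, the terms `e` and `b` score highest, so each document ends up with five features: two selected TF-IDF weights and three projected ones. Swapping the projection step for PCA and passing the rows to an SVM would approximate the TFIDF and PCA configuration for which the abstract reports its best results.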