Improved web page identification method using neural networks

Bibliographic Details
Main Authors: Selamat, Ali; Lee, Zhi Sam; Maarof, Mohd. Aizaini; Shamsuddin, Siti Mariyam
Format: Article
Published: World Scientific Publishing Co. Pte Ltd, 2011
Online Access: http://eprints.utm.my/id/eprint/29213/
http://dx.doi.org/10.1142/S1469026811003008
Institution: Universiti Teknologi Malaysia
Citation: Selamat, Ali and Lee, Zhi Sam and Maarof, Mohd. Aizaini and Shamsuddin, Siti Mariyam (2011) Improved web page identification method using neural networks. International Journal of Computational Intelligence and Applications, 10 (1). pp. 87-114. ISSN 1469-0268
DOI: 10.1142/S1469026811003008
Status: Peer reviewed (published March 2011)
Subjects: QA75 Electronic computers. Computer science
Collection: UTM Institutional Repository
Description: This paper proposes an improved web page classification method (IWPCM) that uses neural networks to identify illicit content in web pages. IWPCM improves feature selection for web pages by using class-based feature vectors (CPBF). CPBF feature selection emphasizes the weights of terms that are important for illicit web documents and reduces the dependence on the weights of less important terms from normal web documents. IWPCM, with its modified term-weighting scheme, was evaluated against several traditional term-weighting schemes on non-illicit and illicit web content collected from the web. Precision, recall, and F1 were used to measure its effectiveness. The experimental results show that the proposed term-weighting scheme was able to identify non-illicit and illicit web content in the experimental datasets.
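For orientation only, the following is a minimal Python sketch of two ingredients the abstract names. The first is a traditional term-weighting scheme; TF-IDF is assumed here as a representative baseline, since the record does not list which schemes were compared. The second is the precision/recall/F1 evaluation. It does not reproduce the paper's CPBF weighting or its neural network classifier. The function names `tf_idf` and `precision_recall_f1` and the label encoding (1 = illicit, 0 = non-illicit) are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Traditional TF-IDF (assumed baseline): raw term count times log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def precision_recall_f1(
    y_true: list[int], y_pred: list[int], positive: int = 1
) -> tuple[float, float, float]:
    """Precision, recall and F1 for one class (hypothetical encoding: 1 = illicit)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    docs = [["casino", "bet", "bet"], ["news", "weather"], ["casino", "news"]]
    print(tf_idf(docs)[0])  # "bet" occurs only in doc 0, so it gets the largest weight there
    print(precision_recall_f1([1, 0, 1, 0], [1, 1, 0, 0]))  # (0.5, 0.5, 0.5)
```

In this toy run one illicit page is caught, one is missed, and one normal page is flagged, so precision, recall and F1 all come out at 0.5. A class-aware scheme such as the one the abstract describes would change how the term weights are computed before classification, not how these metrics are computed.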