Position score weighting technique for mining web content outliers.

Existing methods for mining web content outliers use a stemming algorithm to preprocess the web documents and keep the domain dictionary in root-word form. A stemming algorithm reduces derived words to their stem, base or root form. The stemming algorithm sometimes does not lea...
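
The matching problem the abstract describes can be illustrated with a minimal sketch. The suffix list, the `toy_stem` helper and the sample words below are hypothetical and chosen only for illustration; they are not the stemmer or the data used in the paper, and the sketch does not implement TF.PS, which this record does not define.

```python
# Toy suffix stripper: a hypothetical stand-in for a real stemmer.
# The suffix list is an assumption made for this illustration only.
SUFFIXES = ("ing", "ion", "ers", "er", "es", "ed", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matched_terms(document_words, dictionary_words):
    """Return the stemmed document words that also occur in the dictionary."""
    stems = {toy_stem(w) for w in document_words}
    return stems & set(dictionary_words)


# Hypothetical sample data.
document = ["computing", "servers", "encrypted", "hockey"]
domain_dictionary = ["computer", "server", "encryption"]

# Stemmed document words compared against the unstemmed dictionary.
unstemmed_hits = matched_terms(document, domain_dictionary)

# Stemmed document words compared against a stemmed dictionary.
stemmed_dictionary = {toy_stem(w) for w in domain_dictionary}
stemmed_hits = matched_terms(document, stemmed_dictionary)

print(sorted(unstemmed_hits))
print(sorted(stemmed_hits))
```

In this sketch a stem such as "comput" is not a real word, so it never matches the dictionary entry "computer" until the dictionary is stemmed with the same function.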

Bibliographic Details
Main Authors: Mustapha, Norwati; Mustapha, Aida
Format: Article
Language: English
Published: CESER Publications 2013
Online Access: http://psasir.upm.edu.my/id/eprint/30631/1/Position%20score%20weighting%20technique%20for%20mining%20web%20content%20outliers.pdf
http://psasir.upm.edu.my/id/eprint/30631/
http://www.ceser.in/ceserp/index.php/ijamas/issue/view/180
Institution: Universiti Putra Malaysia
id my.upm.eprints.30631
record_format eprints
spelling my.upm.eprints.30631 2015-10-08T06:52:10Z Mustapha, Norwati and Mustapha, Aida (2013) Position score weighting technique for mining web content outliers. International Journal of Applied Mathematics and Statistics, 36 (6). pp. 77-86. ISSN 0973-7545. Article, PeerReviewed, application/pdf, en.
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description Existing methods for mining web content outliers use a stemming algorithm to preprocess the web documents and keep the domain dictionary in root-word form. Stemming reduces derived words to their stem, base or root form, but the result is not always a real word, which makes it hard to match words in the full word profile against the domain dictionary. This study therefore uses a stemmed domain dictionary together with the Term Frequency with Position Score (TF.PS) weighting technique, which is derived from the TF.IDF weighting technique from Information Retrieval (IR), in the dissimilarity measure phase, to evaluate how effectively the technique identifies outliers in web content. The dataset is drawn from the 20 Newsgroups dataset. The stemmed domain dictionary with TF.PS weighting achieves up to 98.19% accuracy and a 90% F1-measure, higher than previous techniques.
format Article
author Mustapha, Norwati
Mustapha, Aida
title Position score weighting technique for mining web content outliers.
publisher CESER Publications
publishDate 2013