Applying semantic similarity measures to enhance topic-specific web crawling

As the Internet grows rapidly, finding the desired information becomes a tedious and time-consuming task. Topic-specific web crawlers, as ideal solutions, tackle this issue by traversing the Web and collecting information related to the topic of interest. In this regard, various methods are pro...

Bibliographic Details
Main Authors: Pesaranghader, Ali; Mustapha, Norwati; Pesaranghader, Ahmad
Format: Conference or Workshop Item
Published: IEEE (IEEEXplore) 2013
Online Access: http://psasir.upm.edu.my/id/eprint/41318/
Institution: Universiti Putra Malaysia
id my.upm.eprints.41318
record_format eprints
spelling my.upm.eprints.41318 2015-11-03T08:41:17Z NonPeerReviewed Pesaranghader, Ali and Mustapha, Norwati and Pesaranghader, Ahmad (2013) Applying semantic similarity measures to enhance topic-specific web crawling. In: 2013 13th International Conference on Intelligent Systems Design and Applications (ISDA), 8-10 Dec. 2013, Bangi, Selangor, Malaysia. (pp. 205-212). DOI: 10.1109/ISDA.2013.6920736
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
description As the Internet grows rapidly, finding the desired information becomes a tedious and time-consuming task. Topic-specific web crawlers, as ideal solutions, tackle this issue by traversing the Web and collecting information related to the topic of interest. Various methods have been proposed for this purpose. Nevertheless, they rarely consider the intended sense of the given topic, which plays an important role in finding relevant web pages. In this paper, we attempt to improve topic-specific web crawling by disambiguating the sense of the topic. This avoids crawling irrelevant links associated with other senses of the topic. To this end, we consider the semantics of each link's hypertext and employ the Lin semantic similarity measure in our crawler, named LinCrawler, to distinguish links related to the topic sense from the others. We also compare LinCrawler against TFCrawler, which considers only the frequency of terms in hypertexts. Experimental results show that LinCrawler outperforms TFCrawler in collecting relevant web pages.
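The contrast the abstract draws between term-frequency scoring and Lin-based scoring can be illustrated with a minimal sketch. This is not the authors' LinCrawler or TFCrawler: the toy taxonomy, the concept counts, the word-to-concept mapping, and every function name below are assumptions made up for illustration. The only part taken as given is the standard Lin measure, sim(a, b) = 2 * IC(lcs(a, b)) / (IC(a) + IC(b)), where IC(c) = -log p(c) and lcs is the lowest common subsumer of the two concepts.

```python
import math

# Hypothetical toy taxonomy (child -> parent); illustrative only, not from the paper.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "cat": "animal",
    "jaguar_animal": "cat",
    "leopard": "cat",
    "car": "vehicle",
    "jaguar_car": "car",
    "sedan": "car",
}

# Hypothetical corpus counts per concept (made-up numbers).
COUNT = {
    "entity": 0, "animal": 10, "vehicle": 10, "cat": 20,
    "jaguar_animal": 5, "leopard": 5, "car": 30, "jaguar_car": 5, "sedan": 15,
}

# Hypothetical mapping from unambiguous hypertext words to concepts.
WORD_CONCEPT = {"leopard": "leopard", "cat": "cat", "sedan": "sedan", "car": "car"}


def ancestors(concept):
    """Return the concept followed by its ancestors, up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def cumulative_count(concept):
    """Count of the concept plus the counts of all its descendants."""
    return sum(n for c, n in COUNT.items() if concept in ancestors(c))


TOTAL = cumulative_count("entity")


def information_content(concept):
    """IC(c) = -log p(c), written as log(TOTAL / count)."""
    return math.log(TOTAL / cumulative_count(concept))


def lin_similarity(a, b):
    """Lin measure: 2 * IC(lcs) / (IC(a) + IC(b))."""
    up_from_a = ancestors(a)
    # Walking upward from b, the first shared node is the lowest common subsumer.
    lcs = next(c for c in ancestors(b) if c in up_from_a)
    denom = information_content(a) + information_content(b)
    return 2 * information_content(lcs) / denom if denom > 0 else 0.0


def tf_score(term, hypertext):
    """Baseline: how often the topic term occurs in the link hypertext."""
    return hypertext.lower().split().count(term)


def lin_score(topic_sense, hypertext):
    """Best Lin similarity between the topic sense and any mapped hypertext word."""
    sims = [
        lin_similarity(topic_sense, WORD_CONCEPT[w])
        for w in hypertext.lower().split()
        if w in WORD_CONCEPT
    ]
    return max(sims, default=0.0)


for text in ("jaguar leopard habitat", "jaguar sedan dealer"):
    print(text, tf_score("jaguar", text), round(lin_score("jaguar_animal", text), 3))
```

In this toy setup both link texts contain the topic term "jaguar" once, so the frequency baseline cannot separate them, while the Lin-based score is higher for the link whose context words sit close to the animal sense in the taxonomy. A real crawler would need an actual lexical taxonomy and corpus statistics in place of the hand-written dictionaries.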
format Conference or Workshop Item
author Pesaranghader, Ali
Mustapha, Norwati
Pesaranghader, Ahmad
title Applying semantic similarity measures to enhance topic-specific web crawling
publisher IEEE (IEEEXplore)
publishDate 2013
url http://psasir.upm.edu.my/id/eprint/41318/