Term frequency-information content for focused crawling to predict relevant web pages.
With the rapid growth of the Web, finding desirable information on the Internet is a tedious and time consuming task. Focused crawlers are the golden keys to solve this issue through mining of the Web content. In this regard, a variety of methods have been devised and implemented. Many of these meth...
Saved in:
Main Authors: | , |
---|---|
Format: | Article |
Language: | English English |
Published: |
Advanced Institute of Convergence Information Technology
2013
|
Online Access: | http://psasir.upm.edu.my/id/eprint/30629/1/Term%20frequency.pdf http://psasir.upm.edu.my/id/eprint/30629/ |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Universiti Putra Malaysia |
Language: | English English |
Summary: | With the rapid growth of the Web, finding desirable information on the Internet is a tedious and time consuming task. Focused crawlers are the golden keys to solve this issue through mining of the Web content. In this regard, a variety of methods have been devised and implemented. Many of these methods coming from information retrieval viewpoint are not biased towards more informative terms
in multi-term topics (topics with more than one keyword). In this paper, by considering terms’ information contents, we propose Term Frequency-Information Content (TF-IC) method which assigns appropriate weight to each term in a multi-term topic. Through the conducted experiments, we
compare our method with other methods such as Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Semantic Indexing (LSI). Experimental results show that our method outperforms those two methods by retrieving more relevant pages for multi-term topics. |
---|