Conceptual Classification of Web Pages using Bootstrapping and Co-training Strategies
Main Authors: | , , |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2006 |
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/sis_research/804 http://www.cais.ntu.edu.sg/~axsun/paper/sun_cyberscape06.pdf |
Institution: | Singapore Management University |
Summary: | Web page classification is conducted on Web sites from the same information domain to help organize Web pages for better browsing or searching. In this paper, we develop a Web page classification method using bootstrapping and co-training strategies to classify Web pages from different Web sites of the same domain into a given set of categories. The two strategies allow us to train accurate classifiers with very limited user feedback. We used more than 100 conference Web sites in our experiments and showed that, without much user-labeled training data, our method could automatically assign correct category labels to Web pages. In particular, the co-training method was shown to outperform both the bootstrapping method and the traditional one. It has also been shown that the two proposed methods were able to exploit the anchor text features and other content features of Web pages in the category assignment. |
---|