Conceptual Classification of Web Pages using Bootstrapping and Co-training Strategies

Web page classification is conducted on Web sites from the same information domain to help organize Web pages for better browsing or searching. In this paper, we develop a Web page classification method using bootstrapping and co-training strategies to classify Web pages from different Web sites of...

Bibliographic Details
Main Authors: LIM, Ee Peng, SUN, Aixin, Marissa, Maria
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2006
Online Access: https://ink.library.smu.edu.sg/sis_research/804
http://www.cais.ntu.edu.sg/~axsun/paper/sun_cyberscape06.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-1803
record_format dspace
spelling sg-smu-ink.sis_research-1803 2010-11-26T07:24:03Z Conceptual Classification of Web Pages using Bootstrapping and Co-training Strategies LIM, Ee Peng SUN, Aixin Marissa, Maria Web page classification is conducted on Web sites from the same information domain to help organize Web pages for better browsing or searching. In this paper, we develop a Web page classification method using bootstrapping and co-training strategies to classify Web pages from different Web sites of the same domain into a given set of categories. The two strategies allow us to train accurate classifiers with very limited user feedback. We used more than 100 conference Web sites in our experiments and showed that, without much user-labeled training data, our method could automatically assign correct category labels to Web pages. In particular, the co-training method was shown to outperform both the bootstrapping method and the traditional one. It has also been shown that the two proposed methods were able to exploit the anchor text features and other content features of Web pages in the category assignment. 2006-01-01T08:00:00Z text https://ink.library.smu.edu.sg/sis_research/804 http://www.cais.ntu.edu.sg/~axsun/paper/sun_cyberscape06.pdf Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Web page classification Bootstrapping Training. Databases and Information Systems Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Web page classification
Bootstrapping
Training.
Databases and Information Systems
Numerical Analysis and Scientific Computing
spellingShingle Web page classification
Bootstrapping
Training.
Databases and Information Systems
Numerical Analysis and Scientific Computing
LIM, Ee Peng
SUN, Aixin
Marissa, Maria
Conceptual Classification of Web Pages using Bootstrapping and Co-training Strategies
description Web page classification is conducted on Web sites from the same information domain to help organize Web pages for better browsing or searching. In this paper, we develop a Web page classification method using bootstrapping and co-training strategies to classify Web pages from different Web sites of the same domain into a given set of categories. The two strategies allow us to train accurate classifiers with very limited user feedback. We used more than 100 conference Web sites in our experiments and showed that, without much user-labeled training data, our method could automatically assign correct category labels to Web pages. In particular, the co-training method was shown to outperform both the bootstrapping method and the traditional one. It has also been shown that the two proposed methods were able to exploit the anchor text features and other content features of Web pages in the category assignment.
format text
author LIM, Ee Peng
SUN, Aixin
Marissa, Maria
author_facet LIM, Ee Peng
SUN, Aixin
Marissa, Maria
author_sort LIM, Ee Peng
title Conceptual Classification of Web Pages using Bootstrapping and Co-training Strategies
title_short Conceptual Classification of Web Pages using Bootstrapping and Co-training Strategies
title_full Conceptual Classification of Web Pages using Bootstrapping and Co-training Strategies
title_fullStr Conceptual Classification of Web Pages using Bootstrapping and Co-training Strategies
title_full_unstemmed Conceptual Classification of Web Pages using Bootstrapping and Co-training Strategies
title_sort conceptual classification of web pages using bootstrapping and co-training strategies
publisher Institutional Knowledge at Singapore Management University
publishDate 2006
url https://ink.library.smu.edu.sg/sis_research/804
http://www.cais.ntu.edu.sg/~axsun/paper/sun_cyberscape06.pdf
_version_ 1770570721951481856