Web classification of conceptual entities using co-training

Social networking websites, which profile objects with predefined attributes and their relationships, often rely heavily on their users to contribute the required information. We, however, have observed that many web pages are actually created collectively according to the composition of some physic...

Bibliographic Details
Main Authors: SUN, Aixin, LIU, Ying, LIM, Ee Peng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2011
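Subjects: Conceptual web classification; Co-training; Web classification; Communication Technology and New Media; Databases and Information Systems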
Online Access: https://ink.library.smu.edu.sg/sis_research/1442
http://dx.doi.org/10.1016/j.eswa.2011.03.010
Institution: Singapore Management University
id sg-smu-ink.sis_research-2441
record_format dspace
spelling sg-smu-ink.sis_research-2441 2012-01-10T09:43:58Z Web classification of conceptual entities using co-training SUN, Aixin LIU, Ying LIM, Ee Peng 2011-11-01T07:00:00Z text https://ink.library.smu.edu.sg/sis_research/1442 info:doi/10.1016/j.eswa.2011.03.010 http://dx.doi.org/10.1016/j.eswa.2011.03.010 Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Conceptual web classification Co-training Web classification Communication Technology and New Media Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Conceptual web classification
Co-training
Web classification
Communication Technology and New Media
Databases and Information Systems
description Social networking websites, which profile objects with predefined attributes and their relationships, often rely heavily on their users to contribute the required information. We, however, have observed that many web pages are actually created collectively according to the composition of some physical or abstract entity, e.g., company, people, and event. Furthermore, users often like to organize pages into conceptual categories for better search and retrieval, making it feasible to extract relevant attributes and relationships from the web. Given a set of entities, each consisting of a set of web pages, we call the task of assigning pages to their corresponding conceptual categories conceptual web classification. To address this task, we propose an entity-based co-training (EcT) algorithm that learns from unlabeled examples to boost its performance. Unlike existing co-training algorithms, EcT takes into account the entity semantics hidden in web pages and requires no prior knowledge of the underlying class distribution, which is crucial in the standard co-training algorithms used in web classification. In our experiments, we evaluated EcT, standard co-training, and three other non-co-training learning methods on the Conf-425 dataset. Both EcT and co-training performed well compared to the baseline methods that required a large number of training examples.
format text
author SUN, Aixin
LIU, Ying
LIM, Ee Peng
title Web classification of conceptual entities using co-training
publisher Institutional Knowledge at Singapore Management University
publishDate 2011
url https://ink.library.smu.edu.sg/sis_research/1442
http://dx.doi.org/10.1016/j.eswa.2011.03.010