Finding and Classifying Web Units in Web Sites

In web classification, most researchers assume that the objects to be classified are individual web pages from one or more websites. In practice, the assumption is too restrictive since a web page itself may not carry sufficient information for it to be treated as an instance of some semantic class...

Bibliographic Details
Main Authors: LIM, Ee Peng; SUN, Aixin
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2005
Subjects: Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/139
http://dx.doi.org/10.1504/IJBIDM.2005.008361
Institution: Singapore Management University
id sg-smu-ink.sis_research-1138
record_format dspace
spelling sg-smu-ink.sis_research-1138 2010-09-22T14:00:36Z Finding and Classifying Web Units in Web Sites LIM, Ee Peng SUN, Aixin In web classification, most researchers assume that the objects to be classified are individual web pages from one or more websites. In practice, the assumption is too restrictive since a web page itself may not carry sufficient information for it to be treated as an instance of some semantic class or concept. In this paper, we relax this assumption and allow a subgraph of web pages to represent an instance of the semantic concept. Such a subgraph of web pages is known as a web unit. To construct and classify web units, we formulate the web unit mining problem and propose an iterative web unit mining (iWUM) method. The iWUM method first finds subgraphs of web pages using knowledge about website structure and connectivity among the web pages. From these web subgraphs, web units are constructed and classified into categories in an iterative manner. Our experiments using the WebKB dataset showed that iWUM was able to construct web units and classify web units with high accuracy for the more structured parts of a website. 2005-01-01T08:00:00Z text https://ink.library.smu.edu.sg/sis_research/139 info:doi/10.1504/IJBIDM.2005.008361 http://dx.doi.org/10.1504/IJBIDM.2005.008361 Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Numerical Analysis and Scientific Computing
spellingShingle Databases and Information Systems
Numerical Analysis and Scientific Computing
LIM, Ee Peng
SUN, Aixin
Finding and Classifying Web Units in Web Sites
description In web classification, most researchers assume that the objects to be classified are individual web pages from one or more websites. In practice, the assumption is too restrictive since a web page itself may not carry sufficient information for it to be treated as an instance of some semantic class or concept. In this paper, we relax this assumption and allow a subgraph of web pages to represent an instance of the semantic concept. Such a subgraph of web pages is known as a web unit. To construct and classify web units, we formulate the web unit mining problem and propose an iterative web unit mining (iWUM) method. The iWUM method first finds subgraphs of web pages using knowledge about website structure and connectivity among the web pages. From these web subgraphs, web units are constructed and classified into categories in an iterative manner. Our experiments using the WebKB dataset showed that iWUM was able to construct web units and classify web units with high accuracy for the more structured parts of a website.
format text
author LIM, Ee Peng
SUN, Aixin
author_facet LIM, Ee Peng
SUN, Aixin
author_sort LIM, Ee Peng
title Finding and Classifying Web Units in Web Sites
title_short Finding and Classifying Web Units in Web Sites
title_full Finding and Classifying Web Units in Web Sites
title_fullStr Finding and Classifying Web Units in Web Sites
title_full_unstemmed Finding and Classifying Web Units in Web Sites
title_sort finding and classifying web units in web sites
publisher Institutional Knowledge at Singapore Management University
publishDate 2005
url https://ink.library.smu.edu.sg/sis_research/139
http://dx.doi.org/10.1504/IJBIDM.2005.008361
_version_ 1770568897702920192