Web Unit Mining: Finding and classifying subgraphs of web pages

In web classification, most researchers assume that the objects to classify are individual web pages from one or more web sites. In practice, the assumption is too restrictive since a web page itself may not always correspond to a concept instance of some semantic concept (or category) given to the...

Bibliographic Details
Main Authors: SUN, Aixin, LIM, Ee Peng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2003
Subjects: Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/991
https://ink.library.smu.edu.sg/context/sis_research/article/1990/viewcontent/sun_cikm03.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-1990
record_format dspace
spelling sg-smu-ink.sis_research-1990 2018-06-20T06:15:20Z Web Unit Mining: Finding and classifying subgraphs of web pages SUN, Aixin LIM, Ee Peng In web classification, most researchers assume that the objects to classify are individual web pages from one or more web sites. In practice, the assumption is too restrictive since a web page itself may not always correspond to a concept instance of some semantic concept (or category) given to the classification task. In this paper, we want to relax this assumption and allow a concept instance to be represented by a subgraph of web pages or a set of web pages. We identify several new issues to be addressed when the assumption is removed, and formulate the web unit mining problem. We also propose an iterative web unit mining (iWUM) method that first finds subgraphs of web pages using some knowledge about web site structure. From these web subgraphs, web units are constructed and classified into semantic concepts (or categories) in an iterative manner. Our experiments using the WebKB dataset showed that iWUM improves the overall classification performance and works very well on the more structured parts of a web site. 2003-11-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/991 info:doi/10.1145/956863.956885 https://ink.library.smu.edu.sg/context/sis_research/article/1990/viewcontent/sun_cikm03.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Numerical Analysis and Scientific Computing
description In web classification, most researchers assume that the objects to classify are individual web pages from one or more web sites. In practice, the assumption is too restrictive since a web page itself may not always correspond to a concept instance of some semantic concept (or category) given to the classification task. In this paper, we want to relax this assumption and allow a concept instance to be represented by a subgraph of web pages or a set of web pages. We identify several new issues to be addressed when the assumption is removed, and formulate the web unit mining problem. We also propose an iterative web unit mining (iWUM) method that first finds subgraphs of web pages using some knowledge about web site structure. From these web subgraphs, web units are constructed and classified into semantic concepts (or categories) in an iterative manner. Our experiments using the WebKB dataset showed that iWUM improves the overall classification performance and works very well on the more structured parts of a web site.
format text
author SUN, Aixin
LIM, Ee Peng
title Web Unit Mining: Finding and classifying subgraphs of web pages
publisher Institutional Knowledge at Singapore Management University
publishDate 2003
url https://ink.library.smu.edu.sg/sis_research/991
https://ink.library.smu.edu.sg/context/sis_research/article/1990/viewcontent/sun_cikm03.pdf