Web structure analysis for information mining

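Our approach to extracting information from the web analyzes the structural content of web pages through exploiting the latent information given by HTML tags. For each specific extraction task, an object model is created consisting of the salient fields to be extracted and the corresponding extraction rules based on a library of HTML parsing functions. We derive extraction rules for both single-slot and multiple-slot extraction tasks which we illustrate through two sample domains.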
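
The abstract describes an object model that pairs the fields to be extracted with extraction rules built on HTML parsing functions, for both single-slot and multiple-slot tasks. The Python sketch below illustrates that general idea only; it is not the authors' library. The names `TagText`, `RowCells`, `single_slot`, `multi_slot`, and the sample page are hypothetical, and the sketch assumes a rule can be reduced to a tag name (single-slot) or a table column index (multiple-slot).

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


def _clean(parts: List[str]) -> str:
    """Join text fragments and collapse runs of whitespace."""
    return " ".join("".join(parts).split())


class TagText(HTMLParser):
    """Hypothetical parsing function: text of every <tag>...</tag> element."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag, self.depth = tag, 0
        self.parts: List[str] = []
        self.found: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:  # outermost element closed: emit its text
                self.found.append(_clean(self.parts))
                self.parts = []

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


class RowCells(HTMLParser):
    """Hypothetical parsing function: list of <td> texts for each <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self.row: Optional[List[str]] = None   # cells of the open row
        self.cell: Optional[List[str]] = None  # text of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag == "td" and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self.row is not None and self.cell is not None:
            self.row.append(_clean(self.cell))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row, self.cell = None, None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def single_slot(html: str, rules: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Single-slot task: each field is filled independently (first match)."""
    record: Dict[str, Optional[str]] = {}
    for field, tag in rules.items():
        parser = TagText(tag)
        parser.feed(html)
        parser.close()
        record[field] = parser.found[0] if parser.found else None
    return record


def multi_slot(html: str, columns: Dict[str, int]) -> List[Dict[str, str]]:
    """Multiple-slot task: fields in one row stay linked as one record."""
    parser = RowCells()
    parser.feed(html)
    parser.close()
    width = max(columns.values(), default=-1) + 1
    return [{f: row[i] for f, i in columns.items()}
            for row in parser.rows if len(row) >= width]


# Made-up sample page, not one of the paper's two sample domains.
PAGE = """
<html><body>
<h1>New arrivals</h1>
<p>Updated weekly</p>
<table>
<tr><td>Dune</td><td>Frank Herbert</td></tr>
<tr><td>Emma</td><td>Jane Austen</td></tr>
</table>
</body></html>
"""

if __name__ == "__main__":
    print(single_slot(PAGE, {"heading": "h1", "note": "p"}))
    print(multi_slot(PAGE, {"title": 0, "author": 1}))
```

In this sketch the single-slot rules fill each field on its own, while the multiple-slot rules keep the values from one table row together, so a title is never paired with the wrong author.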

Bibliographic Details
Main Authors: VIJJAPPU, Lakshmi, TAN, Ah-hwee, TAN, Chew-Lim
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2003
Subjects: Databases and Information Systems; OS and Networks
Online Access: https://ink.library.smu.edu.sg/sis_research/5255
Institution: Singapore Management University
id sg-smu-ink.sis_research-6258
record_format dspace
spelling sg-smu-ink.sis_research-62582020-07-23T18:12:03Z Web structure analysis for information mining VIJJAPPU, Lakshmi TAN, Ah-hwee TAN, Chew-Lim Our approach to extracting information from the web analyzes the structural content of web pages through exploiting the latent information given by HTML tags. For each specific extraction task, an object model is created consisting of the salient fields to be extracted and the corresponding extraction rules based on a library of HTML parsing functions. We derive extraction rules for both single-slot and multiple-slot extraction tasks which we illustrate through two sample domains. 2003-12-05T08:00:00Z text https://ink.library.smu.edu.sg/sis_research/5255 info:doi/10.1142/9789812775375_0003 Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems OS and Networks
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
OS and Networks
format text
author VIJJAPPU, Lakshmi
TAN, Ah-hwee
TAN, Chew-Lim
title Web structure analysis for information mining
publisher Institutional Knowledge at Singapore Management University
publishDate 2003
url https://ink.library.smu.edu.sg/sis_research/5255
_version_ 1770575351153426432