Web structure analysis for information mining

Our approach to extracting information from the web analyzes the structural content of web pages through exploiting the latent information given by HTML tags. For each specific extraction task, an object model is created consisting of the salient fields to be extracted and the corresponding extracti...
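As an illustration only, the idea of an object model that pairs each field with an extraction rule built on HTML parsing functions might be sketched as follows. This is not the authors' implementation: the helper names (`text_of`, `table_rows`, `MODEL`, `extract`), the sample page, and the choice of Python's standard `html.parser` module are all assumptions made for the example. It shows one single-slot rule (a page title) and one multiple-slot rule (name/price pairs taken from table rows).

```python
from html.parser import HTMLParser


class _TagText(HTMLParser):
    """Collect the text inside each occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag, self.depth, self.chunks, self.found = tag, 0, [], []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.found.append("".join(self.chunks).strip())
                self.chunks = []

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


class _Rows(HTMLParser):
    """Collect the <td> cell texts of every <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows, self.cell = [], None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self.cell is not None:
            self.rows[-1].append("".join(self.cell).strip())
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def text_of(html, tag):
    """Parsing function: text of every <tag> element in the page."""
    parser = _TagText(tag)
    parser.feed(html)
    parser.close()
    return parser.found


def table_rows(html):
    """Parsing function: list of rows, each a list of cell texts."""
    parser = _Rows()
    parser.feed(html)
    parser.close()
    return parser.rows


def _title(html):
    # Single-slot rule: one value per page (None if the tag is absent).
    found = text_of(html, "title")
    return found[0] if found else None


def _listings(html):
    # Multiple-slot rule: each two-cell row yields one linked (name, price) record.
    return [dict(zip(("name", "price"), r)) for r in table_rows(html) if len(r) == 2]


# Object model for an assumed "book listing" task: field name -> extraction rule.
MODEL = {"title": _title, "listings": _listings}


def extract(html, model):
    """Apply every rule in the object model to one page."""
    return {field: rule(html) for field, rule in model.items()}


# Assumed sample page, for demonstration only.
PAGE = (
    "<html><head><title>Book list</title></head><body><table>"
    "<tr><th>Name</th><th>Price</th></tr>"
    "<tr><td>A</td><td>$10</td></tr>"
    "<tr><td>B</td><td>$12</td></tr>"
    "</table></body></html>"
)
```

Calling `extract(PAGE, MODEL)` returns a dictionary with one entry per field. Keeping the rules in a plain mapping means a new extraction task only needs a new model, while the parsing functions are reused.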

Bibliographic Details
Main Authors:	VIJJAPPU, Lakshmi; TAN, Ah-hwee; TAN, Chew-Lim
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2003
Subjects:	Databases and Information Systems; OS and Networks
Online Access:	https://ink.library.smu.edu.sg/sis_research/5255
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-6258
record_format	dspace
spelling	sg-smu-ink.sis_research-62582020-07-23T18:12:03Z Web structure analysis for information mining VIJJAPPU, Lakshmi TAN, Ah-hwee TAN, Chew-Lim Our approach to extracting information from the web analyzes the structural content of web pages through exploiting the latent information given by HTML tags. For each specific extraction task, an object model is created consisting of the salient fields to be extracted and the corresponding extraction rules based on a library of HTML parsing functions. We derive extraction rules for both single-slot and multiple-slot extraction tasks which we illustrate through two sample domains. 2003-12-05T08:00:00Z text https://ink.library.smu.edu.sg/sis_research/5255 info:doi/10.1142/9789812775375_0003 Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems OS and Networks
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Databases and Information Systems; OS and Networks
spellingShingle	Databases and Information Systems OS and Networks VIJJAPPU, Lakshmi TAN, Ah-hwee TAN, Chew-Lim Web structure analysis for information mining
description	Our approach to extracting information from the web analyzes the structural content of web pages by exploiting the latent information given by HTML tags. For each specific extraction task, an object model is created consisting of the salient fields to be extracted and the corresponding extraction rules based on a library of HTML parsing functions. We derive extraction rules for both single-slot and multiple-slot extraction tasks, which we illustrate through two sample domains.
format	text
author	VIJJAPPU, Lakshmi TAN, Ah-hwee TAN, Chew-Lim
author_facet	VIJJAPPU, Lakshmi TAN, Ah-hwee TAN, Chew-Lim
author_sort	VIJJAPPU, Lakshmi
title	Web structure analysis for information mining
publisher	Institutional Knowledge at Singapore Management University
publishDate	2003
url	https://ink.library.smu.edu.sg/sis_research/5255