Web structure analysis for information mining
Our approach to extracting information from the web analyzes the structural content of web pages through exploiting the latent information given by HTML tags. For each specific extraction task, an object model is created consisting of the salient fields to be extracted and the corresponding extracti...
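The abstract describes an object model that pairs the fields to extract with extraction rules built on a library of HTML parsing functions, covering both single-slot and multiple-slot tasks. The following is a minimal sketch of that idea using only Python's standard `html.parser`. It is not the authors' implementation: the names `TagTextCollector`, `first_text`, `table_rows`, and `ObjectModel`, as well as the course-schedule sample page, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple

Chunk = Tuple[str, str]              # (enclosing tag, text)
Rule = Callable[[List[Chunk]], object]


class TagTextCollector(HTMLParser):
    """Records each non-empty text run together with its innermost open tag."""

    def __init__(self) -> None:
        super().__init__()
        self._stack: List[str] = []
        self.chunks: List[Chunk] = []

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            self.chunks.append((self._stack[-1], text))


def parse(html: str) -> List[Chunk]:
    collector = TagTextCollector()
    collector.feed(html)
    collector.close()
    return collector.chunks


def first_text(tag: str) -> Rule:
    """Single-slot rule (hypothetical): text of the first <tag> element."""
    def rule(chunks: List[Chunk]):
        return next((text for t, text in chunks if t == tag), None)
    return rule


def table_rows(n_cols: int) -> Rule:
    """Multiple-slot rule (hypothetical): group <td> texts into n_cols-tuples."""
    def rule(chunks: List[Chunk]):
        cells = [text for t, text in chunks if t == "td"]
        return [tuple(cells[i:i + n_cols])
                for i in range(0, len(cells) - n_cols + 1, n_cols)]
    return rule


@dataclass
class ObjectModel:
    """Maps each salient field to the rule that extracts it."""
    rules: Dict[str, Rule]

    def extract(self, html: str) -> Dict[str, object]:
        chunks = parse(html)
        return {name: rule(chunks) for name, rule in self.rules.items()}


# Hypothetical sample page; the paper's own sample domains are not shown here.
PAGE = """
<html><body>
<h1>Intro to Databases</h1>
<table>
<tr><td>Mon</td><td>10:00</td></tr>
<tr><td>Wed</td><td>14:00</td></tr>
</table>
</body></html>
"""

course_model = ObjectModel(rules={
    "title": first_text("h1"),      # single-slot field
    "sessions": table_rows(2),      # multiple-slot field
})
result = course_model.extract(PAGE)
print(result)
```

In this sketch a single-slot rule returns one value per page, while a multiple-slot rule returns a list of tuples whose positions correspond to related slots. Keeping the rules as plain functions means a new extraction task only needs a new `ObjectModel` instance, not a new parser.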
Main Authors: | VIJJAPPU, Lakshmi; TAN, Ah-hwee; TAN, Chew-Lim |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2003 |
Subjects: | Databases and Information Systems; OS and Networks |
Online Access: | https://ink.library.smu.edu.sg/sis_research/5255 |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-6258 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-6258; 2020-07-23T18:12:03Z; Web structure analysis for information mining; VIJJAPPU, Lakshmi; TAN, Ah-hwee; TAN, Chew-Lim; Our approach to extracting information from the web analyzes the structural content of web pages through exploiting the latent information given by HTML tags. For each specific extraction task, an object model is created consisting of the salient fields to be extracted and the corresponding extraction rules based on a library of HTML parsing functions. We derive extraction rules for both single-slot and multiple-slot extraction tasks which we illustrate through two sample domains.; 2003-12-05T08:00:00Z; text; https://ink.library.smu.edu.sg/sis_research/5255; info:doi/10.1142/9789812775375_0003; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; Databases and Information Systems; OS and Networks |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Databases and Information Systems; OS and Networks |
description |
Our approach to extracting information from the web analyzes the structural content of web pages through exploiting the latent information given by HTML tags. For each specific extraction task, an object model is created consisting of the salient fields to be extracted and the corresponding extraction rules based on a library of HTML parsing functions. We derive extraction rules for both single-slot and multiple-slot extraction tasks which we illustrate through two sample domains. |
format |
text |
author |
VIJJAPPU, Lakshmi; TAN, Ah-hwee; TAN, Chew-Lim |
author_sort |
VIJJAPPU, Lakshmi |
title |
Web structure analysis for information mining |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2003 |
url |
https://ink.library.smu.edu.sg/sis_research/5255 |