Re-engineering structures from web documents
To realize a wide range of applications (including digital libraries) on the Web, a more structured way of accessing the Web is required, and such a requirement can be facilitated by the use of the XML standard. In this paper, we propose a general framework for reverse engineering (or re-engineering) the u...
Main Authors: | HUE, Moh Chuang; LIM, Ee Peng; NG, Wee-Keong |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2000 |
Subjects: | Databases and Information Systems; Numerical Analysis and Scientific Computing |
Online Access: | https://ink.library.smu.edu.sg/sis_research/966 https://ink.library.smu.edu.sg/context/sis_research/article/1965/viewcontent/p67_moh.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-1965 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-1965 2018-06-20T03:13:39Z Re-engineering structures from web documents HUE, Moh Chuang LIM, Ee Peng NG, Wee-Keong To realize a wide range of applications (including digital libraries) on the Web, a more structured way of accessing the Web is required, and such a requirement can be facilitated by the use of the XML standard. In this paper, we propose a general framework for reverse engineering (or re-engineering) the underlying structures, i.e., the DTD, from a collection of similarly structured XML documents when they share some common but unknown DTDs. The essential data structures and algorithms for DTD generation have been developed, and experiments on real Web collections have been conducted to demonstrate their feasibility. In addition, we propose a method of imposing a constraint on the repetitiveness of the elements in a DTD rule to further simplify the generated DTD without compromising its correctness. 2000-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/966 info:doi/10.1145/336597.336638 https://ink.library.smu.edu.sg/context/sis_research/article/1965/viewcontent/p67_moh.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems Numerical Analysis and Scientific Computing |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Databases and Information Systems Numerical Analysis and Scientific Computing |
description |
To realize a wide range of applications (including digital libraries) on the Web, a more structured way of accessing the Web is required, and such a requirement can be facilitated by the use of the XML standard. In this paper, we propose a general framework for reverse engineering (or re-engineering) the underlying structures, i.e., the DTD, from a collection of similarly structured XML documents when they share some common but unknown DTDs. The essential data structures and algorithms for DTD generation have been developed, and experiments on real Web collections have been conducted to demonstrate their feasibility. In addition, we propose a method of imposing a constraint on the repetitiveness of the elements in a DTD rule to further simplify the generated DTD without compromising its correctness. |
format |
text |
author |
HUE, Moh Chuang; LIM, Ee Peng; NG, Wee-Keong |
author_facet |
HUE, Moh Chuang; LIM, Ee Peng; NG, Wee-Keong |
author_sort |
HUE, Moh Chuang |
title |
Re-engineering structures from web documents |
title_sort |
re-engineering structures from web documents |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2000 |
url |
https://ink.library.smu.edu.sg/sis_research/966 https://ink.library.smu.edu.sg/context/sis_research/article/1965/viewcontent/p67_moh.pdf |
_version_ |
1770570797294813184 |