Structured web indexing

The rapid growth of Web information and applications has made the Web not only an important source of information but also a hub for e-commerce activities. However, the current unstructured web documents in the form of HTML files have limited support for advanced web applications. To overcome this s...

Bibliographic Details
Main Author: Li, Xu.
Other Authors: Lim, Ee Peng
Format: Theses and Dissertations
Published: 2008
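Subjects: DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval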
Online Access: http://hdl.handle.net/10356/2373
Institution: Nanyang Technological University
id sg-ntu-dr.10356-2373
record_format dspace
spelling sg-ntu-dr.10356-2373 2023-03-04T00:32:45Z Structured web indexing Li, Xu. Lim, Ee Peng School of Computer Engineering Ng, Wee Keong DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Master of Engineering (SCE) 2008-09-17T09:01:21Z 2008-09-17T09:01:21Z 2000 2000 Thesis http://hdl.handle.net/10356/2373 Nanyang Technological University application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
spellingShingle DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Li, Xu.
Structured web indexing
description The rapid growth of Web information and applications has made the Web not only an important source of information but also a hub for e-commerce activities. However, current web documents, which are unstructured HTML files, offer limited support for advanced web applications. To overcome this shortcoming, future web documents will likely be formatted in XML, and existing HTML documents will gradually be converted to XML. With XML, the structure of web documents, in the form of DTDs, can be provided as input to a search engine, allowing it to exploit this structural knowledge in query processing. In this report, we propose a query model that supports expressive queries on XML documents that share common DTDs. Because XML documents can embed well-structured links to one another, the query model also supports queries involving inter-document links. With both intra- and inter-document structures in the proposed query model, conventional indexing techniques are no longer adequate. We have therefore designed a new indexing scheme built upon both the content and the structure of XML documents. Based on this indexing scheme, we have developed a new search engine that supports queries on the content and structure of web documents.
author2 Lim, Ee Peng
author_facet Lim, Ee Peng
Li, Xu.
format Theses and Dissertations
author Li, Xu.
author_sort Li, Xu.
title Structured web indexing
title_sort structured web indexing
publishDate 2008
url http://hdl.handle.net/10356/2373
_version_ 1759853486626832384