Structured web indexing

Bibliographic Details
Main Author: Li, Xu.
Other Authors: Lim, Ee Peng
Format: Theses and Dissertations
Published: 2008
Online Access: http://hdl.handle.net/10356/2373
Institution: Nanyang Technological University
Description
Summary: The rapid growth of Web information and applications has made the Web not only an important source of information but also a hub for e-commerce activities. However, current unstructured web documents in the form of HTML files offer limited support for advanced web applications. To overcome this shortcoming, future web documents will likely be formatted in XML, and existing HTML documents will gradually be converted to XML. With XML, the structure of web documents, in the form of DTDs, can be provided as input to a search engine, allowing it to exploit this structural knowledge during query processing. In this report, we propose a query model that supports expressive queries on XML documents that share common DTDs. As XML documents can embed well-structured links to one another, the query model also supports queries involving inter-document links. Because the proposed query model covers both intra- and inter-document structure, conventional indexing techniques are no longer adequate. We have therefore designed a new indexing scheme built upon both the content and the structure of XML documents. Based on this indexing scheme, we have developed a new search engine that supports queries on the content and structure of web documents.
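The summary does not say how the indexing scheme works, so the following is only a generic illustration of indexing content together with structure, not the method developed in this thesis. It assumes a hypothetical setup: each document is an XML string keyed by an id, each term is indexed under the tag path from the root to its element, and an `href` attribute on any element holds another document's id. The names `build_index`, `query`, `extract_links` and `linked_query`, and the sample documents, are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical illustration only: not the indexing scheme from the thesis.
# Postings are keyed by (term, tag path), tying each keyword to its place in the document structure.

def build_index(docs):
    """docs: dict mapping a document id to an XML string."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        _walk(ET.fromstring(xml_text), "", doc_id, index)
    return index

def _walk(elem, parent_path, doc_id, index):
    path = f"{parent_path}/{elem.tag}"
    # Only an element's leading text is indexed; tail text is skipped for brevity.
    if elem.text:
        for term in elem.text.lower().split():
            index[(term, path)].add(doc_id)
    for child in elem:
        _walk(child, path, doc_id, index)

def query(index, term, path):
    """Ids of documents where `term` occurs directly inside an element at `path`."""
    return set(index.get((term.lower(), path), set()))

def extract_links(docs, link_attr="href"):
    """Assumption: a link is any attribute named `link_attr` holding another document's id."""
    links = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            target = elem.get(link_attr)
            if target is not None:
                links[doc_id].add(target)
    return links

def linked_query(index, links, term1, path1, term2, path2):
    """Documents matching (term1, path1) that link to a document matching (term2, path2)."""
    sources = query(index, term1, path1)
    targets = query(index, term2, path2)
    return {doc for doc in sources if links.get(doc, set()) & targets}

# Tiny made-up corpus sharing two simple document types.
docs = {
    "d1": "<product><name>Laptop</name><maker href='d2'>Acme</maker></product>",
    "d2": "<company><name>Acme</name><country>Singapore</country></company>",
}
index = build_index(docs)
links = extract_links(docs)
print(query(index, "acme", "/company/name"))   # {'d2'}
print(query(index, "acme", "/product/name"))   # set(): "Acme" sits under /product/maker in d1
print(linked_query(index, links, "laptop", "/product/name", "singapore", "/company/country"))  # {'d1'}
```

Keying postings by (term, path) lets a query restrict a keyword to one element type defined by a shared DTD, and the link join answers a simple inter-document query ("products whose maker is located in Singapore"). A plain keyword index could not distinguish "Acme" as a product name from "Acme" as a company name.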