Structured web indexing
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Published: | 2008 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/2373 |
Institution: | Nanyang Technological University |
Summary: | The rapid growth of Web information and applications has made the Web not only an important source of information but also a hub for e-commerce activities. However, today's unstructured web documents, in the form of HTML files, offer limited support for advanced web applications. To overcome this shortcoming, future web documents will likely be formatted in XML, and existing HTML documents will gradually be converted to XML. With XML, the structure of web documents, in the form of DTDs, can be provided as input to a search engine, allowing it to exploit this structural knowledge in query processing. In this report, we propose a query model that supports expressive queries on XML documents that share common DTDs. As XML documents can embed well-structured links to one another, the query model also supports queries involving inter-document links. Because the proposed query model covers both intra- and inter-document structures, conventional indexing techniques are no longer adequate. We have therefore designed a new indexing scheme built upon both the content and the structure of XML documents. Based on this indexing scheme, we have developed a new search engine that supports queries on the content and structure of web documents. |
---|