World Wide Web resource discovery
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2010 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/42555 |
Institution: | Nanyang Technological University |
Summary: | Query routing refers to the general resource discovery problem of selecting, from a large set of accessible information sources, the ones relevant to a given query (database selection), evaluating the query on the selected sources (query evaluation), and merging their results (result merging). As the number of information sources on the Internet increases dramatically, query routing is becoming increasingly important. Nevertheless, much of the previous work in query routing has focused on information sources that are document collections, and little work has been done on collections that can be accessed only through query interfaces. In this project, we focus on the database selection problem, an important subproblem of query routing, for bibliographic databases consisting of multiple text attributes. In particular, we first propose three training-based database selection techniques known as TQS, TQC and TQG. These three techniques rely on training query results to determine the relevance of databases with respect to a given user query. Our experiments have shown that TQG and TQC outperform TQS for the same number of training queries. We further explore the use of clustering techniques to improve the performance of database selection for bibliographic databases. Three clustering techniques, i.e. Single Pass Clustering (SPC), Reallocation Clustering (RC) and Constrained Clustering (CC), were evaluated with two database ranking schemes known as ERS and EGS. Our experiments showed that any of these clustering techniques combined with ERS yields good performance. This research also looked into the implementation of a query routing broker, known as ZBroker, developed for bibliographic database servers supporting Z39.50 query interfaces on the Internet. |
---|
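
The three stages of query routing named in the summary (database selection, query evaluation, result merging) can be illustrated with a minimal Python sketch. This is not TQS, TQC, TQG, ERS, EGS or ZBroker, whose details are not given in this record. The toy sources, the term-frequency summaries, and every function name below are assumptions made purely for illustration. In particular, the sketch assumes a term summary of each source is available, which may not hold for sources reachable only through a query interface.

```python
from collections import Counter

# Hypothetical toy sources: each maps a record id to a title string.
SOURCES = {
    "db_a": {"a1": "query routing for digital libraries",
             "a2": "database selection methods"},
    "db_b": {"b1": "protein folding simulation",
             "b2": "gene expression clustering"},
    "db_c": {"c1": "clustering for database selection",
             "c2": "query routing broker design"},
}


def summarize(records):
    """Build a term-frequency summary of one source (assumed to be available)."""
    return Counter(term for title in records.values() for term in title.split())


def select_databases(query, summaries, k):
    """Database selection: rank sources by summed frequency of the query terms."""
    terms = query.split()
    scores = {name: sum(s[t] for t in terms) for name, s in summaries.items()}
    # Sort by descending score, breaking ties by source name.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return [name for name in ranked if scores[name] > 0][:k]


def evaluate(query, records):
    """Query evaluation: return (record id, number of matched query terms)."""
    terms = set(query.split())
    hits = []
    for rid, title in records.items():
        matched = len(terms & set(title.split()))
        if matched:
            hits.append((rid, matched))
    return hits


def route(query, sources, k=2):
    """Select up to k sources, evaluate the query on each, and merge the results."""
    summaries = {name: summarize(recs) for name, recs in sources.items()}
    selected = select_databases(query, summaries, k)
    merged = [hit for name in selected for hit in evaluate(query, sources[name])]
    # Result merging: one list ordered by match count, then record id.
    merged.sort(key=lambda hit: (-hit[1], hit[0]))
    return selected, merged
```

Calling `route("database selection", SOURCES)` in this toy setup selects `db_a` and `db_c` and skips `db_b`, which contains neither query term. The point of the selection step is that the query is never sent to sources unlikely to hold relevant records.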