Efficient Evaluation of Continuous Text Search Queries

Consider a text filtering server that monitors a stream of incoming documents for a set of users, who register their interests in the form of continuous text search queries. The task of the server is to constantly maintain for each query a ranked result list, comprising the recent documents (drawn f...

Bibliographic Details
Main Authors:	MOURATIDIS, Kyriakos; PANG, Hwee Hwa
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2011
Subjects:	Continuous queries; document streams; text filtering; Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access:	https://ink.library.smu.edu.sg/sis_research/812 https://ink.library.smu.edu.sg/context/sis_research/article/1811/viewcontent/TKDE11_ConTextQueries.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-1811
record_format	dspace
spelling	sg-smu-ink.sis_research-1811 2017-03-01T10:21:06Z Efficient Evaluation of Continuous Text Search Queries MOURATIDIS, Kyriakos PANG, Hwee Hwa Consider a text filtering server that monitors a stream of incoming documents for a set of users, who register their interests in the form of continuous text search queries. The task of the server is to constantly maintain for each query a ranked result list, comprising the recent documents (drawn from a sliding window) with the highest similarity to the query. Such a system underlies many text monitoring applications that need to cope with heavy document traffic, such as news and email monitoring. In this paper, we propose the first solution for processing continuous text queries efficiently. Our objective is to support a large number of user queries while sustaining high document arrival rates. Our solution indexes the streamed documents in main memory with a structure based on the principles of the inverted file, and processes document arrival and expiration events with an incremental threshold-based method. We distinguish between two versions of the monitoring algorithm, an eager and a lazy one, which differ in how aggressively they manage the thresholds on the inverted index. Using benchmark queries over a stream of real documents, we experimentally verify the efficiency of our methodology; both its versions are at least an order of magnitude faster than a competitor constructed from existing techniques, with lazy being the best approach overall.
2011-10-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/812 info:doi/10.1109/TKDE.2011.125 https://ink.library.smu.edu.sg/context/sis_research/article/1811/viewcontent/TKDE11_ConTextQueries.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Continuous queries document streams text filtering Databases and Information Systems Numerical Analysis and Scientific Computing
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Continuous queries document streams text filtering Databases and Information Systems Numerical Analysis and Scientific Computing
spellingShingle	Continuous queries document streams text filtering Databases and Information Systems Numerical Analysis and Scientific Computing MOURATIDIS, Kyriakos PANG, Hwee Hwa Efficient Evaluation of Continuous Text Search Queries
description	Consider a text filtering server that monitors a stream of incoming documents for a set of users, who register their interests in the form of continuous text search queries. The task of the server is to constantly maintain for each query a ranked result list, comprising the recent documents (drawn from a sliding window) with the highest similarity to the query. Such a system underlies many text monitoring applications that need to cope with heavy document traffic, such as news and email monitoring. In this paper, we propose the first solution for processing continuous text queries efficiently. Our objective is to support a large number of user queries while sustaining high document arrival rates. Our solution indexes the streamed documents in main memory with a structure based on the principles of the inverted file, and processes document arrival and expiration events with an incremental threshold-based method. We distinguish between two versions of the monitoring algorithm, an eager and a lazy one, which differ in how aggressively they manage the thresholds on the inverted index. Using benchmark queries over a stream of real documents, we experimentally verify the efficiency of our methodology; both its versions are at least an order of magnitude faster than a competitor constructed from existing techniques, with lazy being the best approach overall.
format	text
author	MOURATIDIS, Kyriakos PANG, Hwee Hwa
author_facet	MOURATIDIS, Kyriakos PANG, Hwee Hwa
author_sort	MOURATIDIS, Kyriakos
title	Efficient Evaluation of Continuous Text Search Queries
title_short	Efficient Evaluation of Continuous Text Search Queries
title_full	Efficient Evaluation of Continuous Text Search Queries
title_fullStr	Efficient Evaluation of Continuous Text Search Queries
title_full_unstemmed	Efficient Evaluation of Continuous Text Search Queries
title_sort	efficient evaluation of continuous text search queries
publisher	Institutional Knowledge at Singapore Management University
publishDate	2011
url	https://ink.library.smu.edu.sg/sis_research/812 https://ink.library.smu.edu.sg/context/sis_research/article/1811/viewcontent/TKDE11_ConTextQueries.pdf
_version_	1770570724657856512
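
Illustrative sketch (not part of the catalogued record): the description mentions a main-memory index built on inverted-file principles, a sliding window of recent documents, and a ranked result list per query. The Python below is a minimal toy version of that setting only. It does not implement the paper's incremental threshold-based method or its eager and lazy variants; it simply recomputes each result on demand. The class name `WindowMonitor`, the count-based window, and the term-frequency dot-product score are all assumptions made for illustration.

```python
from collections import Counter, defaultdict, deque
import heapq


class WindowMonitor:
    """Toy continuous top-k monitor over a count-based sliding window.

    Hypothetical illustration: scores are plain term-frequency dot
    products and results are recomputed per request, with no thresholds.
    """

    def __init__(self, window_size, k):
        self.window_size = window_size
        self.k = k
        self.window = deque()            # (doc_id, term frequencies), oldest first
        self.index = defaultdict(dict)   # term -> {doc_id: term frequency}
        self.queries = {}                # query_id -> Counter of query terms

    def register(self, query_id, text):
        """Register a continuous query given as whitespace-separated terms."""
        self.queries[query_id] = Counter(text.lower().split())

    def _expire_oldest(self):
        """Remove the oldest document from the window and from the index."""
        doc_id, freqs = self.window.popleft()
        for term in freqs:
            postings = self.index[term]
            del postings[doc_id]
            if not postings:
                del self.index[term]

    def arrive(self, doc_id, text):
        """Index a newly arrived document; doc_id is assumed to be unique."""
        freqs = Counter(text.lower().split())
        self.window.append((doc_id, freqs))
        for term, tf in freqs.items():
            self.index[term][doc_id] = tf
        if len(self.window) > self.window_size:
            self._expire_oldest()

    def top_k(self, query_id):
        """Return up to k (doc_id, score) pairs, highest score first."""
        scores = Counter()
        for term, weight in self.queries[query_id].items():
            for doc_id, tf in self.index.get(term, {}).items():
                scores[doc_id] += weight * tf
        return heapq.nlargest(
            self.k, scores.items(), key=lambda item: (item[1], item[0])
        )
```

A caller would register queries once, call `arrive` for each streamed document, and read `top_k` whenever a result list is needed. Recomputing on every read is the naive approach; avoiding that per-event work is the efficiency problem the description says the paper addresses.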

Similar Items