Query processing in publish/subscribe systems for textual data streams
With the rapid development of online social media (e.g., Facebook and Flickr) and micro-blogging services (e.g., Twitter, Tumblr, and Weibo), huge amounts of streaming text data are being generated at an unprecedented scale. Such data is particularly well-suited for information dissemination. The d...
Main Author: | Chen, Lisi |
---|---|
Other Authors: | Dr Gao Cong |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2016 |
Subjects: | DRNTU::Engineering::Computer science and engineering::Information systems::Database management |
Online Access: | http://hdl.handle.net/10356/66232 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-66232 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-66232 2023-03-04T00:34:54Z Query processing in publish/subscribe systems for textual data streams Chen, Lisi Dr Gao Cong School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Information systems::Database management Doctor of Philosophy (SCE) 2016-03-21T04:09:38Z 2016-03-21T04:09:38Z 2016 Thesis http://hdl.handle.net/10356/66232 en 151 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Information systems::Database management |
description |
With the rapid development of online social media (e.g., Facebook and Flickr) and micro-blogging services (e.g., Twitter, Tumblr, and Weibo), huge amounts of streaming text data are being generated at an unprecedented scale. Such data is particularly well-suited for information dissemination.
The demand for disseminating interesting information from data streams to users gives prominence to content-based publish/subscribe systems, in which users personalize their requirements by issuing a subscription query and are notified when items matching those requirements are captured from the data stream.
Although content-based publish/subscribe systems have been successfully applied in many real-world applications, existing work on them has the following limitations. First, existing content-based publish/subscribe systems usually do not consider the location aspect. With the deployment and use of GPS-enabled devices, spatial (geographical) documents are emerging in which content is associated with locations (e.g., Points of Interest on Google Maps, check-ins on Foursquare, and geo-tagged tweets on Twitter). As a result, users may want to issue subscription queries with both keyword and location requirements. For instance, a user who subscribes to promotional information about seafood restaurants may be interested only in information posted by nearby seafood restaurants.
Second, existing publish/subscribe systems do not consider the issue of query result diversification, which has drawn considerable attention as a way to increase user satisfaction in web search.
To overcome the first limitation, we conduct the first study on location-aware publish/subscribe for textual data streams. Specifically, we propose a new type of subscription query for publish/subscribe systems, the Boolean Range Continuous (BRC) query, which continuously finds, over a data stream, spatio-temporal documents whose locations fall in the query region and whose textual information satisfies the query's Boolean predicates. We develop an efficient system for addressing the problem.
To improve the quality of results returned by each subscription query, we propose a new type of location-based subscription query, the Temporal Spatial-Keyword Top-k Subscription (TaSK) query, which rank-orders spatio-temporal documents and continuously maintains the top-ranked documents based on a score that considers three aspects: (1) text relevance; (2) spatial proximity; and (3) document recency. We develop an efficient approach to maintaining up-to-date top-k results for a large number of TaSK queries over a stream of spatio-temporal documents.
To address the second limitation, we develop the first diversity-aware publish/subscribe system over a text stream. Specifically, we propose the Diversity-Aware Top-k Subscription (DAS) query, which takes into account text relevance, document recency, and result diversity in matching a new document. We propose an efficient mechanism to continuously maintain, for each DAS query, an up-to-date result set that contains the k most recently returned documents over a text stream. |
author2 |
Dr Gao Cong |
format |
Theses and Dissertations |
author |
Chen, Lisi |
title |
Query processing in publish/subscribe systems for textual data streams |
publishDate |
2016 |
url |
http://hdl.handle.net/10356/66232 |
_version_ |
1759854857909436416 |