Distributed Publish/Subscribe Query Processing on the Spatio-Textual Data Stream

Bibliographic Details
Main Authors: Chen, Zhida; Cong, Gao; Zhang, Zhenjie; Chen, Lisi; Fu, Tom Z. J.
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2017
Online Access: https://hdl.handle.net/10356/80746
http://hdl.handle.net/10220/42788
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-80746
Record format: DSpace
Record datestamp: 2020-11-01T04:43:47Z
Published in: Proceedings of the 2017 IEEE 33rd International Conference on Data Engineering (ICDE)
Research centre: Rapid-Rich Object Search Lab
Funding: NRF (National Research Foundation, Singapore); A*STAR (Agency for Science, Technology and Research, Singapore); MOE (Ministry of Education, Singapore)
Version: Accepted version
Type: Conference Paper
Repository dates: 2017-07-04T02:37:37Z; 2019-12-06T13:58:03Z
DOI: 10.1109/ICDE.2017.154
Rights: © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://dx.doi.org/10.1109/ICDE.2017.154.
Extent: 12 p. (application/pdf)
Collection: DR-NTU (NTU Library, Singapore)
Subjects: Servers; Distributed databases

Abstract: A huge amount of data with both spatial and textual information, e.g., geo-tagged tweets, is flooding the Internet. Such a spatio-textual data stream contains valuable information for millions of users with various interests in different keywords and locations. Publish/subscribe systems enable efficient and effective information distribution by allowing users to register continuous queries with both spatial and textual constraints. However, the explosive growth of data scale and user base has posed challenges to the existing centralized publish/subscribe systems for spatio-textual data streams. In this paper, we propose our distributed publish/subscribe system, called PS2Stream, which digests a massive spatio-textual data stream and directs the stream to target users with registered interests. Compared with existing systems, PS2Stream achieves a better workload distribution in terms of both minimizing the total amount of workload and balancing the load of workers. To achieve this, we propose a new workload distribution algorithm considering both space and text properties of the data. Additionally, PS2Stream supports dynamic load adjustments to adapt to changes in the workload, which makes PS2Stream adaptive. Extensive empirical evaluation, on a commercial cloud computing platform with real data, validates the superiority of our system design and the advantages of our techniques in improving system performance.
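The abstract describes users registering continuous queries with both spatial and textual constraints, against which each incoming stream item is checked. The following is a minimal, single-machine sketch of that matching step only. It assumes each query is an axis-aligned rectangle plus a set of keywords that must all appear in the item's text (AND semantics) and uses naive whitespace tokenization. The names `Subscription`, `GeoObject`, `matches` and `publish` are hypothetical. The brute-force scan is a stand-in for illustration and is not PS2Stream's distributed workload-distribution algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical continuous query: a rectangle plus required keywords."""
    sub_id: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    keywords: frozenset[str]  # assumed to be lowercase


@dataclass(frozen=True)
class GeoObject:
    """A spatio-textual stream item, e.g. a geo-tagged tweet."""
    x: float
    y: float
    text: str


def matches(sub: Subscription, obj: GeoObject) -> bool:
    # Spatial constraint: the item must lie inside the query rectangle.
    inside = sub.min_x <= obj.x <= sub.max_x and sub.min_y <= obj.y <= sub.max_y
    if not inside:
        return False
    # Textual constraint (assumed AND semantics): every keyword must occur
    # among the item's lowercase, whitespace-separated tokens.
    tokens = set(obj.text.lower().split())
    return sub.keywords <= tokens


def publish(obj: GeoObject, subs: list[Subscription]) -> list[str]:
    """Return the ids of all subscriptions the item should be delivered to."""
    return [s.sub_id for s in subs if matches(s, obj)]


# Usage: three registered queries and one incoming item.
subs = [
    Subscription("q1", 103.6, 1.2, 104.1, 1.5, frozenset({"coffee"})),
    Subscription("q2", 103.6, 1.2, 104.1, 1.5, frozenset({"coffee", "jazz"})),
    Subscription("q3", -74.3, 40.5, -73.7, 40.9, frozenset({"coffee"})),
]
tweet = GeoObject(103.85, 1.29, "Great coffee near the river")
print(publish(tweet, subs))  # ['q1']
```

Here `q2` fails the textual test (no "jazz") and `q3` fails the spatial test. A distributed system would avoid testing every query against every item by partitioning queries and items across workers; how to do that using both space and text is the problem the paper addresses.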