Processing long queries against short text: Top-k advertisement matching in news stream applications

Many real applications in real-time news stream advertising call for efficient processing of long queriesagainst short text. In such applications, dynamic news feeds are regarded as queries to match against anadvertisement (ad) database for retrieving the k most relevant ads. The existing approaches...

Full description

Saved in:
Bibliographic Details
Main Authors: ZHANG, Dongxiang, LI, Yuchen, FAN, Ju, GAO, Lianli, SHEN, Fumin, SHEN, Heng Tao
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2017
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/3995
https://ink.library.smu.edu.sg/context/sis_research/article/4997/viewcontent/a28_zhang__1_.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-4997
record_format dspace
spelling sg-smu-ink.sis_research-49972019-02-07T09:17:55Z Processing long queries against short text: Top-k advertisement matching in news stream applications ZHANG, Dongxiang LI, Yuchen FAN, Ju GAO, Lianli SHEN, Fumin SHEN, Heng Tao Many real applications in real-time news stream advertising call for efficient processing of long queriesagainst short text. In such applications, dynamic news feeds are regarded as queries to match against anadvertisement (ad) database for retrieving the k most relevant ads. The existing approaches to keywordretrieval cannot work well in this search scenario when queries are triggered at a very high frequency.To address the problem, we introduce new techniques to significantly improve search performance. First,we devise a two-level partitioning for tight upper bound estimation and a lazy evaluation scheme to delayfull evaluation of unpromising candidates, which can bring three to four times performance boosting in adatabase with 7 million ads. Second, we propose a novel rank-aware block-oriented inverted index to furtherimprove performance. In this index scheme, each entry in an inverted list is assigned a rank according toits importance in the ad. Then, we introduce a block-at-a-time search strategy based on the index scheme tosupport a much tighter upper bound estimation and a very early termination. We have conducted experimentswith real datasets, and the results show that the rank-aware method can further improve performance byan order of magnitude. 2017-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3995 info:doi/10.1145/3052772 https://ink.library.smu.edu.sg/context/sis_research/article/4997/viewcontent/a28_zhang__1_.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Long queries short text top-k retrieval inverted index rank-aware partitioning Databases and Information Systems Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Long queries
short text
top-k retrieval
inverted index
rank-aware partitioning
Databases and Information Systems
Software Engineering
spellingShingle Long queries
short text
top-k retrieval
inverted index
rank-aware partitioning
Databases and Information Systems
Software Engineering
ZHANG, Dongxiang
LI, Yuchen
FAN, Ju
GAO, Lianli
SHEN, Fumin
SHEN, Heng Tao
Processing long queries against short text: Top-k advertisement matching in news stream applications
description Many real applications in real-time news stream advertising call for efficient processing of long queriesagainst short text. In such applications, dynamic news feeds are regarded as queries to match against anadvertisement (ad) database for retrieving the k most relevant ads. The existing approaches to keywordretrieval cannot work well in this search scenario when queries are triggered at a very high frequency.To address the problem, we introduce new techniques to significantly improve search performance. First,we devise a two-level partitioning for tight upper bound estimation and a lazy evaluation scheme to delayfull evaluation of unpromising candidates, which can bring three to four times performance boosting in adatabase with 7 million ads. Second, we propose a novel rank-aware block-oriented inverted index to furtherimprove performance. In this index scheme, each entry in an inverted list is assigned a rank according toits importance in the ad. Then, we introduce a block-at-a-time search strategy based on the index scheme tosupport a much tighter upper bound estimation and a very early termination. We have conducted experimentswith real datasets, and the results show that the rank-aware method can further improve performance byan order of magnitude.
format text
author ZHANG, Dongxiang
LI, Yuchen
FAN, Ju
GAO, Lianli
SHEN, Fumin
SHEN, Heng Tao
author_facet ZHANG, Dongxiang
LI, Yuchen
FAN, Ju
GAO, Lianli
SHEN, Fumin
SHEN, Heng Tao
author_sort ZHANG, Dongxiang
title Processing long queries against short text: Top-k advertisement matching in news stream applications
title_short Processing long queries against short text: Top-k advertisement matching in news stream applications
title_full Processing long queries against short text: Top-k advertisement matching in news stream applications
title_fullStr Processing long queries against short text: Top-k advertisement matching in news stream applications
title_full_unstemmed Processing long queries against short text: Top-k advertisement matching in news stream applications
title_sort processing long queries against short text: top-k advertisement matching in news stream applications
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/sis_research/3995
https://ink.library.smu.edu.sg/context/sis_research/article/4997/viewcontent/a28_zhang__1_.pdf
_version_ 1770574114499592192