Discovering newsworthy themes from sequenced data: A step towards computational journalism

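Automatic discovery of newsworthy themes from sequenced data can relieve journalists from manually poring over a large amount of data in order to find interesting news. In this paper, we propose a novel k-Sketch query that aims to find k striking streaks to best summarize a subject. Our scoring function takes into account streak strikingness and streak coverage at the same time. We study the k-Sketch query processing in both offline and online scenarios, and propose various streak-level pruning techniques to find striking candidates. Among those candidates, we then develop approximate methods to discover the k most representative streaks with theoretical bounds. We conduct experiments on four real datasets, and the results demonstrate the efficiency and effectiveness of our proposed algorithms: the running time achieves up to 500 times speedup and the quality of the generated summaries is endorsed by the anonymous users from Amazon Mechanical Turk.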

Bibliographic Details
Main Authors: FAN, Qi; LI, Yuchen; ZHANG, Dongxiang; TAN, Kian-Lee
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
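Subjects: Computational journalism; news theme discovery; sequenced data; approximate algorithms; Databases and Information Systems; Data Storage Systems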
Online Access: https://ink.library.smu.edu.sg/sis_research/3996
https://ink.library.smu.edu.sg/context/sis_research/article/4998/viewcontent/07883865__1_.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-4998
record_format dspace
spelling sg-smu-ink.sis_research-4998 2018-05-28T08:57:09Z Discovering newsworthy themes from sequenced data: A step towards computational journalism FAN, Qi LI, Yuchen ZHANG, Dongxiang TAN, Kian-Lee Automatic discovery of newsworthy themes from sequenced data can relieve journalists from manually poring over a large amount of data in order to find interesting news. In this paper, we propose a novel k-Sketch query that aims to find k striking streaks to best summarize a subject. Our scoring function takes into account streak strikingness and streak coverage at the same time. We study the k-Sketch query processing in both offline and online scenarios, and propose various streak-level pruning techniques to find striking candidates. Among those candidates, we then develop approximate methods to discover the k most representative streaks with theoretical bounds. We conduct experiments on four real datasets, and the results demonstrate the efficiency and effectiveness of our proposed algorithms: the running time achieves up to 500 times speedup and the quality of the generated summaries is endorsed by the anonymous users from Amazon Mechanical Turk. 2017-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3996 info:doi/10.1109/TKDE.2017.2685587 https://ink.library.smu.edu.sg/context/sis_research/article/4998/viewcontent/07883865__1_.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Computational journalism news theme discovery sequenced data approximate algorithms Databases and Information Systems Data Storage Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Computational journalism
news theme discovery
sequenced data
approximate algorithms
Databases and Information Systems
Data Storage Systems
description Automatic discovery of newsworthy themes from sequenced data can relieve journalists from manually poring over a large amount of data in order to find interesting news. In this paper, we propose a novel k-Sketch query that aims to find k striking streaks to best summarize a subject. Our scoring function takes into account streak strikingness and streak coverage at the same time. We study the k-Sketch query processing in both offline and online scenarios, and propose various streak-level pruning techniques to find striking candidates. Among those candidates, we then develop approximate methods to discover the k most representative streaks with theoretical bounds. We conduct experiments on four real datasets, and the results demonstrate the efficiency and effectiveness of our proposed algorithms: the running time achieves up to 500 times speedup and the quality of the generated summaries is endorsed by the anonymous users from Amazon Mechanical Turk.
format text
author FAN, Qi
LI, Yuchen
ZHANG, Dongxiang
TAN, Kian-Lee
author_facet FAN, Qi
LI, Yuchen
ZHANG, Dongxiang
TAN, Kian-Lee
author_sort FAN, Qi
title Discovering newsworthy themes from sequenced data: A step towards computational journalism
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/sis_research/3996
https://ink.library.smu.edu.sg/context/sis_research/article/4998/viewcontent/07883865__1_.pdf
_version_ 1770574114917974016