Multimodal news story clustering with pairwise visual near-duplicate constraint

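Story clustering is a critical step for news retrieval, topic mining, and summarization. Nonetheless, the task remains highly challenging owing to the fact that news topics exhibit clusters of varying densities, shapes, and sizes. Traditional algorithms are found to be ineffective in mining these types of clusters. This paper offers a new perspective by exploring the pairwise visual cues deriving from near-duplicate keyframes (NDK) for constraint-based clustering. We propose a constraint-driven co-clustering algorithm (CCC), which utilizes the near-duplicate constraints built on top of text, to mine topic-related stories and the outliers. With CCC, the duality between stories and their underlying multimodal features is exploited to transform features in low-dimensional space with normalized cut. The visual constraints are added directly to this new space, while the traditional DBSCAN is revisited to capitalize on the availability of constraints and the reduced dimensional space. We modify DBSCAN with two new characteristics for story clustering: 1) constraint-based centroid selection and 2) adaptive radius. Experiments on TRECVID-2004 corpus demonstrate that CCC with visual constraints is more capable of mining news topics of varying densities, shapes and sizes, compared with traditional k-means, DBSCAN, and spectral co-clustering algorithms.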
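
The abstract for this record outlines a two-stage idea: embed stories in a low-dimensional space through a normalized-cut relaxation over the story/feature duality, then run a density-based clustering that takes pairwise near-duplicate constraints into account. The following is only a rough sketch of that general shape and is not the authors' CCC algorithm. It assumes the standard bipartite spectral co-clustering embedding (an SVD of the degree-normalized story-by-feature matrix) and a simplified DBSCAN in which must-link pairs are treated as mutual neighbors and visited first. The adaptive radius mentioned in the abstract is not modeled. The names `co_embed`, `constrained_dbscan`, `eps`, `min_pts`, and `must_link` are hypothetical.

```python
import numpy as np


def co_embed(A, dims):
    """Embed the rows (stories) of a nonnegative story-by-feature matrix.

    Assumption: standard bipartite spectral co-clustering, i.e. an SVD of
    D1^-1/2 A D2^-1/2 with the trivial first singular vector dropped.
    Every row and column of A must have a positive sum.
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # story degrees
    d2 = A.sum(axis=0)  # feature degrees
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, _, _ = np.linalg.svd(An, full_matrices=False)
    return U[:, 1:dims + 1] / np.sqrt(d1)[:, None]


def constrained_dbscan(Z, eps, min_pts, must_link=()):
    """Simplified DBSCAN with must-link pairs (illustrative only).

    Each must-link pair (i, j) is added to both neighborhoods, and
    constrained points are visited first as candidate seeds.
    Returns one label per point; -1 marks outliers.
    """
    Z = np.asarray(Z, dtype=float)
    n = len(Z)
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    neighbors = [set(np.flatnonzero(dist[i] <= eps).tolist()) for i in range(n)]
    for i, j in must_link:
        neighbors[i].add(j)
        neighbors[j].add(i)

    # Constrained points first, then everything else, without repeats.
    seeds = [i for pair in must_link for i in pair]
    order = list(dict.fromkeys(seeds + list(range(n))))

    labels = np.full(n, -1)
    cluster = 0
    for i in order:
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue  # already clustered, or not a core point
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] != -1:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # expand only through core points
                frontier.extend(neighbors[j])
        cluster += 1
    return labels


# Toy data: two groups of stories with mostly separate vocabularies.
A = 0.1 * np.ones((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
Z = co_embed(A, dims=1)
print(constrained_dbscan(Z, eps=0.1, min_pts=2))
```

In this sketch a story that sits alone in the embedded space is reported as an outlier unless a must-link pair ties it to a denser group, which is one simple way a visual constraint can override a purely text-based distance.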

Bibliographic Details
Main Authors: WU, Xiao, NGO, Chong-wah, HAUPTMANN, Alexander G.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2008
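Subjects: multimedia topic detection and tracking; near-duplicate visual constraint; news story clustering; video data mining; Data Storage Systems; Graphics and Human Computer Interfaces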
Online Access: https://ink.library.smu.edu.sg/sis_research/6333
https://ink.library.smu.edu.sg/context/sis_research/article/7336/viewcontent/10.1.1.323.9737.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-7336
record_format dspace
spelling sg-smu-ink.sis_research-7336
2021-11-23T04:55:13Z
2008-02-01T08:00:00Z
text
application/pdf
info:doi/10.1109/TMM.2007.911778
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic multimedia topic detection and tracking
near-duplicate visual constraint
news story clustering
video data mining
Data Storage Systems
Graphics and Human Computer Interfaces
description Story clustering is a critical step for news retrieval, topic mining, and summarization. Nonetheless, the task remains highly challenging owing to the fact that news topics exhibit clusters of varying densities, shapes, and sizes. Traditional algorithms are found to be ineffective in mining these types of clusters. This paper offers a new perspective by exploring the pairwise visual cues deriving from near-duplicate keyframes (NDK) for constraint-based clustering. We propose a constraint-driven co-clustering algorithm (CCC), which utilizes the near-duplicate constraints built on top of text, to mine topic-related stories and the outliers. With CCC, the duality between stories and their underlying multimodal features is exploited to transform features in low-dimensional space with normalized cut. The visual constraints are added directly to this new space, while the traditional DBSCAN is revisited to capitalize on the availability of constraints and the reduced dimensional space. We modify DBSCAN with two new characteristics for story clustering: 1) constraint-based centroid selection and 2) adaptive radius. Experiments on TRECVID-2004 corpus demonstrate that CCC with visual constraints is more capable of mining news topics of varying densities, shapes and sizes, compared with traditional k-means, DBSCAN, and spectral co-clustering algorithms.
format text
author WU, Xiao
NGO, Chong-wah
HAUPTMANN, Alexander G.
title Multimodal news story clustering with pairwise visual near-duplicate constraint
publisher Institutional Knowledge at Singapore Management University
publishDate 2008
url https://ink.library.smu.edu.sg/sis_research/6333
https://ink.library.smu.edu.sg/context/sis_research/article/7336/viewcontent/10.1.1.323.9737.pdf
_version_ 1770575936238911488