Practical elimination of near-duplicates from Web video search

Bibliographic Details
Main Authors:	WU, Xiao, HAUPTMANN, Alexander G., NGO, Chong-wah
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2007
Subjects:	Copy selection; Filtering; Multimodality; Near-duplicates; Novelty and redundancy detection; Similarity measure; Web video; Data Storage Systems; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/6480 https://ink.library.smu.edu.sg/context/sis_research/article/7483/viewcontent/1291233.1291280.pdf
Institution:	Singapore Management University

Description
Summary:	Current web video search results rely exclusively on text keywords or user-supplied tags. A search for a typical popular video often returns many duplicate and near-duplicate videos in the top results. This paper outlines ways to cluster and filter out near-duplicate videos using a hierarchical approach. Initial triage is performed using fast signatures derived from color histograms. Only when a video cannot be clearly classified as novel or near-duplicate using these global signatures do we apply a more expensive local-feature-based near-duplicate detection, which provides very accurate duplicate analysis at a higher computational cost. The results of 24 queries on a data set of 12,790 videos retrieved from Google, Yahoo! and YouTube show that this hierarchical approach can dramatically reduce the number of redundant videos displayed to the user in the top result set, at relatively small computational cost.
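The two-stage triage described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices here are assumptions: the global signature is a color histogram pooled over keyframes, distances use the L1 metric, the thresholds `low` and `high` are hypothetical, and the expensive local-feature detector is left as a caller-supplied function (`local_check`). The ranked list is filtered greedily. A video is dropped when it is a near-duplicate of one already kept.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255
Frame = Sequence[Pixel]       # one keyframe, as a flat list of pixels
Video = Sequence[Frame]       # one video, as its list of keyframes
LocalCheck = Callable[[Video, Video], bool]  # hypothetical hook for the costly local-feature detector

BINS_PER_CHANNEL = 4  # hypothetical quantization: 4 * 4 * 4 = 64 color bins


def color_signature(video: Video) -> List[float]:
    """Global signature: a color histogram pooled over all keyframes, normalized to sum to 1."""
    hist = [0.0] * BINS_PER_CHANNEL ** 3
    step = 256 // BINS_PER_CHANNEL
    total = 0
    for frame in video:
        for r, g, b in frame:
            idx = ((r // step) * BINS_PER_CHANNEL + g // step) * BINS_PER_CHANNEL + b // step
            hist[idx] += 1
            total += 1
    return [h / total for h in hist] if total else hist


def l1_distance(a: List[float], b: List[float]) -> float:
    """L1 distance between two normalized histograms; ranges from 0 (identical) to 2 (disjoint)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def is_near_duplicate(a: Video, b: Video, sig_a: List[float], sig_b: List[float],
                      local_check: LocalCheck, low: float = 0.2, high: float = 0.6) -> bool:
    """Cheap global triage first; the costly local check runs only in the ambiguous band."""
    d = l1_distance(sig_a, sig_b)
    if d <= low:
        return True   # global signatures agree closely: treat as near-duplicate
    if d >= high:
        return False  # global signatures differ clearly: treat as novel
    return local_check(a, b)  # undecided: defer to the expensive local-feature detector


def filter_results(ranked: Sequence[Video], local_check: LocalCheck,
                   low: float = 0.2, high: float = 0.6) -> List[int]:
    """Walk the ranked list and keep a video only if it is not a near-duplicate of one already kept."""
    sigs = [color_signature(v) for v in ranked]  # each global signature is computed once
    kept: List[int] = []
    for i, video in enumerate(ranked):
        if not any(is_near_duplicate(ranked[j], video, sigs[j], sigs[i], local_check, low, high)
                   for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    red = [[(255, 0, 0)] * 10]
    red_copy = [[(250, 5, 5)] * 10]                       # falls in the same color bin as red
    blue = [[(0, 0, 255)] * 10]
    mostly_red = [[(255, 0, 0)] * 8 + [(0, 255, 0)] * 2]  # L1 distance 0.4 to red: ambiguous band

    def stub_local_check(a: Video, b: Video) -> bool:
        # placeholder for a real local-feature matcher; here it always reports a match
        return True

    print(filter_results([red, red_copy, blue, mostly_red], stub_local_check))  # [0, 2]
```

Keeping the first-ranked member of each near-duplicate group preserves the search engine's original ordering. Widening the band between `low` and `high` sends more pairs to the costly local check. This trades speed for accuracy, which is the balance the hierarchical scheme is meant to control.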
