Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval

Bag-of-visual-words (BoW) has recently become a popular representation to describe video and image content. Most existing approaches, nevertheless, neglect inter-word relatedness and measure similarity by bin-to-bin comparison of visual words in histograms. In this paper, we explore the linguistic a...

Full description

Saved in:

Bibliographic Details
Main Authors:	JIANG, Yu-Gang, NGO, Chong-wah
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2009
Subjects:	CEMD matching Linguistic similarity Near-duplicate keyframe Semantic concept Soft-weighting Visual ontology Data Storage Systems Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/6372 https://ink.library.smu.edu.sg/context/sis_research/article/7375/viewcontent/10.1.1.439.3117.pdf
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Singapore Management University
Language:	English

Description
Summary:	Bag-of-visual-words (BoW) has recently become a popular representation to describe video and image content. Most existing approaches, nevertheless, neglect inter-word relatedness and measure similarity by bin-to-bin comparison of visual words in histograms. In this paper, we explore the linguistic and ontological aspects of visual words for video analysis. Two approaches, soft-weighting and constraint-based earth mover’s distance (CEMD), are proposed to model different aspects of visual word linguistics and proximity. In soft-weighting, visual words are cleverly weighted such that the linguistic meaning of words is taken into account for bin-to-bin histogram comparison. In CEMD, a cross-bin matching algorithm is formulated such that the ground distance measure considers the linguistic similarity of words. In particular, a BoW ontology which hierarchically specifies the hyponym relationship of words is constructed to assist the reasoning. We demonstrate soft-weighting and CEMD on two tasks: video semantic indexing and near-duplicate keyframe retrieval. Experimental results indicate that soft-weighting is superior to other popular weighting schemes such as term frequency (TF) weighting in large-scale video database. In addition, CEMD shows excellent performance compared to cosine similarity in near-duplicate retrieval.

Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval

Similar Items