Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval
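Bag-of-visual-words (BoW) has recently become a popular representation for describing video and image content. Most existing approaches, however, neglect inter-word relatedness and measure similarity by bin-to-bin comparison of visual words in histograms. In this paper, we explore the linguistic and ontological aspects of visual words for video analysis. Two approaches, soft-weighting and constraint-based earth mover’s distance (CEMD), are proposed to model different aspects of visual word linguistics and proximity. In soft-weighting, visual words are weighted such that the linguistic meaning of words is taken into account for bin-to-bin histogram comparison. In CEMD, a cross-bin matching algorithm is formulated such that the ground distance measure considers the linguistic similarity of words. In particular, a BoW ontology that hierarchically specifies the hyponym relationship of words is constructed to assist the reasoning. We demonstrate soft-weighting and CEMD on two tasks: video semantic indexing and near-duplicate keyframe retrieval. Experimental results indicate that soft-weighting is superior to other popular weighting schemes, such as term frequency (TF) weighting, on a large-scale video database. In addition, CEMD shows excellent performance compared with cosine similarity in near-duplicate retrieval.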
Main Authors: | JIANG, Yu-Gang; NGO, Chong-wah |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University 2009 |
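Subjects: | CEMD matching; Linguistic similarity; Near-duplicate keyframe; Semantic concept; Soft-weighting; Visual ontology; Data Storage Systems; Graphics and Human Computer Interfaces |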
Online Access: | https://ink.library.smu.edu.sg/sis_research/6372 https://ink.library.smu.edu.sg/context/sis_research/article/7375/viewcontent/10.1.1.439.3117.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-7375 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-7375; 2021-11-23T02:47:22Z; 2009-03-01T08:00:00Z; text; application/pdf; info:doi/10.1016/j.cviu.2008.10.002; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
CEMD matching; Linguistic similarity; Near-duplicate keyframe; Semantic concept; Soft-weighting; Visual ontology; Data Storage Systems; Graphics and Human Computer Interfaces |
description |
Bag-of-visual-words (BoW) has recently become a popular representation for describing video and image content. Most existing approaches, however, neglect inter-word relatedness and measure similarity by bin-to-bin comparison of visual words in histograms. In this paper, we explore the linguistic and ontological aspects of visual words for video analysis. Two approaches, soft-weighting and constraint-based earth mover’s distance (CEMD), are proposed to model different aspects of visual word linguistics and proximity. In soft-weighting, visual words are weighted such that the linguistic meaning of words is taken into account for bin-to-bin histogram comparison. In CEMD, a cross-bin matching algorithm is formulated such that the ground distance measure considers the linguistic similarity of words. In particular, a BoW ontology that hierarchically specifies the hyponym relationship of words is constructed to assist the reasoning. We demonstrate soft-weighting and CEMD on two tasks: video semantic indexing and near-duplicate keyframe retrieval. Experimental results indicate that soft-weighting is superior to other popular weighting schemes, such as term frequency (TF) weighting, on a large-scale video database. In addition, CEMD shows excellent performance compared with cosine similarity in near-duplicate retrieval. |
format |
text |
author |
JIANG, Yu-Gang; NGO, Chong-wah |
title |
Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2009 |
url |
https://ink.library.smu.edu.sg/sis_research/6372 https://ink.library.smu.edu.sg/context/sis_research/article/7375/viewcontent/10.1.1.439.3117.pdf |
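The abstract contrasts soft-weighting with term frequency (TF) weighting but gives no formula. The following is a minimal illustrative sketch of that contrast, not the authors' method. It assumes that each keypoint descriptor contributes to its `top_n` nearest visual words, that the contribution halves with each rank, and that similarity is `1 / (1 + distance)`. The function names, the `top_n` parameter, and the toy 2-D vocabulary are all hypothetical.

```python
import math


def tf_histogram(keypoints, vocabulary):
    """Hard assignment (TF): each keypoint votes once, for its nearest visual word."""
    hist = [0.0] * len(vocabulary)
    for kp in keypoints:
        nearest = min(range(len(vocabulary)),
                      key=lambda k: math.dist(kp, vocabulary[k]))
        hist[nearest] += 1.0
    return hist


def soft_weight_histogram(keypoints, vocabulary, top_n=4):
    """Soft assignment: each keypoint votes for its top_n nearest visual words.

    Assumptions for illustration only: the vote is scaled by a similarity of
    1 / (1 + distance) and halved for each step down the ranking.
    """
    hist = [0.0] * len(vocabulary)
    for kp in keypoints:
        # Rank all visual words by Euclidean distance to this keypoint.
        ranked = sorted((math.dist(kp, word), k)
                        for k, word in enumerate(vocabulary))
        for rank, (distance, k) in enumerate(ranked[:top_n]):
            similarity = 1.0 / (1.0 + distance)  # assumed similarity function
            hist[k] += similarity / (2 ** rank)  # assumed rank decay
    return hist


# Toy data: three visual words and one keypoint lying between the first two.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
keypoints = [(0.4, 0.0)]
hard = tf_histogram(keypoints, vocabulary)
soft = soft_weight_histogram(keypoints, vocabulary, top_n=2)
```

In this toy case the keypoint sits near the boundary between the first two words. The TF histogram credits only the first word, while the soft-weighted histogram also gives partial credit to the second, which is the kind of inter-word relatedness that bin-to-bin comparison of hard-assigned histograms would miss.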