Coherent bag-of audio words model for efficient large-scale video copy detection

Current content-based video copy detection approaches mostly concentrate on visual cues and neglect audio information. In this paper, we tackle the video copy detection task using audio information, which is as important as visual information in multimedia...

Bibliographic Details
Main Authors:	LIU, Yang; ZHAO, Wan-Lei; NGO, Chong-wah; XU, Chang-Sheng; LU, Han-Qing
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2010
Subjects:	Audio words; Coherency vocabulary; Copy detection; Data Storage Systems; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/6522 https://ink.library.smu.edu.sg/context/sis_research/article/7525/viewcontent/1816041.1816057.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-7525
record_format	dspace
spelling	sg-smu-ink.sis_research-7525 2022-01-10T03:52:57Z Coherent bag-of audio words model for efficient large-scale video copy detection LIU, Yang ZHAO, Wan-Lei NGO, Chong-wah XU, Chang-Sheng LU, Han-Qing
2010-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6522 info:doi/10.1145/1816041.1816057 https://ink.library.smu.edu.sg/context/sis_research/article/7525/viewcontent/1816041.1816057.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Audio words Coherency vocabulary Copy detection Data Storage Systems Graphics and Human Computer Interfaces
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Audio words; Coherency vocabulary; Copy detection; Data Storage Systems; Graphics and Human Computer Interfaces
spellingShingle	Audio words Coherency vocabulary Copy detection Data Storage Systems Graphics and Human Computer Interfaces LIU, Yang ZHAO, Wan-Lei NGO, Chong-wah XU, Chang-Sheng LU, Han-Qing Coherent bag-of audio words model for efficient large-scale video copy detection
description	Current content-based video copy detection approaches mostly concentrate on visual cues and neglect audio information. In this paper, we tackle the video copy detection task using audio information, which is as important as visual information in multimedia processing. First, inspired by the bag-of visual words (BoW) model, a bag-of audio words (BoA) representation is proposed to characterize each audio frame. Unlike naive single-based audio retrieval approaches, BoA is a high-level model owing to its perceptual and semantic properties. Within the BoA model, a coherency vocabulary indexing structure is adopted to achieve more efficient and effective indexing than the single vocabulary of the standard BoW model. The coherency vocabulary takes advantage of multiple audio features by computing their co-occurrence across different feature spaces. By enforcing a tight coherency constraint across feature spaces, the coherency vocabulary makes the BoA model more discriminative and more robust to various audio transforms. A 2D Hough transform is then applied to aggregate scores from matched audio segments. The segments that fall into the peak bin are identified as the copy segments in the reference video. In addition, we perform video copy detection from both audio and visual cues using four late fusion strategies, to demonstrate the complementarity of audio and visual information in video copy detection. Extensive experiments are conducted on the large-scale TRECVID 2009 dataset, and competitive results are achieved.
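The score aggregation step named in the description can be illustrated with a minimal sketch. The record does not say which two dimensions the 2D Hough transform uses, so this example assumes the accumulator is indexed by reference video and by the time offset between query and reference; the function name `hough_vote`, the match tuple layout, and the `bin_width` parameter are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def hough_vote(matches, bin_width=1.0):
    """Accumulate match scores in a 2D (reference id, time offset) space.

    matches: iterable of (ref_id, query_time, ref_time, score) tuples.
    This layout is an assumption made for illustration only.
    Returns (peak_bin, peak_score, segment_pairs), or None if no matches.
    """
    accumulator = defaultdict(float)  # summed score per bin
    members = defaultdict(list)       # matched segment pairs per bin

    for ref_id, query_time, ref_time, score in matches:
        # Matches from a true copy share roughly the same temporal offset,
        # so they vote for the same bin.
        offset_bin = int((ref_time - query_time) // bin_width)
        key = (ref_id, offset_bin)
        accumulator[key] += score
        members[key].append((query_time, ref_time))

    if not accumulator:
        return None

    # The peak bin holds the segments reported as the detected copy.
    peak = max(accumulator, key=accumulator.get)
    return peak, accumulator[peak], members[peak]


# Toy input: three consistent matches against ref_A, plus two outliers.
matches = [
    ("ref_A", 0.0, 10.0, 1.0),
    ("ref_A", 1.0, 11.0, 0.8),
    ("ref_A", 2.0, 12.0, 0.9),
    ("ref_B", 0.0, 3.0, 0.7),
    ("ref_A", 1.0, 40.0, 0.6),
]
result = hough_vote(matches)
```

In this toy input the three temporally consistent matches against `ref_A` reinforce each other in one bin, while the isolated matches end up alone in their own bins, which is the property that makes voting tolerant of spurious audio word matches.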
format	text
author	LIU, Yang ZHAO, Wan-Lei NGO, Chong-wah XU, Chang-Sheng LU, Han-Qing
author_facet	LIU, Yang ZHAO, Wan-Lei NGO, Chong-wah XU, Chang-Sheng LU, Han-Qing
author_sort	LIU, Yang
title	Coherent bag-of audio words model for efficient large-scale video copy detection
title_sort	coherent bag-of audio words model for efficient large-scale video copy detection
publisher	Institutional Knowledge at Singapore Management University
publishDate	2010
url	https://ink.library.smu.edu.sg/sis_research/6522 https://ink.library.smu.edu.sg/context/sis_research/article/7525/viewcontent/1816041.1816057.pdf
_version_	1770575981038272512
