Efficient unsupervised video hashing with contextual modeling and structural controlling

The main benefit of video hashing is fast retrieval, which comes from the high efficiency of binary computation. Current video hashing approaches therefore mainly aim to learn compact binary codes that represent video content accurately. However, they may...
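As a rough illustration of why binary codes make retrieval fast, the sketch below ranks database items by Hamming distance using XOR and a byte-level popcount table. It is not taken from the paper: the function names (`pack_codes`, `hamming_distances`, `retrieve`) and the 16-bit toy codes are assumptions made for this example, and it assumes the hash codes are already available as 0/1 arrays.

```python
import numpy as np

# Lookup table: number of set bits in every possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def pack_codes(bits: np.ndarray) -> np.ndarray:
    """Pack an (n, k) array of 0/1 bits into (n, ceil(k / 8)) uint8 bytes."""
    return np.packbits(bits.astype(np.uint8), axis=1)


def hamming_distances(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Hamming distance from one packed query code to each packed database code."""
    xor = np.bitwise_xor(database, query)  # bytes whose set bits mark differences
    return POPCOUNT[xor].sum(axis=1)       # differing bits per database row


def retrieve(query: np.ndarray, database: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Indices of the top_k database codes closest to the query."""
    dist = hamming_distances(query, database)
    return np.argsort(dist, kind="stable")[:top_k]


if __name__ == "__main__":
    # Toy database of four 16-bit codes (hypothetical data).
    db_bits = np.array([[0] * 16, [1] * 16, [1, 0] * 8, [0, 1] * 8], dtype=np.uint8)
    database = pack_codes(db_bits)

    # Query: the third code with its first bit flipped.
    q_bits = db_bits[2].copy()
    q_bits[0] ^= 1
    query = pack_codes(q_bits[None, :])[0]

    print(hamming_distances(query, database))
    print(retrieve(query, database, top_k=2))
```

Packing the bits lets each comparison touch only `k / 8` bytes per item, which is the kind of saving the abstract attributes to binary computation; a production system would typically use hardware popcount instructions rather than a lookup table.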

Bibliographic Details
Main Authors:	DUAN, Jingru; HAO, Yanbin; ZHU, Bin; CHENG, Lechao; ZHOU, Pengyuan; WANG, Xiang
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Codes; Computational modeling; Context modeling; Data Structure; Data structures; Deep Neural Network; Feature extraction; Hash functions; Large-scale retrieval; Transformers; Video hashing; Graphics and Human Computer Interfaces; Numerical Analysis and Scientific Computing
Online Access:	https://ink.library.smu.edu.sg/sis_research/8723 https://ink.library.smu.edu.sg/context/sis_research/article/9726/viewcontent/TMM_zhu24_av.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-9726
record_format	dspace
spelling	sg-smu-ink.sis_research-9726 2024-04-18T07:36:04Z 2024-01-01T08:00:00Z text application/pdf info:doi/10.1109/TMM.2024.3368924 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Codes; Computational modeling; Context modeling; Data Structure; Data structures; Deep Neural Network; Feature extraction; Hash functions; Large-scale retrieval; Transformers; Video hashing; Graphics and Human Computer Interfaces; Numerical Analysis and Scientific Computing
description	The main benefit of video hashing is fast retrieval, which comes from the high efficiency of binary computation. Current video hashing approaches therefore mainly aim to learn compact binary codes that represent video content accurately. However, they may overlook the efficiency of generating hash codes, i.e., the design of lightweight neural networks. This paper proposes an efficient unsupervised video hashing (EUVH) method that both computes compact hash codes and uses a lightweight deep model. Specifically, we present an MLP-based model in which the video tensor is split into several groups, and multiple axial contexts are explored to refine the groups separately and in parallel. The axial contexts are the dynamics aggregated at different axial scales, covering long-, middle-, and short-range dependencies. The group operation significantly reduces the computational cost of the MLP backbone. Moreover, to obtain compact video hash codes, three structural losses are used. As the experiments demonstrate, the three structures are highly complementary in approximating the real data structure. We conduct extensive experiments on three benchmark datasets for the unsupervised video hashing task and show that EUVH achieves a better trade-off between performance and computational cost than state-of-the-art methods.
format	text
author	DUAN, Jingru; HAO, Yanbin; ZHU, Bin; CHENG, Lechao; ZHOU, Pengyuan; WANG, Xiang
title	Efficient unsupervised video hashing with contextual modeling and structural controlling
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/8723 https://ink.library.smu.edu.sg/context/sis_research/article/9726/viewcontent/TMM_zhu24_av.pdf
_version_	1814047494349258752

Similar Items