Self-supervised video hashing with hierarchical binary auto-encoder

Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel uns...

Bibliographic Details
Main Authors: Song, Jingkuan; Zhang, Hanwang; Li, Xiangpeng; Gao, Lianli; Wang, Meng; Hong, Richang
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2020
Subjects: Engineering::Computer science and engineering; Video Hashing; Video Retrieval
Online Access: https://hdl.handle.net/10356/142308
Institution: Nanyang Technological University
id sg-ntu-dr.10356-142308
record_format dspace
spelling sg-ntu-dr.10356-142308 2020-06-18T09:28:26Z
Self-supervised video hashing with hierarchical binary auto-encoder
Song, Jingkuan; Zhang, Hanwang; Li, Xiangpeng; Gao, Lianli; Wang, Meng; Hong, Richang
School of Computer Science and Engineering
Engineering::Computer science and engineering; Video Hashing; Video Retrieval
Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed self-supervised video hashing (SSVH), which is able to capture the temporal nature of videos in an end-to-end learning-to-hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture to generate binary codes for videos and 2) how to equip the binary codes with the ability to support accurate video retrieval. We design a hierarchical binary auto-encoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with fewer computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two real-world data sets show that our SSVH method can significantly outperform the state-of-the-art methods and achieve the current best performance on the task of unsupervised video retrieval.
2020-06-18T09:28:26Z 2020-06-18T09:28:26Z 2018 Journal Article
Song, J., Zhang, H., Li, X., Gao, L., Wang, M., & Hong, R. (2018). Self-supervised video hashing with hierarchical binary auto-encoder. IEEE Transactions on Image Processing, 27(7), 3210-3221. doi:10.1109/TIP.2018.2814344
1057-7149 https://hdl.handle.net/10356/142308 10.1109/TIP.2018.2814344 29641401 2-s2.0-85043468776 7 27 3210 3221 en IEEE Transactions on Image Processing © 2018 IEEE. All rights reserved.
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Video Hashing
Video Retrieval
description Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed self-supervised video hashing (SSVH), which is able to capture the temporal nature of videos in an end-to-end learning-to-hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture to generate binary codes for videos and 2) how to equip the binary codes with the ability to support accurate video retrieval. We design a hierarchical binary auto-encoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with fewer computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two real-world data sets show that our SSVH method can significantly outperform the state-of-the-art methods and achieve the current best performance on the task of unsupervised video retrieval.
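The description refers to embedding videos into binary codes and retrieving videos with them. As a rough illustration of that general idea only, the sketch below binarizes real-valued vectors by sign and ranks database items by Hamming distance. It is not the SSVH model: the function names (binarize, hamming, rank_by_hamming) and the four-dimensional toy features are assumptions made up for this example, and the hierarchical auto-encoder that would produce such features in the paper is not reproduced here.

```python
# Illustrative only: sign binarization and Hamming-distance ranking.
# This is NOT the SSVH model; names and feature values are hypothetical.

def binarize(features):
    """Map a real-valued vector to a tuple of bits (1 if value > 0, else 0)."""
    return tuple(1 if x > 0 else 0 for x in features)


def hamming(a, b):
    """Count the positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code, database):
    """Return database keys sorted by Hamming distance to the query.

    Ties are broken by key so the ordering is deterministic.
    """
    return sorted(database, key=lambda k: (hamming(query_code, database[k]), k))


# Made-up stand-ins for the real-valued outputs of some video encoder.
toy_features = {
    "video_a": [0.9, -0.2, 0.4, -0.7],
    "video_b": [0.8, -0.1, -0.3, -0.6],
    "video_c": [-0.5, 0.6, -0.4, 0.2],
}

# Binary codes for the database and for one query vector.
codes = {name: binarize(vec) for name, vec in toy_features.items()}
query = binarize([0.7, -0.3, 0.5, -0.9])

# Nearest items first.
ranking = rank_by_hamming(query, codes)
print(ranking)
```

Comparing short binary codes needs only bit comparisons, which is the usual motivation for hashing-based retrieval; how well the ranking reflects visual similarity depends entirely on how the codes are learned.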
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Song, Jingkuan
Zhang, Hanwang
Li, Xiangpeng
Gao, Lianli
Wang, Meng
Hong, Richang
format Article
author Song, Jingkuan
Zhang, Hanwang
Li, Xiangpeng
Gao, Lianli
Wang, Meng
Hong, Richang
author_sort Song, Jingkuan
title Self-supervised video hashing with hierarchical binary auto-encoder
title_sort self-supervised video hashing with hierarchical binary auto-encoder
publishDate 2020
url https://hdl.handle.net/10356/142308
_version_ 1681058882377482240