Self-supervised video hashing with hierarchical binary auto-encoder
Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel uns...
Main Authors: | Song, Jingkuan; Zhang, Hanwang; Li, Xiangpeng; Gao, Lianli; Wang, Meng; Hong, Richang |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2020 |
Subjects: | Engineering::Computer science and engineering; Video Hashing; Video Retrieval |
Online Access: | https://hdl.handle.net/10356/142308 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-142308 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-142308 2020-06-18T09:28:26Z Self-supervised video hashing with hierarchical binary auto-encoder Song, Jingkuan Zhang, Hanwang Li, Xiangpeng Gao, Lianli Wang, Meng Hong, Richang School of Computer Science and Engineering Engineering::Computer science and engineering Video Hashing Video Retrieval Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed self-supervised video hashing (SSVH), which is able to capture the temporal nature of videos in an end-to-end learning-to-hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture to generate binary codes for videos and 2) how to equip the binary codes with the ability to support accurate video retrieval. We design a hierarchical binary auto-encoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with fewer computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two real-world data sets show that our SSVH method can significantly outperform the state-of-the-art methods and achieve the current best performance on the task of unsupervised video retrieval. 2020-06-18T09:28:26Z 2020-06-18T09:28:26Z 2018 Journal Article Song, J., Zhang, H., Li, X., Gao, L., Wang, M., & Hong, R. (2018). Self-supervised video hashing with hierarchical binary auto-encoder. IEEE Transactions on Image Processing, 27(7), 3210-3221. doi:10.1109/TIP.2018.2814344 1057-7149 https://hdl.handle.net/10356/142308 10.1109/TIP.2018.2814344 29641401 2-s2.0-85043468776 7 27 3210 3221 en IEEE Transactions on Image Processing © 2018 IEEE. All rights reserved. |
institution | Nanyang Technological University |
building | NTU Library |
country | Singapore |
collection | DR-NTU |
language | English |
topic | Engineering::Computer science and engineering Video Hashing Video Retrieval |
spellingShingle | Engineering::Computer science and engineering Video Hashing Video Retrieval Song, Jingkuan Zhang, Hanwang Li, Xiangpeng Gao, Lianli Wang, Meng Hong, Richang Self-supervised video hashing with hierarchical binary auto-encoder |
description | Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed self-supervised video hashing (SSVH), which is able to capture the temporal nature of videos in an end-to-end learning-to-hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture to generate binary codes for videos and 2) how to equip the binary codes with the ability to support accurate video retrieval. We design a hierarchical binary auto-encoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with fewer computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two real-world data sets show that our SSVH method can significantly outperform the state-of-the-art methods and achieve the current best performance on the task of unsupervised video retrieval. |
author2 | School of Computer Science and Engineering |
author_facet | School of Computer Science and Engineering Song, Jingkuan Zhang, Hanwang Li, Xiangpeng Gao, Lianli Wang, Meng Hong, Richang |
format | Article |
author | Song, Jingkuan Zhang, Hanwang Li, Xiangpeng Gao, Lianli Wang, Meng Hong, Richang |
author_sort | Song, Jingkuan |
title | Self-supervised video hashing with hierarchical binary auto-encoder |
title_short | Self-supervised video hashing with hierarchical binary auto-encoder |
title_full | Self-supervised video hashing with hierarchical binary auto-encoder |
title_fullStr | Self-supervised video hashing with hierarchical binary auto-encoder |
title_full_unstemmed | Self-supervised video hashing with hierarchical binary auto-encoder |
title_sort | self-supervised video hashing with hierarchical binary auto-encoder |
publishDate | 2020 |
url | https://hdl.handle.net/10356/142308 |
_version_ | 1681058882377482240 |
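
Note on the description field: the record says the method embeds videos into binary codes that are then used for retrieval. As a rough illustration of how retrieval over binary codes is commonly done, the sketch below ranks stored codes by Hamming distance to a query code. This is an assumption for illustration only and is not taken from the record: the function names (`sign_binarize`, `hamming_distance`, `rank_by_hamming`) are hypothetical, and thresholding at zero is a simple stand-in for the learned hierarchical binary auto-encoder, which the record does not specify in detail.

```python
# Illustrative sketch only: Hamming-distance ranking over binary codes.
# sign_binarize is a stand-in for a learned encoder, NOT the SSVH model.

def sign_binarize(features):
    """Map real-valued features to a binary code: 1 where the value is >= 0, else 0."""
    return [1 if x >= 0 else 0 for x in features]


def hamming_distance(a, b):
    """Count the positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code, database_codes):
    """Return database indices ordered by increasing Hamming distance to the query.

    Ties keep their original database order because sorted() is stable.
    """
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


# Toy data: three stored "video" codes and one query feature vector (made up).
database = [[0, 1, 0, 0], [1, 0, 1, 1], [1, 0, 0, 1]]
query = sign_binarize([0.3, -1.2, 0.0, 2.5])
ranking = rank_by_hamming(query, database)
print(ranking)
```

Comparing short binary codes this way needs only bit comparisons and counts, which is the usual motivation for hashing-based retrieval over comparing real-valued features directly.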