Learning spatio-temporal representation with local and global diffusion
Convolutional Neural Networks (CNNs) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations that ignore long-range dependencies. This drawback is particularly severe for video reco...
Main Authors: | QIU, Zhaofan; YAO, Ting; NGO, Chong-wah; TIAN, Xinmei; MEI, Tao |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2019 |
Subjects: | Video Analytics; Graphics and Human Computer Interfaces; OS and Networks |
Online Access: | https://ink.library.smu.edu.sg/sis_research/6458 https://ink.library.smu.edu.sg/context/sis_research/article/7461/viewcontent/Qiu_Learning_Spatio_Temporal_Representation_With_Local_and_Global_Diffusion_CVPR_2019_paper.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-7461 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-7461 2022-01-10T06:10:25Z Learning spatio-temporal representation with local and global diffusion QIU, Zhaofan YAO, Ting NGO, Chong-wah TIAN, Xinmei MEI, Tao Convolutional Neural Networks (CNNs) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations that ignore long-range dependencies. This drawback is particularly severe for video recognition, since video is an information-intensive medium with complex temporal variations. In this paper, we present a novel framework to boost spatio-temporal representation learning by Local and Global Diffusion (LGD). Specifically, we construct a novel neural network architecture that learns the local and global representations in parallel. The architecture is composed of LGD blocks, where each block updates local and global features by modeling the diffusions between these two representations. Diffusions effectively let the two aspects of information, i.e., localized and holistic, interact for a more powerful way of representation learning. Furthermore, a kernelized classifier is introduced to combine the representations from the two aspects for video recognition. Our LGD networks achieve clear improvements on the large-scale Kinetics-400 and Kinetics-600 video classification datasets against the best competitors by 3.5% and 0.7%, respectively. We further examine the generalization of both the global and local representations produced by our pre-trained LGD networks on four different benchmarks for video action recognition and spatio-temporal action detection tasks. Superior performance over several state-of-the-art techniques on these benchmarks is reported.
2019-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6458 info:doi/10.1109/CVPR.2019.01233 https://ink.library.smu.edu.sg/context/sis_research/article/7461/viewcontent/Qiu_Learning_Spatio_Temporal_Representation_With_Local_and_Global_Diffusion_CVPR_2019_paper.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Video Analytics Graphics and Human Computer Interfaces OS and Networks |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Video Analytics Graphics and Human Computer Interfaces OS and Networks |
spellingShingle |
Video Analytics Graphics and Human Computer Interfaces OS and Networks QIU, Zhaofan YAO, Ting NGO, Chong-wah TIAN, Xinmei MEI, Tao Learning spatio-temporal representation with local and global diffusion |
description |
Convolutional Neural Networks (CNNs) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations that ignore long-range dependencies. This drawback is particularly severe for video recognition, since video is an information-intensive medium with complex temporal variations. In this paper, we present a novel framework to boost spatio-temporal representation learning by Local and Global Diffusion (LGD). Specifically, we construct a novel neural network architecture that learns the local and global representations in parallel. The architecture is composed of LGD blocks, where each block updates local and global features by modeling the diffusions between these two representations. Diffusions effectively let the two aspects of information, i.e., localized and holistic, interact for a more powerful way of representation learning. Furthermore, a kernelized classifier is introduced to combine the representations from the two aspects for video recognition. Our LGD networks achieve clear improvements on the large-scale Kinetics-400 and Kinetics-600 video classification datasets against the best competitors by 3.5% and 0.7%, respectively. We further examine the generalization of both the global and local representations produced by our pre-trained LGD networks on four different benchmarks for video action recognition and spatio-temporal action detection tasks. Superior performance over several state-of-the-art techniques on these benchmarks is reported. |
format |
text |
author |
QIU, Zhaofan YAO, Ting NGO, Chong-wah TIAN, Xinmei MEI, Tao |
author_facet |
QIU, Zhaofan YAO, Ting NGO, Chong-wah TIAN, Xinmei MEI, Tao |
author_sort |
QIU, Zhaofan |
title |
Learning spatio-temporal representation with local and global diffusion |
title_short |
Learning spatio-temporal representation with local and global diffusion |
title_full |
Learning spatio-temporal representation with local and global diffusion |
title_fullStr |
Learning spatio-temporal representation with local and global diffusion |
title_full_unstemmed |
Learning spatio-temporal representation with local and global diffusion |
title_sort |
learning spatio-temporal representation with local and global diffusion |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2019 |
url |
https://ink.library.smu.edu.sg/sis_research/6458 https://ink.library.smu.edu.sg/context/sis_research/article/7461/viewcontent/Qiu_Learning_Spatio_Temporal_Representation_With_Local_and_Global_Diffusion_CVPR_2019_paper.pdf |
_version_ |
1770575964241133568 |