Local fusion networks with chained residual pooling for video action recognition

Bibliographic Details
Main Authors: He, Feixiang; Liu, Fayao; Yao, Rui; Lin, Guosheng
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2020
Online Access:https://hdl.handle.net/10356/143069
Institution: Nanyang Technological University
Description
Summary: Action recognition is an important yet challenging problem. Here we present a novel method, multistage local fusion networks with residual connections, to improve the performance of video action recognition. In realistic videos, an action instance may span a long time, and some frames may suffer from degraded object appearance due to motion blur or defocus. Our method enhances the per-frame representation by capturing information from neighboring frames. We propose a local fusion block that considers neighboring frames to capture appearance and local motion information when generating each frame's representation. Local fusion is performed in multiple stages, allowing features to be fused over neighborhoods of varying size in the temporal dimension. We employ residual connections in the fusion blocks to ease gradient propagation through the whole network, enabling effective end-to-end training. We achieve competitive results on two challenging, publicly available datasets, HMDB51 and UCF101, which demonstrates the effectiveness of the proposed method.
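The summary describes local fusion over neighboring frames, stacked in stages and wrapped in residual connections, but does not give the block's internals. The sketch below is a minimal illustration under stated assumptions, not the authors' architecture: it assumes each block concatenates the features of a window of neighboring frames (zero-padded at the clip boundaries), applies a linear projection and a ReLU, and adds the result back to the input as a residual. The names local_fusion_block, multistage_local_fusion, temporal_receptive_field and the window parameter are hypothetical. It also shows why stacking stages widens the temporal neighborhood each frame sees.

```python
import numpy as np


def local_fusion_block(x, weight, window=3):
    """Hypothetical local fusion block (illustrative assumption only).

    x: (T, C) array, one C-dim feature vector per frame.
    weight: (window * C, C) projection matrix, assumed to be learned.
    Each output frame mixes the features of `window` neighboring frames
    and adds its own input back through a residual connection.
    """
    assert window % 2 == 1, "an odd window keeps each frame centered"
    T, C = x.shape
    pad = window // 2
    # Zero-pad in time so boundary frames also get a full neighborhood.
    padded = np.pad(x, ((pad, pad), (0, 0)))
    # Concatenate each frame's temporal neighborhood: shape (T, window * C).
    neighborhoods = np.stack([padded[t:t + window].reshape(-1) for t in range(T)])
    fused = np.maximum(neighborhoods @ weight, 0.0)  # projection + ReLU
    return x + fused  # residual connection: identity path plus fused term


def multistage_local_fusion(x, weights, window=3):
    """Stack several fusion blocks; each stage widens the temporal context."""
    for w in weights:
        x = local_fusion_block(x, w, window)
    return x


def temporal_receptive_field(num_stages, window=3):
    """Number of input frames that can influence one output frame."""
    return 1 + num_stages * (window - 1)
```

With window=3, one stage lets a frame see its two immediate neighbors, while three stacked stages let it draw on temporal_receptive_field(3) = 7 frames. Because each block returns x plus the fused term, setting a block's weights to zero reduces it to the identity, which is one common way residual connections keep gradients flowing through deep stacks.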