Feature flow: in-network feature flow estimation for video object detection

Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of the convolutional neural network, recent state-of-the-art approaches are proposed to solve problems directly on feature-l...

Bibliographic Details
Main Authors: Jin, Ruibing, Lin, Guosheng, Wen, Changyun, Wang, Jianliang, Liu, Fayao
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2022
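Subjects: Engineering::Electrical and electronic engineering; Engineering::Computer science and engineering; Video Object Detection; Feature Flow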
Online Access: https://hdl.handle.net/10356/161421
Institution: Nanyang Technological University
id sg-ntu-dr.10356-161421
record_format dspace
spelling sg-ntu-dr.10356-161421 2022-08-31T06:30:08Z Feature flow: in-network feature flow estimation for video object detection Jin, Ruibing Lin, Guosheng Wen, Changyun Wang, Jianliang Liu, Fayao School of Electrical and Electronic Engineering School of Computer Science and Engineering Engineering::Electrical and electronic engineering Engineering::Computer science and engineering Video Object Detection Feature Flow Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of the convolutional neural network, recent state-of-the-art approaches are proposed to solve problems directly on feature-level. Since the displacement of feature vector is not consistent with the pixel displacement, a common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset. With this method, they expect the fine-tuned network to produce tensors encoding feature-level motion information. In this paper, we rethink about this de facto paradigm and analyze its drawbacks in the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an In-network Feature Flow estimation module (IFF module) for video object detection. Without resorting to pre-training on any additional dataset, our IFF module is able to directly produce feature flow which indicates the feature displacement. Our IFF module consists of a shallow module, which shares the features with the detection branches. This compact design enables our IFF-Net to accurately detect objects, while maintaining a fast inference speed. Furthermore, we propose a transformation residual loss (TRL) based on self-supervision, which further improves the performance of our IFF-Net. Our IFF-Net outperforms existing methods and achieves new state-of-the-art performance on ImageNet VID.
2022-08-31T06:30:08Z 2022-08-31T06:30:08Z 2022 Journal Article Jin, R., Lin, G., Wen, C., Wang, J. & Liu, F. (2022). Feature flow: in-network feature flow estimation for video object detection. Pattern Recognition, 122, 108323-. https://dx.doi.org/10.1016/j.patcog.2021.108323 0031-3203 https://hdl.handle.net/10356/161421 10.1016/j.patcog.2021.108323 2-s2.0-85115428517 122 108323 en Pattern Recognition © 2021 Elsevier Ltd. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
Engineering::Computer science and engineering
Video Object Detection
Feature Flow
spellingShingle Engineering::Electrical and electronic engineering
Engineering::Computer science and engineering
Video Object Detection
Feature Flow
Jin, Ruibing
Lin, Guosheng
Wen, Changyun
Wang, Jianliang
Liu, Fayao
Feature flow: in-network feature flow estimation for video object detection
description Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of the convolutional neural network, recent state-of-the-art approaches are proposed to solve problems directly on feature-level. Since the displacement of feature vector is not consistent with the pixel displacement, a common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset. With this method, they expect the fine-tuned network to produce tensors encoding feature-level motion information. In this paper, we rethink about this de facto paradigm and analyze its drawbacks in the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an In-network Feature Flow estimation module (IFF module) for video object detection. Without resorting to pre-training on any additional dataset, our IFF module is able to directly produce feature flow which indicates the feature displacement. Our IFF module consists of a shallow module, which shares the features with the detection branches. This compact design enables our IFF-Net to accurately detect objects, while maintaining a fast inference speed. Furthermore, we propose a transformation residual loss (TRL) based on self-supervision, which further improves the performance of our IFF-Net. Our IFF-Net outperforms existing methods and achieves new state-of-the-art performance on ImageNet VID.
author2 School of Electrical and Electronic Engineering
author_facet School of Electrical and Electronic Engineering
Jin, Ruibing
Lin, Guosheng
Wen, Changyun
Wang, Jianliang
Liu, Fayao
format Article
author Jin, Ruibing
Lin, Guosheng
Wen, Changyun
Wang, Jianliang
Liu, Fayao
author_sort Jin, Ruibing
title Feature flow: in-network feature flow estimation for video object detection
publishDate 2022
url https://hdl.handle.net/10356/161421
_version_ 1743119563935973376