Feature flow: in-network feature flow estimation for video object detection
Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of convolutional neural networks, recent state-of-the-art approaches solve problems directly on feature-l...
Main Authors: | Jin, Ruibing; Lin, Guosheng; Wen, Changyun; Wang, Jianliang; Liu, Fayao |
---|---|
Other Authors: | School of Electrical and Electronic Engineering |
Format: | Article |
Language: | English |
Published: | 2022 |
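Subjects: | Engineering::Electrical and electronic engineering; Engineering::Computer science and engineering; Video Object Detection; Feature Flow |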
Online Access: | https://hdl.handle.net/10356/161421 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-161421 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-161421; 2022-08-31T06:30:08Z. Title: Feature flow: in-network feature flow estimation for video object detection. Authors: Jin, Ruibing; Lin, Guosheng; Wen, Changyun; Wang, Jianliang; Liu, Fayao. Schools: School of Electrical and Electronic Engineering; School of Computer Science and Engineering. Subjects: Engineering::Electrical and electronic engineering; Engineering::Computer science and engineering; Video Object Detection; Feature Flow. Dates: 2022-08-31T06:30:08Z; 2022-08-31T06:30:08Z; 2022. Type: Journal Article. Citation: Jin, R., Lin, G., Wen, C., Wang, J. & Liu, F. (2022). Feature flow: in-network feature flow estimation for video object detection. Pattern Recognition, 122, 108323-. https://dx.doi.org/10.1016/j.patcog.2021.108323. ISSN: 0031-3203. Handle: https://hdl.handle.net/10356/161421. DOI: 10.1016/j.patcog.2021.108323. Scopus ID: 2-s2.0-85115428517. Volume: 122. Article number: 108323. Language: en. Journal: Pattern Recognition. Rights: © 2021 Elsevier Ltd. All rights reserved. |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Electrical and electronic engineering; Engineering::Computer science and engineering; Video Object Detection; Feature Flow |
description |
Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of convolutional neural networks, recent state-of-the-art approaches solve problems directly at the feature level. Since the displacement of feature vectors is not consistent with pixel displacement, a common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset, in the expectation that the fine-tuned network will produce tensors encoding feature-level motion information. In this paper, we rethink this de facto paradigm and analyze its drawbacks in the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an In-network Feature Flow estimation module (IFF module) for video object detection. Without resorting to pre-training on any additional dataset, our IFF module directly produces feature flow, which indicates the feature displacement. The IFF module is shallow and shares features with the detection branches. This compact design enables IFF-Net to detect objects accurately while maintaining a fast inference speed. Furthermore, we propose a self-supervised transformation residual loss (TRL), which further improves the performance of IFF-Net. IFF-Net outperforms existing methods and achieves new state-of-the-art performance on ImageNet VID. |
author2 |
School of Electrical and Electronic Engineering |
author_facet |
School of Electrical and Electronic Engineering; Jin, Ruibing; Lin, Guosheng; Wen, Changyun; Wang, Jianliang; Liu, Fayao |
format |
Article |
author |
Jin, Ruibing; Lin, Guosheng; Wen, Changyun; Wang, Jianliang; Liu, Fayao |
author_sort |
Jin, Ruibing |
title |
Feature flow: in-network feature flow estimation for video object detection |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/161421 |
_version_ |
1743119563935973376 |