Motion context network for weakly supervised object detection in videos
In weakly supervised object detection, most existing approaches are proposed for images. Without box-level annotations, these methods cannot accurately locate objects. Considering that an object may show different motion from its surrounding objects or background, we leverage motion information to improve the detection accuracy.
Saved in:
Main Authors: Jin, Ruibing; Lin, Guosheng; Wen, Changyun; Wang, Jianliang
Other Authors: School of Electrical and Electronic Engineering; School of Computer Science and Engineering
Format: Article
Language: English
Published: 2022
Subjects: Engineering::Electrical and electronic engineering; Convolutional Neural Networks; Deep Learning
Online Access: https://hdl.handle.net/10356/160496
Institution: Nanyang Technological University
id |
sg-ntu-dr.10356-160496 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-160496 2022-07-25T07:34:47Z Motion context network for weakly supervised object detection in videos Jin, Ruibing Lin, Guosheng Wen, Changyun Wang, Jianliang School of Electrical and Electronic Engineering School of Computer Science and Engineering Engineering::Electrical and electronic engineering Convolutional Neural Networks Deep Learning In weakly supervised object detection, most existing approaches are proposed for images. Without box-level annotations, these methods cannot accurately locate objects. Considering that an object may show different motion from its surrounding objects or background, we leverage motion information to improve the detection accuracy. However, the motion pattern of an object is complex. Different parts of an object may have different motion patterns, which poses challenges in exploring motion information for object localization. Directly using motion information may degrade the localization performance. To overcome these issues, we propose a Motion Context Network (MC-Net) in this letter. Our method generates motion context features by exploiting neighborhood motion correlation information on moving regions. These motion context features are then incorporated with image information to improve the detection accuracy. Furthermore, we propose a temporal aggregation module, which aggregates features across frames to enhance the feature representation at the current frame. Experiments are carried out on ImageNet VID, which show that our MC-Net significantly improves the performance of the image-based baseline method (37.4% mAP vs. 29.8% mAP). Ministry of Education (MOE) Nanyang Technological University This work was supported in part by NTU Start-up Grant 04INS000338C130 and in part by the MOE Tier-1 Research Grants RG28/18 (S) and RG22/19 (S). 2022-07-25T07:34:47Z 2022-07-25T07:34:47Z 2020 Journal Article Jin, R., Lin, G., Wen, C. & Wang, J. (2020). Motion context network for weakly supervised object detection in videos. 
IEEE Signal Processing Letters, 27, 1864-1868. https://dx.doi.org/10.1109/LSP.2020.3029958 1070-9908 https://hdl.handle.net/10356/160496 10.1109/LSP.2020.3029958 2-s2.0-85105579050 27 1864 1868 en RG28/18 (S) RG22/19 (S) 04INS000338C130 IEEE Signal Processing Letters © 2020 IEEE. All rights reserved. |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Electrical and electronic engineering Convolutional Neural Networks Deep Learning |
spellingShingle |
Engineering::Electrical and electronic engineering Convolutional Neural Networks Deep Learning Jin, Ruibing Lin, Guosheng Wen, Changyun Wang, Jianliang Motion context network for weakly supervised object detection in videos |
description |
In weakly supervised object detection, most existing approaches are proposed for images. Without box-level annotations, these methods cannot accurately locate objects. Considering that an object may show different motion from its surrounding objects or background, we leverage motion information to improve the detection accuracy. However, the motion pattern of an object is complex. Different parts of an object may have different motion patterns, which poses challenges in exploring motion information for object localization. Directly using motion information may degrade the localization performance. To overcome these issues, we propose a Motion Context Network (MC-Net) in this letter. Our method generates motion context features by exploiting neighborhood motion correlation information on moving regions. These motion context features are then incorporated with image information to improve the detection accuracy. Furthermore, we propose a temporal aggregation module, which aggregates features across frames to enhance the feature representation at the current frame. Experiments are carried out on ImageNet VID, which show that our MC-Net significantly improves the performance of the image-based baseline method (37.4% mAP vs. 29.8% mAP). |
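The abstract mentions a temporal aggregation module that combines features across frames to strengthen the current frame's representation. The letter itself does not include code, so the following is only a minimal NumPy sketch of what such a step might look like; the cosine-similarity softmax weighting, the function name `temporal_aggregate`, and the assumption that neighboring-frame features are already aligned to the current frame are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def temporal_aggregate(current, neighbors):
    """Hypothetical per-position temporal aggregation (not the paper's exact method).

    current:   (C, H, W) feature map of the current frame.
    neighbors: list of (C, H, W) feature maps from nearby frames,
               assumed already aligned to the current frame.
    Returns a (C, H, W) map: a softmax-weighted sum over frames, where each
    frame's weight at a position is its cosine similarity to the current frame.
    """
    feats = [current] + list(neighbors)
    weights = []
    for f in feats:
        # Cosine similarity between f and current at every spatial position.
        num = (f * current).sum(axis=0)                              # (H, W)
        den = np.linalg.norm(f, axis=0) * np.linalg.norm(current, axis=0) + 1e-8
        weights.append(num / den)
    w = np.stack(weights)                                            # (T, H, W)
    w = np.exp(w) / np.exp(w).sum(axis=0, keepdims=True)             # softmax over T
    return (np.stack(feats) * w[:, None]).sum(axis=0)                # (C, H, W)
```

Because the weights are normalized across frames, a neighbor whose features disagree with the current frame at a given position contributes less there, which is one common way such aggregation schemes suppress unreliable frames.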
author2 |
School of Electrical and Electronic Engineering |
author_facet |
School of Electrical and Electronic Engineering Jin, Ruibing Lin, Guosheng Wen, Changyun Wang, Jianliang |
format |
Article |
author |
Jin, Ruibing Lin, Guosheng Wen, Changyun Wang, Jianliang |
author_sort |
Jin, Ruibing |
title |
Motion context network for weakly supervised object detection in videos |
title_short |
Motion context network for weakly supervised object detection in videos |
title_full |
Motion context network for weakly supervised object detection in videos |
title_fullStr |
Motion context network for weakly supervised object detection in videos |
title_full_unstemmed |
Motion context network for weakly supervised object detection in videos |
title_sort |
motion context network for weakly supervised object detection in videos |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/160496 |
_version_ |
1739837469757538304 |