Motion context network for weakly supervised object detection in videos

Bibliographic Details
Main Authors: Jin, Ruibing, Lin, Guosheng, Wen, Changyun, Wang, Jianliang
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2022
Online Access: https://hdl.handle.net/10356/160496
Institution: Nanyang Technological University
Description
Abstract: In weakly supervised object detection, most existing approaches are designed for images. Without box-level annotations, these methods cannot accurately locate objects. Since an object may move differently from its surrounding objects or the background, we leverage motion information to improve detection accuracy. However, the motion pattern of an object is complex: different parts of an object may move differently, which makes it challenging to exploit motion information for object localization, and using motion information directly may degrade localization performance. To overcome these issues, we propose a Motion Context Network (MC-Net) in this letter. Our method generates motion context features by exploiting neighborhood motion correlation information on moving regions. These motion context features are then incorporated with image information to improve detection accuracy. Furthermore, we propose a temporal aggregation module, which aggregates features across frames to enhance the feature representation at the current frame. Experiments are carried out on ImageNet VID, which shows that our MC-Net significantly improves the performance of the image-based baseline method (37.4% mAP vs. 29.8% mAP).
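
The abstract names two components: fusing motion-derived context features with image features on moving regions, and aggregating features across frames. The paper itself does not specify the implementation here, so the following is only a minimal sketch of how such components are commonly built, assuming PyTorch and an FGFA-style cosine-similarity weighting for the temporal step; all module names, shapes, and design choices below are hypothetical and are not the paper's actual MC-Net.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionContextFusion(nn.Module):
    """Hypothetical fusion of image features with context features
    computed on moving regions (not the paper's exact design)."""
    def __init__(self, channels: int):
        super().__init__()
        # Small conv head that turns motion features (e.g. from optical
        # flow) into context features aligned with the image feature map.
        self.context_head = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, img_feat, motion_feat, motion_mask):
        # motion_mask: (N, 1, H, W) soft mask of moving regions in [0, 1],
        # restricting the context features to moving regions.
        context = self.context_head(motion_feat) * motion_mask
        return self.fuse(torch.cat([img_feat, context], dim=1))

def temporal_aggregate(curr_feat, nbr_feats):
    """Aggregate neighbor-frame features into the current frame using
    per-location cosine-similarity weights (an assumed FGFA-style scheme)."""
    feats = torch.stack([curr_feat] + list(nbr_feats), dim=0)  # (T, N, C, H, W)
    ref = F.normalize(curr_feat, dim=1)
    # Similarity of each frame's features to the current frame, per location.
    sims = torch.stack(
        [(F.normalize(f, dim=1) * ref).sum(dim=1, keepdim=True) for f in feats]
    )                                                          # (T, N, 1, H, W)
    weights = torch.softmax(sims, dim=0)
    return (weights * feats).sum(dim=0)                        # (N, C, H, W)

# Example usage with random feature maps and two neighbor frames:
fuse = MotionContextFusion(channels=256)
img = torch.randn(1, 256, 38, 50)
flow_feat = torch.randn(1, 256, 38, 50)
mask = torch.rand(1, 1, 38, 50)
fused = fuse(img, flow_feat, mask)
out = temporal_aggregate(fused, [torch.randn_like(fused) for _ in range(2)])

The mask-then-fuse step reflects the abstract's point that motion cues should be applied selectively on moving regions rather than everywhere, since applying them directly can hurt localization; the softmax weighting lets frames that agree with the current frame contribute more to the aggregated representation.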