MoNet : deep motion exploitation for video object segmentation
In this paper, we propose a novel MoNet model to deeply exploit motion cues for boosting video object segmentation performance from two aspects, i.e., frame representation learning and segmentation refinement. Concretely, MoNet exploits computed motion cue (i.e., optical flow) to reinforce the repre...
Main Authors: | Xiao, Huaxin; Feng, Jiashi; Lin, Guosheng; Liu, Yu; Zhang, Maojun |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2020 |
Subjects: | Engineering::Computer science and engineering Motion Segmentation Feature Extraction |
Online Access: | https://hdl.handle.net/10356/143257 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-143257 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1432572020-08-17T05:05:17Z MoNet : deep motion exploitation for video object segmentation Xiao, Huaxin Feng, Jiashi Lin, Guosheng Liu, Yu Zhang, Maojun School of Computer Science and Engineering 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018 CVPR) Engineering::Computer science and engineering Motion Segmentation Feature Extraction In this paper, we propose a novel MoNet model to deeply exploit motion cues for boosting video object segmentation performance from two aspects, i.e., frame representation learning and segmentation refinement. Concretely, MoNet exploits computed motion cue (i.e., optical flow) to reinforce the representation of the target frame by aligning and integrating representations from its neighbors. The new representation provides valuable temporal contexts for segmentation and improves robustness to various common contaminating factors, e.g., motion blur, appearance variation and deformation of video objects. Moreover, MoNet exploits motion inconsistency and transforms such motion cue into foreground/background prior to eliminate distraction from confusing instances and noisy regions. By introducing a distance transform layer, MoNet can effectively separate motion-inconstant instances/regions and thoroughly refine segmentation results. Integrating the proposed two motion exploitation components with a standard segmentation network, MoNet provides new state-of-the-art performance on three competitive benchmark datasets. Ministry of Education (MOE) Accepted version Huaxin Xiao was supported by the China Scholarship Council under Grant 201603170287. Jiashi Feng was partially supported by NUS startup R-263-000-C08-133, MOE Tier-I R-263-000-C21-112, NUS IDS R-263-000-C67-646 and ECRA R-263-000-C87-133. 2020-08-17T05:05:16Z 2020-08-17T05:05:16Z 2018 Conference Paper Xiao, H., Feng, J., Lin, G., Liu, Y. & Zhang, M. (2018). MoNet : deep motion exploitation for video object segmentation. 
Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018 CVPR). doi:10.1109/CVPR.2018.00125 978-1-5386-6421-6 https://hdl.handle.net/10356/143257 10.1109/CVPR.2018.00125 2-s2.0-85062869824 1140 1148 en © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/CVPR.2018.00125. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering Motion Segmentation Feature Extraction |
spellingShingle |
Engineering::Computer science and engineering Motion Segmentation Feature Extraction Xiao, Huaxin Feng, Jiashi Lin, Guosheng Liu, Yu Zhang, Maojun MoNet : deep motion exploitation for video object segmentation |
description |
In this paper, we propose a novel MoNet model to deeply exploit motion cues for boosting video object segmentation performance from two aspects, i.e., frame representation learning and segmentation refinement. Concretely, MoNet exploits computed motion cue (i.e., optical flow) to reinforce the representation of the target frame by aligning and integrating representations from its neighbors. The new representation provides valuable temporal contexts for segmentation and improves robustness to various common contaminating factors, e.g., motion blur, appearance variation and deformation of video objects. Moreover, MoNet exploits motion inconsistency and transforms such motion cue into foreground/background prior to eliminate distraction from confusing instances and noisy regions. By introducing a distance transform layer, MoNet can effectively separate motion-inconstant instances/regions and thoroughly refine segmentation results. Integrating the proposed two motion exploitation components with a standard segmentation network, MoNet provides new state-of-the-art performance on three competitive benchmark datasets. |
author2 |
School of Computer Science and Engineering |
author_facet |
School of Computer Science and Engineering Xiao, Huaxin Feng, Jiashi Lin, Guosheng Liu, Yu Zhang, Maojun |
format |
Conference or Workshop Item |
author |
Xiao, Huaxin Feng, Jiashi Lin, Guosheng Liu, Yu Zhang, Maojun |
author_sort |
Xiao, Huaxin |
title |
MoNet : deep motion exploitation for video object segmentation |
title_short |
MoNet : deep motion exploitation for video object segmentation |
title_full |
MoNet : deep motion exploitation for video object segmentation |
title_fullStr |
MoNet : deep motion exploitation for video object segmentation |
title_full_unstemmed |
MoNet : deep motion exploitation for video object segmentation |
title_sort |
monet : deep motion exploitation for video object segmentation |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/143257 |
_version_ |
1681058246678282240 |