MoNet : deep motion exploitation for video object segmentation

In this paper, we propose a novel MoNet model to deeply exploit motion cues for boosting video object segmentation performance from two aspects, i.e., frame representation learning and segmentation refinement. Concretely, MoNet exploits computed motion cue (i.e., optical flow) to reinforce the repre...

Bibliographic Details
Main Authors: Xiao, Huaxin; Feng, Jiashi; Lin, Guosheng; Liu, Yu; Zhang, Maojun
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2020
Subjects: Engineering::Computer science and engineering; Motion Segmentation; Feature Extraction
Online Access: https://hdl.handle.net/10356/143257
Institution: Nanyang Technological University
id sg-ntu-dr.10356-143257
record_format dspace
spelling sg-ntu-dr.10356-143257
2020-08-17T05:05:17Z
MoNet : deep motion exploitation for video object segmentation
Xiao, Huaxin
Feng, Jiashi
Lin, Guosheng
Liu, Yu
Zhang, Maojun
School of Computer Science and Engineering
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018 CVPR)
Engineering::Computer science and engineering
Motion Segmentation
Feature Extraction
In this paper, we propose a novel MoNet model to deeply exploit motion cues for boosting video object segmentation performance from two aspects, i.e., frame representation learning and segmentation refinement. Concretely, MoNet exploits computed motion cue (i.e., optical flow) to reinforce the representation of the target frame by aligning and integrating representations from its neighbors. The new representation provides valuable temporal contexts for segmentation and improves robustness to various common contaminating factors, e.g., motion blur, appearance variation and deformation of video objects. Moreover, MoNet exploits motion inconsistency and transforms such motion cue into foreground/background prior to eliminate distraction from confusing instances and noisy regions. By introducing a distance transform layer, MoNet can effectively separate motion-inconstant instances/regions and thoroughly refine segmentation results. Integrating the proposed two motion exploitation components with a standard segmentation network, MoNet provides new state-of-the-art performance on three competitive benchmark datasets.
Ministry of Education (MOE)
Accepted version
Huaxin Xiao was supported by the China Scholarship Council under Grant 201603170287. Jiashi Feng was partially supported by NUS startup R-263-000-C08-133, MOE Tier-I R-263-000-C21-112, NUS IDS R-263-000-C67-646 and ECRA R-263-000-C87-133.
2020-08-17T05:05:16Z
2020-08-17T05:05:16Z
2018
Conference Paper
Xiao, H., Feng, J., Lin, G., Liu, Y. & Zhang, M. (2018). MoNet : deep motion exploitation for video object segmentation. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018 CVPR). doi:10.1109/CVPR.2018.00125
978-1-5386-6421-6
https://hdl.handle.net/10356/143257
10.1109/CVPR.2018.00125
2-s2.0-85062869824
1140
1148
en
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/CVPR.2018.00125.
application/pdf
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Motion Segmentation
Feature Extraction
description In this paper, we propose a novel MoNet model to deeply exploit motion cues for boosting video object segmentation performance from two aspects, i.e., frame representation learning and segmentation refinement. Concretely, MoNet exploits computed motion cue (i.e., optical flow) to reinforce the representation of the target frame by aligning and integrating representations from its neighbors. The new representation provides valuable temporal contexts for segmentation and improves robustness to various common contaminating factors, e.g., motion blur, appearance variation and deformation of video objects. Moreover, MoNet exploits motion inconsistency and transforms such motion cue into foreground/background prior to eliminate distraction from confusing instances and noisy regions. By introducing a distance transform layer, MoNet can effectively separate motion-inconstant instances/regions and thoroughly refine segmentation results. Integrating the proposed two motion exploitation components with a standard segmentation network, MoNet provides new state-of-the-art performance on three competitive benchmark datasets.
author2 School of Computer Science and Engineering
format Conference or Workshop Item
author Xiao, Huaxin
Feng, Jiashi
Lin, Guosheng
Liu, Yu
Zhang, Maojun
author_sort Xiao, Huaxin
title MoNet : deep motion exploitation for video object segmentation
publishDate 2020
url https://hdl.handle.net/10356/143257
_version_ 1681058246678282240