MoNet : deep motion exploitation for video object segmentation

In this paper, we propose a novel MoNet model to deeply exploit motion cues for boosting video object segmentation performance from two aspects, i.e., frame representation learning and segmentation refinement. Concretely, MoNet exploits computed motion cue (i.e., optical flow) to reinforce the repre...

Bibliographic Details
Main Authors:	Xiao, Huaxin, Feng, Jiashi, Lin, Guosheng, Liu, Yu, Zhang, Maojun
Other Authors:	School of Computer Science and Engineering
Format:	Conference or Workshop Item
Language:	English
Published:	2020
Subjects:	Engineering::Computer science and engineering Motion Segmentation Feature Extraction
Online Access:	https://hdl.handle.net/10356/143257

id	sg-ntu-dr.10356-143257
record_format	dspace
spelling	sg-ntu-dr.10356-1432572020-08-17T05:05:17Z MoNet : deep motion exploitation for video object segmentation Xiao, Huaxin Feng, Jiashi Lin, Guosheng Liu, Yu Zhang, Maojun School of Computer Science and Engineering 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018 CVPR) Engineering::Computer science and engineering Motion Segmentation Feature Extraction In this paper, we propose a novel MoNet model to deeply exploit motion cues for boosting video object segmentation performance from two aspects, i.e., frame representation learning and segmentation refinement. Concretely, MoNet exploits computed motion cue (i.e., optical flow) to reinforce the representation of the target frame by aligning and integrating representations from its neighbors. The new representation provides valuable temporal contexts for segmentation and improves robustness to various common contaminating factors, e.g., motion blur, appearance variation and deformation of video objects. Moreover, MoNet exploits motion inconsistency and transforms such motion cue into foreground/background prior to eliminate distraction from confusing instances and noisy regions. By introducing a distance transform layer, MoNet can effectively separate motion-inconstant instances/regions and thoroughly refine segmentation results. Integrating the proposed two motion exploitation components with a standard segmentation network, MoNet provides new state-of-the-art performance on three competitive benchmark datasets. Ministry of Education (MOE) Accepted version Huaxin Xiao was supported by the China Scholarship Council under Grant 201603170287. Jiashi Feng was partially supported by NUS startup R-263-000-C08-133, MOE Tier-I R-263-000-C21-112, NUS IDS R-263-000-C67-646 and ECRA R-263-000-C87-133. 2020-08-17T05:05:16Z 2020-08-17T05:05:16Z 2018 Conference Paper Xiao, H., Feng, J., Lin, G., Liu, Y. & Zhang, M. (2018). MoNet : deep motion exploitation for video object segmentation. 
Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018 CVPR). doi:10.1109/CVPR.2018.00125 978-1-5386-6421-6 https://hdl.handle.net/10356/143257 10.1109/CVPR.2018.00125 2-s2.0-85062869824 1140 1148 en © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/CVPR.2018.00125. application/pdf
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering Motion Segmentation Feature Extraction
description	In this paper, we propose a novel MoNet model to deeply exploit motion cues for boosting video object segmentation performance from two aspects, i.e., frame representation learning and segmentation refinement. Concretely, MoNet exploits computed motion cue (i.e., optical flow) to reinforce the representation of the target frame by aligning and integrating representations from its neighbors. The new representation provides valuable temporal contexts for segmentation and improves robustness to various common contaminating factors, e.g., motion blur, appearance variation and deformation of video objects. Moreover, MoNet exploits motion inconsistency and transforms such motion cue into foreground/background prior to eliminate distraction from confusing instances and noisy regions. By introducing a distance transform layer, MoNet can effectively separate motion-inconstant instances/regions and thoroughly refine segmentation results. Integrating the proposed two motion exploitation components with a standard segmentation network, MoNet provides new state-of-the-art performance on three competitive benchmark datasets.
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering Xiao, Huaxin Feng, Jiashi Lin, Guosheng Liu, Yu Zhang, Maojun
format	Conference or Workshop Item
author	Xiao, Huaxin Feng, Jiashi Lin, Guosheng Liu, Yu Zhang, Maojun
author_sort	Xiao, Huaxin
title	MoNet : deep motion exploitation for video object segmentation
publishDate	2020
url	https://hdl.handle.net/10356/143257
_version_	1681058246678282240
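
The two motion components named in the abstract (aligning neighboring frame representations with optical flow, and turning motion cues into a distance-based foreground/background prior) can be illustrated with a small sketch. This is not the authors' implementation: the record gives no architectural details, so the function names (`warp_to_target`, `aggregate`, `distance_to_seeds`), the nearest-neighbor sampling, the uniform averaging, and the city-block distance are all assumptions chosen to keep the example short and dependency-free.

```python
from typing import List, Tuple

Grid = List[List[float]]
# Hypothetical flow layout: per-pixel (dx, dy) pointing from target to neighbor.
Flow = List[List[Tuple[float, float]]]


def warp_to_target(neighbor: Grid, flow: Flow) -> Grid:
    """Backward-warp a neighbor feature map onto the target grid (nearest sample)."""
    h, w = len(neighbor), len(neighbor[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Clamp the sampling position so it stays inside the map.
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            warped[y][x] = neighbor[sy][sx]
    return warped


def aggregate(target: Grid, warped_neighbors: List[Grid]) -> Grid:
    """Average the target map with its aligned neighbors (uniform weights assumed)."""
    maps = [target] + warped_neighbors
    h, w = len(target), len(target[0])
    return [[sum(m[y][x] for m in maps) / len(maps) for x in range(w)]
            for y in range(h)]


def distance_to_seeds(seeds: List[List[bool]]) -> List[List[int]]:
    """Two-pass city-block distance from every pixel to the nearest seed pixel."""
    h, w = len(seeds), len(seeds[0])
    inf = h + w  # larger than any city-block distance inside the grid
    dist = [[0 if seeds[y][x] else inf for x in range(w)] for y in range(h)]
    for y in range(h):  # forward pass: top-left to bottom-right
        for x in range(w):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    for y in range(h - 1, -1, -1):  # backward pass: bottom-right to top-left
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < w - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist
```

In this sketch, pixels far from the seed set (for example, pixels whose motion disagrees with the background) receive large distances, which could then be thresholded or weighted into a prior; how MoNet actually uses its distance transform layer is described in the full paper, not in this record.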

Similar Items