Deep pixel-level matching via attention for video co-segmentation

In video object co-segmentation, methods based on patch-level matching are widely used to extract the similarity between video frames. However, such methods easily lead to pixel misclassification because they reduce the precision of pixel localization, which degrades the accuracy of the segmentation results. To address this problem, we propose a deep-neural-network framework equipped with a new attention module designed for pixel-level matching, which segments the common object across video frames. In this attention module, a pixel-level matching step compares the feature value of each pixel in one input frame with that of each pixel in another input frame to compute the similarity between the two frames. A feature fusion step then fuses the feature maps of each frame with this similarity information to generate dense attention features. Finally, an up-sampling step uses these dense attention features to refine the feature maps and obtain high-quality segmentation results. We trained and tested our framework on the ObMiC and DAVIS 2016 datasets. Experimental results show that our framework achieves higher accuracy than other video segmentation methods that perform well in common-information extraction.
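To make the matching and fusion steps concrete, below is a minimal sketch (not the authors' released code) of pixel-level matching attention between two frames. It assumes feature maps feat_a and feat_b of shape (batch, channels, height, width) from a shared backbone; the function name and the use of concatenation for fusion are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def pixel_matching_attention(feat_a, feat_b):
        # feat_a, feat_b: (B, C, H, W) feature maps of two input frames.
        b, c, h, w = feat_a.shape
        fa = feat_a.flatten(2)                   # (B, C, H*W)
        fb = feat_b.flatten(2)                   # (B, C, H*W)
        # Dense pixel-to-pixel similarity: entry (i, j) compares
        # pixel i of frame A with pixel j of frame B.
        sim = torch.bmm(fa.transpose(1, 2), fb)  # (B, H*W, H*W)
        attn_ab = F.softmax(sim, dim=-1)                   # A attends to B's pixels
        attn_ba = F.softmax(sim.transpose(1, 2), dim=-1)   # B attends to A's pixels
        # Similarity-weighted ("dense attention") features for each frame.
        dense_a = torch.bmm(fb, attn_ab.transpose(1, 2)).view(b, c, h, w)
        dense_b = torch.bmm(fa, attn_ba.transpose(1, 2)).view(b, c, h, w)
        # Fuse each frame's own features with its attention features;
        # concatenation is one plausible fusion choice.
        return (torch.cat([feat_a, dense_a], dim=1),
                torch.cat([feat_b, dense_b], dim=1))

A decoder would then up-sample the fused maps back to the input resolution (e.g., with F.interpolate plus convolutions) to produce the refined, full-resolution segmentation masks described in the abstract.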


Bibliographic Details
Main Authors: LI, Junliang, WONG, Hon-Cheng, HE, Shengfeng, LO, Sio-Long, ZHANG, Guifang, WANG, Wenxiao
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2020
Subjects: video co-segmentation; pixel-level matching; attention; Information Security
Online Access: https://ink.library.smu.edu.sg/sis_research/7852
DOI: 10.3390/app10061948
Institution: Singapore Management University
Collection: Research Collection School Of Computing and Information Systems