Sampling-based image and video matting without compositing equation
Main Author: Johnson, Jubin
Other Authors: Deepu Rajan
Format: Theses and Dissertations (Doctoral thesis)
Degree: Doctor of Philosophy (SCE)
Language: English
Published: 2017
Institution: Nanyang Technological University, School of Computer Science and Engineering, Centre for Multimedia and Network Technology
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: http://hdl.handle.net/10356/70341
DOI: 10.32657/10356/70341
Physical Description: 143 p., application/pdf
Citation: Johnson, J. (2017). Sampling-based image and video matting without compositing equation. Doctoral thesis, Nanyang Technological University, Singapore.
Description:
Image and video matting play a fundamental role in image and video editing applications. Matting methods are generally classified into α-propagation based approaches and color sampling based approaches. α-propagation methods leverage the correlation between neighboring pixels with respect to local image statistics to interpolate the known alpha values into the unknown regions. Color sampling methods estimate alpha from foreground (F) and background (B) samples taken from known regions that represent the true colors of the unknown pixels. Complex color distributions in the foreground and background regions, highly textured edges, and the unavailability of true F and B samples are some of the main challenges faced by current works. In addition, sampling methods have traditionally followed the compositing equation, using (F, B) pairs for alpha estimation. When matting is extended to videos, the unavailability of user-defined trimaps in each frame and the additional requirement of temporal coherency across the sequence make matte extraction highly challenging. We aim to develop novel natural matting algorithms for both images and videos that alleviate the drawbacks current methods face in generating a good-quality matte. We achieve these objectives through the following contributions.
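For context, the compositing equation that traditional sampling methods rely on models each observed pixel color I as a convex combination of a foreground color F and a background color B; given a candidate (F, B) pair, such methods commonly recover alpha by projecting I onto the line joining the two colors:

$$
I = \alpha F + (1 - \alpha)\,B, \qquad
\hat{\alpha} = \frac{(I - B)\cdot(F - B)}{\lVert F - B\rVert^{2}}, \qquad
\hat{\alpha} \in [0, 1].
$$

It is this per-pair estimation step that the title refers to dispensing with.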
First, a sampling-based image matting algorithm is proposed that utilizes sparse coding in the image domain to extract the alpha matte. Multiple F and B samples, as opposed to a single (F, B) pair, are used to describe the color at a blended pixel. A carefully chosen dictionary, made up of feature vectors from the F and B regions and refined through a foreground probability map, ensures that the constrained sparse-code coefficients approximate the alpha value. Experimental evaluations on a public benchmark database show that our method achieves state-of-the-art results.
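A minimal sketch of this idea, under simplified assumptions (raw RGB colors stand in for the thesis's feature vectors, and non-negative least squares stands in for its constrained sparse solver and probability-map refinement): stack F and B samples as dictionary atoms, solve for non-negative codes that sum to one, and read alpha off as the total weight on the foreground atoms.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_alpha_sparse(pixel, fg_samples, bg_samples):
    """Estimate alpha at a blended pixel from multiple F and B samples.

    pixel      : (3,) RGB color of the unknown pixel
    fg_samples : (nF, 3) foreground colors sampled from the known F region
    bg_samples : (nB, 3) background colors sampled from the known B region

    Simplified assumption: NNLS replaces the constrained sparse coder;
    codes are renormalized to sum to one so the foreground mass reads
    directly as alpha.
    """
    D = np.vstack([fg_samples, bg_samples]).T   # 3 x (nF + nB) dictionary
    codes, _ = nnls(D, pixel)                   # non-negative codes
    total = codes.sum()
    if total < 1e-8:                            # degenerate: no support
        return 0.0
    codes /= total                              # enforce sum-to-one
    alpha = codes[: len(fg_samples)].sum()      # weight on F atoms
    return float(np.clip(alpha, 0.0, 1.0))

# Example: a 50/50 blend of a red foreground and a blue background
fg = np.array([[1.0, 0.0, 0.0]])
bg = np.array([[0.0, 0.0, 1.0]])
blended = 0.5 * fg[0] + 0.5 * bg[0]
print(estimate_alpha_sparse(blended, fg, bg))   # ~0.5
```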
Second, a new video matting algorithm is proposed that uses a multi-frame graphical model to ensure temporal coherency in the extracted matte. For good temporal coherence, the alpha value at a pixel needs to be consistent and smooth across the video sequence. This is accomplished by simultaneously solving for the alpha mattes of multiple consecutive frames. An objective function is proposed that can be solved in closed form as a sparse linear system. An adaptive temporal trimap propagation based on motion-assisted shape blending propagates the trimaps automatically between the key-frames. Experimental evaluations on an exclusive video matting dataset validate the effectiveness of the method.
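A minimal sketch of such a closed-form solve, under assumed notation (L is a hypothetical sparse smoothness matrix coupling pixels within and across the stacked consecutive frames, standing in for the thesis's multi-frame graphical model; lam weights the trimap constraints):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def solve_mattes(L, constrained, targets, lam=100.0):
    """Solve (L + lam*C) alpha = lam * C * targets in closed form.

    L           : (N, N) sparse smoothness matrix over all pixels of the
                  stacked frames (stand-in for the multi-frame model)
    constrained : (N,) bool mask, True where the trimap fixes alpha
    targets     : (N,) desired alpha at constrained pixels (1 in F, 0 in B)
    """
    C = sp.diags(constrained.astype(float))   # trimap data-term selector
    A = (L + lam * C).tocsr()                 # one sparse linear system
    b = lam * (C @ targets)
    return np.clip(spsolve(A, b), 0.0, 1.0)

# Toy example: 5 pixels on a chain, ends constrained to alpha = 1 and 0;
# the solve yields a smooth ramp between the two constraints.
adj = sp.diags([np.ones(4), np.ones(4)], offsets=[1, -1])
deg = sp.diags(np.asarray(adj.sum(axis=1)).ravel())
L = (deg - adj).tocsr()                       # chain graph Laplacian
mask = np.array([True, False, False, False, True])
targets = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
print(solve_mattes(L, mask, targets))         # ~[1.0, 0.75, 0.5, 0.25, 0.0]
```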
Third, a new sampling-based video matting algorithm is proposed that reinterprets the matting problem from the perspective of the sparse reconstruction error of F and B samples. Sampling methods generally select the (F, B) pair that produces the least reconstruction error, but the significance of this error has been left unexamined. Two patch-based frameworks are used to ensure temporal coherency in the video mattes: a multi-frame non-local means framework using coherency-sensitive hashing, and a patch-based multi-frame graph model using motion. Qualitative and quantitative evaluations indicate the performance of the method in reducing temporal jitter and maintaining spatial accuracy in the video mattes.
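The reconstruction error referred to here is, in the standard sampling formulation, the residual of the compositing fit for a candidate pair,

$$
e(F, B) = \big\lVert I - \hat{\alpha} F - (1 - \hat{\alpha})\,B \big\rVert,
$$

with the projection estimate of alpha given earlier; the third contribution examines what the magnitude of this error itself says about sample reliability, rather than using it only to rank pairs.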