Model-based markerless human motion capture from multiple camera sequences

The tracking of 3D articulated body motion from video sequences, or markerless motion capture, plays an important role in a wide variety of potential applications: human-computer interaction, biomechanics, computer animation, surveillance and sports analysis. Though there have been remarkable advance...

Bibliographic Details
Main Author: Zhang, Zheng.
Other Authors: Seah Hock Soon
Format: Theses and Dissertations
Language: English
Published: 2013
Subjects:
Online Access: http://hdl.handle.net/10356/52724
Institution: Nanyang Technological University
id sg-ntu-dr.10356-52724
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
spellingShingle DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Zhang, Zheng.
Model-based markerless human motion capture from multiple camera sequences
description The tracking of 3D articulated body motion from video sequences, or markerless motion capture, plays an important role in a wide variety of potential applications: human-computer interaction, biomechanics, computer animation, surveillance and sports analysis. Though there have been remarkable advances in vision-based motion capture, pose tracking from multiple images has not been extensively studied: no existing work produces a solution comparable to existing marker-based motion capture methods, which can generally recover accurate 3D full-body motion in real time. In this thesis, we develop new methods for human body motion tracking, focusing on scenarios where multiple cameras are available. Our research follows a 3D data-based tracking framework, in which 3D data, e.g., a colored volume and scene flow (i.e., 3D optical flow), is first reconstructed and the optimal human posture is then recovered from the 3D data at every time instant. A multi-camera system with eight cameras is first assembled to capture synchronized video streams. We present and implement efficient methods for synthesizing and rendering 3D reconstructions of real-world dynamic scenes. Model-based pose estimation is the mainstream research direction because it takes the underlying structure into account and exploits shape priors, which helps resolve occlusions and ambiguities. Our methods belong to this category. For a complete model-based pose tracking approach, body model initialization is a key problem. At initialization, the body model must be adapted to fit the shape and size of the subject to be tracked, and it must be initialized with the pose at the first frame, where no temporal or strong prior information is available.
For this, we present a robust solution in which pose estimation is performed hierarchically, with space constraints enforced on each PSO (particle swarm optimization) based sub-optimization step. The combination of hierarchical estimation and stochastic particle-based search, which has strong global search ability, makes our approach capable of recovering the body pose even when the initial pose is very far from the correct solution. To improve the accuracy and robustness of estimation and tracking, we present a method to acquire a subject-specific body model that closely fits the subject's body shape, and we exploit it for pose tracking. With the voxel-based subject-specific body model, a new model-based pose search method is proposed. The tracking is performed in 3D space using 3D data, including the colored volume and 3D scene flow, reconstructed at every frame. We introduce strategies to compute view-independent scene flow in combination with volumetric reconstruction, and attain efficient scene flow computation. Our body pose estimation starts with a prediction using scene flow and is then cast as a lower-dimensional global optimization problem. Our method exploits multiple 3D cues and incorporates physical constraints into a stochastic particle-based search initialized from the deterministic prediction and stochastic sampling. Continuing with the voxel-based body model, we propose a multi-layer search method. The first layer, the niching swarm filter (NSF), is a stochastic sampling algorithm, and the second layer performs pose refinement using local optimization. To generalize well to general human motions, our approach does not use strong or specific motion models. We introduce a stochastic niching search into a particle filter to move particles to significant peaks of the likelihood.
The local optimization in the second layer not only reduces the time cost but also increases the accuracy of the sampling-based estimate, which the NSF needs to attain higher precision. The requirement of real-time processing motivates us to accelerate the tracking by implementing the time-consuming steps on the GPU using CUDA. Benefiting from the massive parallelism of the GPU, our method is capable of tracking full-body movements robustly and efficiently.
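The hierarchical, PSO-based initialization mentioned in the description can be illustrated with a small sketch. This is not the thesis's implementation: the function names `pso_minimize` and `hierarchical_fit`, the swarm parameters, the box-clamping used as a stand-in for the "space constraints", and the toy cost function are all assumptions made for illustration. The real cost would compare a body model against reconstructed 3D data, which the record does not specify.

```python
import random


def pso_minimize(cost, lower, upper, n_particles=30, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize cost(x) inside the box [lower, upper] with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(lower)
    # Random start positions inside the box, zero start velocities.
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Clamp to the box (assumed stand-in for space constraints).
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            c = cost(pos[i])
            if c < best_cost[i]:
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < g_cost:
                    g_cost, g_pos = c, pos[i][:]
    return g_pos, g_cost


def hierarchical_fit(cost, lower, upper, stages, start):
    """Optimize one group of parameters per stage, holding the others fixed."""
    pose = list(start)
    for idx in stages:  # e.g. torso parameters first, then each limb
        def stage_cost(sub, idx=idx):
            trial = pose[:]
            for k, j in enumerate(idx):
                trial[j] = sub[k]
            return cost(trial)

        sub, _ = pso_minimize(stage_cost,
                              [lower[j] for j in idx],
                              [upper[j] for j in idx])
        for k, j in enumerate(idx):
            pose[j] = sub[k]
    return pose


if __name__ == "__main__":
    # Toy 4-parameter "pose" with a known optimum (hypothetical data).
    target = [1.0, -2.0, 0.5, 3.0]
    toy_cost = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
    print(hierarchical_fit(toy_cost, [-5.0] * 4, [5.0] * 4,
                           [[0, 1], [2, 3]], [0.0] * 4))
```

Splitting the search into stages keeps each swarm in a low-dimensional space, which is one plausible reading of why a hierarchical scheme helps when the starting pose is far from the solution. The staged search only works this cleanly here because the toy cost is separable across parameter groups.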
author2 Seah Hock Soon
author_facet Seah Hock Soon
Zhang, Zheng.
format Theses and Dissertations
author Zhang, Zheng.
author_sort Zhang, Zheng.
title Model-based markerless human motion capture from multiple camera sequences
title_short Model-based markerless human motion capture from multiple camera sequences
title_full Model-based markerless human motion capture from multiple camera sequences
title_fullStr Model-based markerless human motion capture from multiple camera sequences
title_full_unstemmed Model-based markerless human motion capture from multiple camera sequences
title_sort model-based markerless human motion capture from multiple camera sequences
publishDate 2013
url http://hdl.handle.net/10356/52724
_version_ 1759854908809412608
spelling sg-ntu-dr.10356-52724 2023-03-04T00:35:34Z Model-based markerless human motion capture from multiple camera sequences Zhang, Zheng. Seah Hock Soon School of Computer Engineering Game Lab DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Doctor of Philosophy (SCE) 2013-05-23T03:41:34Z 2013-05-23T03:41:34Z 2013 2013 Thesis http://hdl.handle.net/10356/52724 en 194 p. application/pdf