Multi-view fusion and machine learning in hand pose estimation from depth images

Bibliographic Details
Main Author: Ong, Bee Lee
Other Authors: Lin Feng
Format: Final Year Project
Language: English
Published: 2018
Online Access:http://hdl.handle.net/10356/74239
Institution: Nanyang Technological University
Description
Summary: This project studies an approach to hand pose estimation that relies on a convolutional neural network. In recent years, hand pose estimation has been the subject of extensive research, with data-driven approaches emerging as the preferred method for the complex task of regressing a hand pose from a depth image. The project develops an application consisting of two major parts: a frontend windowed application that renders a simple hand skeleton model given the locations of 21 landmark hand joints, and a backend that contains the required processing logic and, most importantly, the convolutional neural network that drives the application's intelligence. The network's input is a planar projection of a point cloud derived from depth images, obtained either from a Kinect device or from a hand gesture dataset. After forward propagation, the network outputs a series of heatmaps, each encoding the likelihood of a particular hand joint being at a given location. Although much of the application has been optimised, the neural network still requires further fine-tuning of its hyperparameters because of the exploding gradient and dying ReLU problems. Future work could increase the heatmap resolution for finer estimates, and gradually reduce the number of convolutional filters while preserving network accuracy, in order to remove redundant neurons and increase throughput.
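The summary describes the data flow only at a high level. The pure-Python sketch below illustrates the input and output stages under stated assumptions; it is not the project's implementation. It assumes a pinhole camera model with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and an orthographic projection onto an axis-aligned plane inside a cube enclosing the point cloud (a common choice in multi-view methods, not confirmed by the source). Heatmaps are decoded by taking each map's maximum cell. All function names are hypothetical.

```python
# Hypothetical sketch: depth image -> point cloud -> planar projection, and
# per-joint heatmaps -> joint grid locations. Not the project's actual code.

NUM_JOINTS = 21  # the 21 landmark hand joints mentioned in the summary


def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (list of rows) to 3D points (pinhole model).

    fx, fy, cx, cy are assumed camera intrinsics (focal lengths, principal point).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # zero depth means no measurement at this pixel
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def project_to_plane(points, axes, size):
    """Orthographically project points onto the plane spanned by `axes`.

    Coordinates are normalised into one axis-aligned cube enclosing the cloud,
    so every view shares the same scale. Each cell keeps the smallest
    normalised coordinate along the remaining axis; empty cells stay at 1.0.
    """
    a, b = axes
    c = 3 - a - b  # index of the axis orthogonal to the projection plane
    mins = [min(p[k] for p in points) for k in range(3)]
    side = max(max(p[k] for p in points) - mins[k] for k in range(3)) or 1.0
    image = [[1.0] * size for _ in range(size)]
    for p in points:
        col = int((p[a] - mins[a]) / side * (size - 1))
        row = int((p[b] - mins[b]) / side * (size - 1))
        image[row][col] = min(image[row][col], (p[c] - mins[c]) / side)
    return image


def heatmap_to_joint(heatmap):
    """Return (row, col) of the highest-likelihood cell; the first wins on ties."""
    cells = ((i, j) for i in range(len(heatmap)) for j in range(len(heatmap[0])))
    return max(cells, key=lambda ij: heatmap[ij[0]][ij[1]])


def decode_joints(heatmaps):
    """Decode one heatmap per joint into grid coordinates."""
    if len(heatmaps) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} heatmaps, got {len(heatmaps)}")
    return [heatmap_to_joint(h) for h in heatmaps]
```

For a multi-view input, the same cloud could be rendered onto the x-y, y-z and x-z planes, for example `[project_to_plane(cloud, ax, 96) for ax in ((0, 1), (1, 2), (0, 2))]`, where the 96-pixel resolution is an arbitrary placeholder. Taking the maximum cell limits precision to the heatmap grid, which is why a higher heatmap resolution would yield finer estimates.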