Multi-view CNNs for hand gesture recognition

Bibliographic Details
Main Author: Dou, Yudi
Other Authors: Yuan Junsong
Format: Theses and Dissertations
Language: English
Published: 2017
Online Access: http://hdl.handle.net/10356/72613
Institution: Nanyang Technological University
Description
Summary: Hand gesture recognition plays a significant role in human-computer interaction and has broad applications in augmented and virtual reality. It is also of practical value, since it can assist people with disabilities who have special needs. Despite recent progress on recognizing hand poses for the Arabic numerals 0 to 9, the accuracy of sign-language hand gesture recognition is still far from satisfactory because the finger articulations are more complex. Unlike traditional model-driven methods, our approach needs neither colored markers to extract features nor skin color models to represent the hand region. We focus on data-driven approaches, which do not require complex calibration. In this dissertation, we create our own dataset of 30 Chinese sign language gestures and propose multi-view CNN-based methods to recognize them. We project the hand depth image onto three orthogonal planes and feed each plane's projected image into a convolutional neural network, producing one probability output per view. Treating the problem as a classification task, we then fuse the outputs of the three views to recognize the final hand gesture. To bring the project closer to application, we also build a real-time demo that outputs a string resembling Chinese spelling. Experiments show that the proposed approach can recognize the 30 Chinese hand gestures accurately and that the demo runs in real time.
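
The projection and fusion steps described in the summary can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the thesis's actual method: the function names `project_three_views` and `fuse_views` are hypothetical, the projections are rendered as binary occupancy maps, pixels with zero depth are treated as background, and fusion is done by averaging the per-view probabilities (the summary says only that the outputs are fused). The CNN itself is omitted; the sketch takes its per-view probability vectors as given.

```python
import numpy as np

def project_three_views(depth, size=32):
    """Project a hand depth image onto the x-y, y-z and z-x planes.

    Assumption: pixels with depth > 0 belong to the hand, and each
    projection is a binary occupancy map of shape (size, size).
    """
    ys, xs = np.nonzero(depth > 0)
    if xs.size == 0:
        raise ValueError("depth image contains no hand pixels")
    zs = depth[ys, xs]
    # Treat each valid pixel as a 3D point (x, y, z).
    pts = np.stack([xs, ys, zs], axis=1).astype(np.float64)
    # Rescale every axis independently to integer bins in [0, size - 1].
    mins = pts.min(axis=0)
    span = np.maximum(pts.max(axis=0) - mins, 1e-6)
    idx = np.floor((pts - mins) / span * (size - 1)).astype(int)
    views = {}
    for name, (a, b) in {"xy": (0, 1), "yz": (1, 2), "zx": (2, 0)}.items():
        img = np.zeros((size, size), dtype=np.float32)
        img[idx[:, b], idx[:, a]] = 1.0  # mark occupied cells on this plane
        views[name] = img
    return views

def fuse_views(probs_per_view):
    """Fuse per-view class probabilities by averaging (an assumed rule)."""
    fused = np.mean(np.stack(probs_per_view, axis=0), axis=0)
    return int(np.argmax(fused)), fused
```

In use, each of the three maps returned by `project_three_views` would be passed to a trained network for that view, and the three resulting probability vectors would go to `fuse_views` to obtain the predicted gesture index. Averaging is only one possible fusion rule; multiplying the probabilities or learning a fusion layer would be equally consistent with the summary.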