Multi-view CNNs for hand gesture recognition

Bibliographic Details
Main Author: Dou, Yudi
Other Authors: Yuan Junsong
Format: Theses and Dissertations
Language: English
Published: 2017
Online Access: http://hdl.handle.net/10356/72613
Institution: Nanyang Technological University
Description
Summary: Hand gesture recognition plays a significant role in human-computer interaction and has broad applications in augmented/virtual reality. It is also of practical value, since it can help people with disabilities who have special needs and requirements. Despite recent progress in recognizing hand poses for the Arabic numerals 0 to 9, the accuracy of sign-language hand gesture recognition is still far from satisfactory because the finger articulations are more complex. Unlike traditional model-driven methods, we do not need colored markers to extract features or skin color models to represent the hand region. We focus on data-driven approaches, which do not need complex calibration. In this dissertation, we create our own dataset of 30 Chinese sign language gestures and propose multi-view CNN-based methods to recognize them. We project the hand depth image onto three orthogonal planes and feed each plane's projected image into a convolutional neural network to generate three probabilities. Treating the task as classification, we then fuse the three views' outputs to recognize the final hand gesture. To put this project into application, we also produce a real-time demo that outputs a string similar to Chinese spelling. Experiments show that the proposed approach can recognize 30 Chinese hand gestures accurately and run the demo in real time.
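
The projection and fusion steps described in the summary can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the binary occupancy encoding, the depth quantization (`bins`, `max_depth`), the function names, and the averaging fusion rule are all assumptions made for illustration, and the per-view CNNs are replaced by hand-written probability vectors.

```python
from typing import List, Sequence, Tuple

Grid = List[List[int]]


def project_three_views(
    depth: Sequence[Sequence[float]], bins: int, max_depth: float
) -> Tuple[Grid, Grid, Grid]:
    """Project a depth image onto the x-y, y-z and z-x planes.

    Assumption: depth[y][x] <= 0 marks a background pixel, and each
    projection is a binary occupancy grid (1 = some hand point lands there).
    """
    h, w = len(depth), len(depth[0])
    xy = [[0] * w for _ in range(h)]      # front view: rows = y, cols = x
    yz = [[0] * bins for _ in range(h)]   # side view:  rows = y, cols = z
    zx = [[0] * w for _ in range(bins)]   # top view:   rows = z, cols = x
    for y in range(h):
        for x in range(w):
            d = depth[y][x]
            if d <= 0:
                continue  # skip background pixels
            # Quantize depth into one of `bins` cells along the z axis.
            z = min(int(d / max_depth * bins), bins - 1)
            xy[y][x] = 1
            yz[y][z] = 1
            zx[z][x] = 1
    return xy, yz, zx


def fuse_views(probs: Sequence[Sequence[float]]) -> int:
    """Fuse per-view class probabilities by averaging (an assumed rule)
    and return the index of the most probable class."""
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)
```

In a full pipeline, each of the three grids would be passed to its own convolutional network, and the three resulting probability vectors would be handed to `fuse_views`. Averaging is only one plausible fusion rule; the summary does not say which rule the thesis uses.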