Deep-learning-based 3D driver pose estimation for autonomous driving


Bibliographic Details
Main Author: Cao, Xiao
Other Authors: Lyu, Chen
Format: Thesis-Master by Coursework
Language:English
Published: Nanyang Technological University 2021
Online Access:https://hdl.handle.net/10356/149968
Institution: Nanyang Technological University
Description
Summary: Human-machine interaction is key to the future development of virtual reality, augmented reality, artificial intelligence, and smart devices. Applying human-machine interaction technology, especially human body estimation, in autonomous driving helps drivers drive safely and smoothly. Human state estimation can detect driver fatigue; it can also support ergonomics research and thereby improve human-machine interface design in automated vehicles. Researchers have achieved strong results in human state estimation, including body estimation, hand estimation, and face estimation. In the past, human estimation depended on dedicated hardware devices, whereas methods based on machine learning and deep learning have become increasingly popular and show excellent performance compared with traditional approaches in terms of cost and efficiency. However, most estimation models are developed separately: existing models can perform body estimation or hand estimation individually but not simultaneously, while a model that can identify different parts of the human body at the same time is more desirable in both research and application. In this dissertation, five deep learning models, including Simple Faster R-CNN, RootNet, PoseNet, YOLOv3, and a hand estimation model, are selected and combined through a cascade method to develop an integrated model that can estimate the human body and human hands simultaneously. The outputs of each model are expressed in different coordinate systems, so they cannot be fed into the subsequent neural network directly. Hence, in this project, they are transformed into the same coordinate system by a rotation transformation matrix, which enables the five models to be connected in series. Through a specially designed experiment, the integrated model is shown to produce 2D and 3D poses of the human body and human hands at the same time. Many problems still remain in this project; these will be addressed, and other functions, such as face estimation models, will be added in the future.
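The abstract's key integration step is mapping each model's 3D keypoint output into a single shared coordinate system with a rotation transformation matrix. A minimal sketch of that idea is shown below, assuming keypoints are stored as an (N, 3) NumPy array and that a z-axis rotation suffices for illustration; the function names and the specific rotation are illustrative, not taken from the thesis.

```python
import numpy as np

def rotation_matrix_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def to_common_frame(keypoints, R):
    """Map an (N, 3) array of 3D keypoints into the common frame via R."""
    # Row vectors are rotated by multiplying with R transposed.
    return keypoints @ R.T

# Hypothetical output of one pose branch, in its own camera frame.
body_kpts = np.array([[1.0, 0.0, 0.5]])
R = rotation_matrix_z(np.pi / 2)  # 90-degree rotation about z
unified = to_common_frame(body_kpts, R)
# (1, 0, 0.5) rotated 90 degrees about z gives approximately (0, 1, 0.5)
```

In the cascade described above, each branch's keypoints would be transformed this way before being passed to the next network, so all stages operate in one consistent frame.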