Toward human-centered automated driving: a novel spatiotemporal vision transformer-enabled head tracker

Accurate dynamic driver head pose tracking is of great importance for driver–automotive collaboration, intelligent copilot, head-up display (HUD), and other human-centered automated driving applications. To further advance this technology, this article proposes a low-cost and markerless headtracking...

Full description

Saved in:
Bibliographic Details
Main Authors: Hu, Zhongxu, Zhang, Yiran, Xing, Yang, Zhao, Yifan, Cao, Dongpu, Lv, Chen
Other Authors: School of Mechanical and Aerospace Engineering
Format: Article
Language:English
Published: 2022
Subjects:
Online Access:https://hdl.handle.net/10356/162995
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Accurate dynamic driver head pose tracking is of great importance for driver–automotive collaboration, intelligent copilot, head-up display (HUD), and other human-centered automated driving applications. To further advance this technology, this article proposes a low-cost and markerless headtracking system using a deep learning-based dynamic head pose estimation model. The proposed system requires only a red, green, blue (RGB) camera without other hardware or markers. To enhance the accuracy of the driver’s head pose estimation, a spatiotemporal vision transformer (ST-ViT) model, which takes an image pair as the input instead of a single frame, is proposed. Compared to a standard transformer, the ST-ViT contains a spatial–convolutional vision transformer and a temporal transformer, which can improve the model performance. To handle the error fluctuation of the head pose estimation model, this article proposes an adaptive Kalman filter (AKF). By analyzing the error distribution of the estimation model and the user experience of the head tracker, the proposed AKF includes an adaptive observation noise coefficient; this can adaptively moderate the smoothness of the curve. Comprehensive experiments show that the proposed system is feasible and effective, and it achieves a state-of-the-art performance.