3D hand estimation under egocentric vision

This dissertation introduces a novel two-stage transformer-based model for 3D hand pose estimation, designed specifically for egocentric conditions. In the first stage, a FastViT-MA36 backbone efficiently extracts features from monocular RGB images. In the second stage, three transformer encoder layers refine pose accuracy by capturing spatial relationships between hand joints. This two-stage design combines effective feature extraction with contextual awareness, addressing challenges such as occlusion and partial visibility. The model achieves an area under the curve (AUC) of 0.87 on the FPHA dataset, compared with 0.76 for previous state-of-the-art methods, with the gains attributable to the optimized feature extraction and transformer-based processing. It remains accurate under challenging self-occlusion and object-occlusion scenarios, and it runs at over 200 frames per second (fps) on the FPHA dataset, making it a promising solution for high-precision, real-time hand pose estimation in practical scenarios.
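As a rough illustration of the two-stage design described above, the sketch below wires a FastViT-MA36 backbone into three transformer encoder layers. It is a minimal reconstruction under stated assumptions, not the thesis code: the 21-joint output, the 256-d token width, the mean-pooled linear regression head, and the use of timm's `fastvit_ma36` are all choices made here for illustration.

```python
# Minimal sketch of the two-stage architecture from the abstract (assumptions
# noted above; requires a timm version that ships FastViT).
import torch
import torch.nn as nn
import timm

class TwoStageHandPose(nn.Module):
    def __init__(self, num_joints=21, d_model=256, num_layers=3, nhead=8):
        super().__init__()
        # Stage 1: FastViT-MA36 backbone yields a spatial feature map
        self.backbone = timm.create_model(
            "fastvit_ma36", pretrained=False, features_only=True
        )
        c = self.backbone.feature_info.channels()[-1]   # last-stage channels
        self.proj = nn.Conv2d(c, d_model, kernel_size=1)
        # Stage 2: three transformer encoder layers over spatial tokens
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Regress 3D coordinates for every joint from the pooled tokens
        self.head = nn.Linear(d_model, num_joints * 3)
        self.num_joints = num_joints

    def forward(self, rgb):                                   # (B, 3, H, W)
        fmap = self.backbone(rgb)[-1]                         # (B, C, h, w)
        tokens = self.proj(fmap).flatten(2).transpose(1, 2)   # (B, h*w, d)
        tokens = self.encoder(tokens)    # spatial relations between positions
        pooled = tokens.mean(dim=1)                           # (B, d)
        return self.head(pooled).view(-1, self.num_joints, 3)

model = TwoStageHandPose()
out = model(torch.randn(1, 3, 256, 256))                      # (1, 21, 3)
```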

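For context, the AUC figure cited in the abstract is conventionally the area under the 3D PCK curve: the fraction of joints whose Euclidean error falls below a distance threshold, swept over a range of thresholds and normalised so a perfect estimator scores 1.0. The record does not state the threshold range, so the 0–50 mm sweep in this sketch is an assumption borrowed from common hand pose benchmarks.

```python
# Hedged sketch of the PCK-based AUC metric standard in 3D hand pose work.
# The 0-50 mm threshold range is an assumption; the thesis may use another.
import numpy as np

def pck_auc(pred, gt, thresholds=np.linspace(0.0, 50.0, 101)):
    """pred, gt: (N, 21, 3) joint positions in millimetres."""
    err = np.linalg.norm(pred - gt, axis=-1)        # (N, 21) per-joint errors
    pck = np.array([(err <= t).mean() for t in thresholds])
    # Trapezoidal area, normalised so a perfect model scores 1.0
    return np.trapz(pck, thresholds) / (thresholds[-1] - thresholds[0])
```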
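The 200+ fps figure is a throughput claim; below is a rough sketch of how such a number is typically measured, with warm-up iterations and device synchronisation. The batch size of 1, the 256x256 input, and the CUDA device are assumptions, and `TwoStageHandPose` refers to the illustrative model sketched earlier, not the author's implementation.

```python
# Illustrative throughput measurement (not the thesis benchmark protocol).
import time
import torch

@torch.no_grad()
def measure_fps(model, size=256, iters=200, device="cuda"):
    model = model.eval().to(device)
    x = torch.randn(1, 3, size, size, device=device)
    for _ in range(10):                 # warm-up so lazy init is excluded
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()        # ensure queued kernels have finished
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)

# e.g. measure_fps(TwoStageHandPose()) -> frames per second on this machine
```
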
Bibliographic Details
Main Author: Zhu, Yixiang
Other Authors: Yap Kim Hui
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2025
Subjects: Computer and Information Science; Egocentric vision; 3D hand pose estimation; Transformer
Online Access: https://hdl.handle.net/10356/182401
Institution: Nanyang Technological University
School: School of Electrical and Electronic Engineering
Supervisor Contact: EKHYap@ntu.edu.sg
Degree: Master's degree
Date Issued: 2024
Citation: Zhu, Y. (2024). 3D hand estimation under egocentric vision. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182401