3D hand estimation under egocentric vision
This dissertation introduces a novel two-stage transformer-based model for 3D hand pose estimation, specifically designed for egocentric conditions. The proposed architecture integrates a FastViT-ma36 backbone in the first stage, which efficiently extracts features from monocular RGB images. In the second stage, three transformer encoder layers refine pose accuracy by capturing essential spatial relationships between hand joints. This two-stage design ensures effective feature extraction and contextual awareness, addressing challenges such as occlusion and partial visibility. The model significantly improves the accuracy of 3D hand pose estimation, achieving an area under the curve (AUC) of 0.87 on the FPHA dataset, compared with 0.76 for previous state-of-the-art methods, demonstrating that the optimized feature extraction and transformer-based processing yield substantial gains in pose estimation accuracy. Additionally, the model is robust to occlusion, maintaining high accuracy even under challenging self-occlusion and object-occlusion scenarios, and achieves real-time processing at over 200 frames per second (fps) on the FPHA dataset, making it a promising solution for high-precision, real-time hand pose estimation in practical scenarios.
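The abstract names the pipeline but not its implementation. Below is a minimal, hypothetical PyTorch sketch of that two-stage design. Only the FastViT-ma36 backbone (available in timm as `fastvit_ma36`) and the three transformer encoder layers come from the abstract; the 21-joint output, embedding size, learned joint-query tokens, and linear regression head are illustrative assumptions, not the thesis's actual model.

```python
# Hypothetical sketch of the two-stage design described in the abstract:
# a FastViT-ma36 backbone followed by three transformer encoder layers
# regressing 3D hand-joint positions. Dimensions, the joint count, and
# the query-token design are assumptions, not the thesis's exact model.
import torch
import torch.nn as nn
import timm


class TwoStageHandPoseNet(nn.Module):
    def __init__(self, num_joints: int = 21, embed_dim: int = 256):
        super().__init__()
        # Stage 1: FastViT-ma36 backbone extracting feature maps from a
        # monocular RGB image (timm model name: 'fastvit_ma36').
        self.backbone = timm.create_model(
            "fastvit_ma36", pretrained=False, features_only=True
        )
        backbone_dim = self.backbone.feature_info.channels()[-1]
        self.proj = nn.Linear(backbone_dim, embed_dim)

        # Stage 2: three transformer encoder layers modelling spatial
        # relationships between hand joints (layer count per the abstract).
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=3)

        # Learned joint queries plus a head mapping each joint token to (x, y, z).
        self.joint_tokens = nn.Parameter(torch.randn(num_joints, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, 3)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, H, W) -> deepest feature map (B, C, h, w)
        feats = self.backbone(images)[-1]
        b = feats.shape[0]
        tokens = self.proj(feats.flatten(2).transpose(1, 2))  # (B, h*w, D)

        # Prepend joint queries so the encoder attends joints <-> image patches.
        queries = self.joint_tokens.unsqueeze(0).expand(b, -1, -1)
        refined = self.encoder(torch.cat([queries, tokens], dim=1))

        # Read back only the joint tokens and regress 3D coordinates.
        return self.head(refined[:, : queries.shape[1]])  # (B, num_joints, 3)


if __name__ == "__main__":
    model = TwoStageHandPoseNet()
    out = model(torch.randn(1, 3, 256, 256))
    print(out.shape)  # torch.Size([1, 21, 3])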
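```

On the reported metric: for FPHA, AUC is typically the area under the percentage-of-correct-keypoints (PCK) curve swept over a range of 3D error thresholds, so a higher AUC means more predicted joints fall within each distance of the ground truth.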
Main Author: Zhu, Yixiang
Other Authors: Yap Kim Hui (School of Electrical and Electronic Engineering)
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2025
Subjects: Computer and Information Science; Egocentric vision; 3D hand pose estimation; Transformer
Online Access: https://hdl.handle.net/10356/182401
Institution: Nanyang Technological University
Citation: Zhu, Y. (2024). 3D hand estimation under egocentric vision. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182401