Study on quality evaluation and model testing of human motion dataset under manufacturing scenario
Saved in:
Main Author:
Other Authors:
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2024
Subjects:
Online Access: https://hdl.handle.net/10356/181401
Institution: Nanyang Technological University
Summary: In the field of human trajectory prediction, most existing research focuses on urban roadways or indoor public spaces, often overlooking task-specific behaviors and interactions in industrial environments. To address this gap, our study utilized two datasets collected by Nanyang Technological University (NTU), the Fixed Detective Perspective (FDP) dataset and the First Person Perspective (FPP) dataset, for human movement analysis in a manufacturing environment. We extracted three features (pose, trajectory, and ego motion) from these datasets and used them as inputs to a modified Convolutional Neural Network (CNN) and a modified Transformer model for human trajectory prediction. The experiments revealed that the CNN is better suited to tasks with strict training-time requirements, while the Transformer model excels in tasks that demand higher accuracy. Moreover, experiments using the Transformer model with the optimal hyperparameter configuration showed that the FDP-trained model achieved a Mean Absolute Error (MAE) of 84.1 pixels, compared with 158.9 pixels for the FPP-trained model, indicating that the FDP dataset, owing to its reduced self-motion noise, serves as the more suitable input. Furthermore, in scenarios where the image of a human operator is incomplete due to occlusion, the Transformer model trained on the sub-dataset in which humans are occluded had an MAE of 180.7 pixels, while the model trained on the sub-dataset of human movement without occlusion had an MAE of 90.4 pixels, highlighting the challenge occlusion poses in industrial environments. In the ablation study, different combinations of features (key points + pose, key points + ego motion, and key points + pose + ego motion) were used as inputs to the Transformer model. The model trained with key points + pose achieved an MAE of 11.82 pixels, the model trained with key points + ego motion had an MAE of 37.04 pixels, and the model trained with key points + pose + ego motion produced the lowest MAE, 10.79 pixels. All of these combinations significantly outperformed the model trained solely on trajectory, which had an MAE of 83.98 pixels. These results confirm that the pose feature plays a crucial role in improving the accuracy of the Transformer-based human trajectory prediction model, making it a key feature for enhancing predictive performance in industrial environments.
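The abstract does not describe how the three features are combined inside the modified Transformer. The sketch below is a minimal illustration of one plausible design, assuming PyTorch: per-frame key-point, pose, and ego-motion vectors are concatenated, projected to a shared embedding, encoded, and regressed to future (x, y) pixel positions. All names and dimensions (kp_dim, pose_dim, ego_dim, d_model, horizon) are illustrative assumptions, not values from the thesis.

```python
import torch
import torch.nn as nn

class FusedTrajectoryTransformer(nn.Module):
    """Minimal sketch (not the thesis architecture): concatenate
    per-frame key-point, pose, and ego-motion features, embed them,
    and run a Transformer encoder to regress future (x, y) positions."""

    def __init__(self, kp_dim=34, pose_dim=6, ego_dim=3,
                 d_model=64, horizon=12):
        super().__init__()
        self.embed = nn.Linear(kp_dim + pose_dim + ego_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, horizon * 2)
        self.horizon = horizon

    def forward(self, kp, pose, ego):
        # Each input: (batch, observed_steps, feature_dim).
        x = torch.cat([kp, pose, ego], dim=-1)
        h = self.encoder(self.embed(x))
        # Pool over observed steps, then predict the future trajectory.
        out = self.head(h.mean(dim=1))
        return out.view(-1, self.horizon, 2)

# Toy usage: batch of 8 sequences, 8 observed frames each.
model = FusedTrajectoryTransformer()
kp, pose, ego = torch.randn(8, 8, 34), torch.randn(8, 8, 6), torch.randn(8, 8, 3)
print(model(kp, pose, ego).shape)  # torch.Size([8, 12, 2])
```

Dropping the pose or ego-motion tensor from the concatenation (and shrinking the input layer accordingly) reproduces the ablation variants the abstract compares.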
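All reported errors are in pixels. One common convention, assumed in this sketch, is to average the absolute coordinate error over samples, prediction steps, and the x/y axes; the thesis may instead average per-point Euclidean distances, which would yield different numbers. The function trajectory_mae and the toy trajectories are hypothetical.

```python
import numpy as np

def trajectory_mae(pred, gt):
    """MAE between predicted and ground-truth trajectories, in pixels.

    pred, gt: arrays of shape (num_samples, horizon, 2) holding
    (x, y) pixel coordinates at each predicted time step.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Average absolute coordinate error over samples, steps, and axes.
    return np.abs(pred - gt).mean()

# Toy usage: two 3-step trajectories.
pred = [[[10, 10], [12, 11], [14, 12]],
        [[50, 40], [52, 42], [55, 45]]]
gt   = [[[10, 12], [13, 11], [15, 14]],
        [[49, 40], [53, 43], [56, 44]]]
print(trajectory_mae(pred, gt))  # ≈ 0.92 px
```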