Study on quality evaluation and model testing of human motion dataset under manufacturing scenario

Bibliographic Details
Main Author: Zhang, Li
Other Authors: Su, Rong
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Online Access:https://hdl.handle.net/10356/181401
Institution: Nanyang Technological University
Description
Summary: In the field of human trajectory prediction, most existing research focuses on urban roadways or indoor public spaces, often overlooking task-specific behaviors and interactions in industrial environments. To address this gap, our study utilized two datasets collected by Nanyang Technological University (NTU) for human movement analysis in manufacturing environments: the Fixed Detective Perspective (FDP) dataset and the First Person Perspective (FPP) dataset. We extracted three features (pose, trajectory, and ego motion) from these datasets and used them as inputs to a modified Convolutional Neural Network (CNN) and a modified Transformer model for human trajectory prediction.

The experiments revealed that the CNN is more suitable for tasks with strict training-time requirements, while the Transformer model excels in tasks that demand higher accuracy. Experiments using the Transformer model with the optimal hyperparameter configuration showed that the FDP-trained model achieved a Mean Absolute Error (MAE) of 84.1 pixels, compared to 158.9 pixels for the FPP-trained model, indicating that the FDP dataset, with its reduced self-motion noise, is the more suitable input. Furthermore, in scenarios where the image of a human operator is incomplete due to occlusion, the Transformer model trained on the sub-dataset in which humans are occluded had an MAE of 180.7 pixels, while the model trained on the sub-dataset of unoccluded human movement had an MAE of 90.4 pixels, highlighting the challenges posed by occlusion in industrial environments.

In the ablation study, different combinations of features (key points + pose, key points + ego motion, and key points + pose + ego motion) were used as inputs to the Transformer model. The model trained with key points + pose achieved an MAE of 11.82 pixels, the model trained with key points + ego motion had an MAE of 37.04 pixels, and the model trained with key points + pose + ego motion produced the lowest MAE of 10.79 pixels. All of these combinations significantly outperformed the model trained solely on trajectory, which had an MAE of 83.98 pixels. These results confirm that the pose feature plays a crucial role in improving the accuracy of the Transformer-based human trajectory prediction model, making it a key feature for enhancing predictive performance in industrial environments.
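To make the reported numbers concrete, below is a minimal Python sketch of the two mechanics the summary relies on: fusing per-frame features (key points, pose, ego motion) into a single input sequence for the Transformer, and computing MAE in pixels between predicted and ground-truth positions. The thesis itself is not quoted here, so all function names, array shapes, and the per-coordinate MAE definition are illustrative assumptions; the thesis may, for instance, define MAE as mean Euclidean displacement rather than a per-coordinate average.

    import numpy as np

    def build_model_input(key_points, pose, ego_motion):
        # Assumed feature fusion: concatenate per-frame features along the
        # channel axis to form one sequence for the Transformer.
        #   key_points: (T, K)  flattened trajectory key points per frame
        #   pose:       (T, P)  pose features per frame
        #   ego_motion: (T, E)  ego-motion features (relevant for FPP data)
        # Returns an array of shape (T, K + P + E).
        return np.concatenate([key_points, pose, ego_motion], axis=-1)

    def mae_pixels(pred, gt):
        # Assumed metric: mean absolute difference over the x and y pixel
        # coordinates of every predicted trajectory point.
        #   pred, gt: (N, horizon, 2) predicted / ground-truth positions.
        return float(np.mean(np.abs(pred - gt)))

    # Hypothetical usage with random stand-in data on a 640-pixel-wide frame.
    rng = np.random.default_rng(0)
    pred = rng.uniform(0, 640, size=(8, 12, 2))
    gt = rng.uniform(0, 640, size=(8, 12, 2))
    print(f"MAE: {mae_pixels(pred, gt):.1f} px")

Under these assumptions, dropping an argument from build_model_input corresponds to one ablation arm (e.g. omitting ego_motion gives the key points + pose configuration), and mae_pixels is the quantity being compared across the FDP, FPP, occluded, and unoccluded settings.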