Comparative study of neural network architectures for lightweight image recognition tasks in augmented reality
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: |
Nanyang Technological University
2024
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/175218 |
Institution: | Nanyang Technological University |
Summary: | This study provides a comparative analysis of neural network architectures for image recognition
tasks, particularly in the context of lightweight augmented reality applications, such as on mobile
devices. The study focuses on four popular neural network architectures: Convolutional Neural
Networks, MobileNet, EfficientNet, and ResNet-50. The study uses the HG14 dataset, which is
tailored for hand interaction and application control in augmented reality applications. The
dataset consists of images taken from a first-person view, intended for use with augmented
reality applications and wearable technologies.
The methodology used for this study involves training various models on the HG14 dataset and
evaluating model performance using validation accuracies in addition to test evaluation metrics
such as accuracy, precision, recall and F1-score. To optimize model performance, different
techniques (including data augmentation, transfer learning and hyperparameter tuning) were
used. All architectures were implemented in Python in Jupyter Notebook, with access to an
NVIDIA Tesla P100 GPU.
Out of the architectures examined, MobileNet had the best performance due to its optimization
for mobile and embedded applications. The efficiency of MobileNet can be attributed to its
depth-wise separable convolutions which allow for high accuracy and low memory requirements.
Transfer learning, with MobileNet pre-trained on the ImageNet dataset, was beneficial in
generalizing to the HG14 dataset, with high validation accuracy and high test accuracy,
precision, recall, and F1-score. MobileNet’s lightweight architecture led to faster model training,
allowing for more flexibility in experimenting with different parameters to enhance model
performance.
EfficientNet, which uses a compound scaling method that balances model complexity with
efficiency, also proved capable at image recognition (IR) tasks for AR applications. ResNet-50
showed comparable performance to both MobileNet and EfficientNet, although it had higher
computational demands, which may limit its suitability in real-time AR scenarios. Basic CNN
models showed lower performance due to hardware dependencies and the resource-intensive
nature of their architecture, which also makes them less suitable for real-time AR scenarios.
The study also examined different techniques to improve model performance. Data augmentation
yielded mixed results. EfficientNet was able to handle changes to the input data, but the
performance of the other architectures was negatively affected by the augmentations. This
highlighted the need to consider the nature of the dataset when implementing techniques to
improve model performance. Hyperparameter tuning led to a notable increase in the performance
of the base CNN model, but had negligible impact on the other architectures. Overall, transfer
learning was found to be the technique which consistently had a positive impact on model
performance across all four architectures.
The MobileNet transfer learning model was found to be the best choice for hand gesture
recognition in AR applications. The model offers a good balance of accuracy and adaptability,
which is important for real-time and lightweight applications. The results from this study provide
valuable insights into how to select and optimize neural network architectures to perform IR
tasks in AR contexts, laying the groundwork for future advancements in embedded AR systems.
The research framework outlined provides a structured approach to conducting detailed
comparative analyses of neural-network-based architectures, making it easier to make informed
decisions when developing and deploying AR applications. |
---|
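
The summary attributes MobileNet's low memory requirements to depth-wise separable convolutions. The sketch below is not taken from the project; it is a minimal illustration of why that factorization reduces the number of weights, assuming a single layer with bias terms ignored. The function names and the layer sizes (3 x 3 kernel, 128 input channels, 256 output channels) are hypothetical and chosen only for the example.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depth-wise k x k convolution followed by a
    1 x 1 point-wise convolution (bias ignored)."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution mixing channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 128, 256
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)

print("standard conv weights:      ", standard)   # 294912
print("depth-wise separable weights:", separable)  # 33920
print("ratio:", round(separable / standard, 3))    # 0.115
```

For these assumed sizes the separable layer needs roughly 11.5% of the weights of the standard layer. In general the ratio is 1/out_channels + 1/kernel_size^2, so the saving grows with the number of output channels and the kernel size.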