Comparative study of neural network architectures for lightweight image recognition tasks in augmented reality

Bibliographic Details
Main Author: Anindita, Roy
Other Authors: Seah Hock Soon
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/175218
Institution: Nanyang Technological University
Description
Summary: This study provides a comparative analysis of neural network architectures for image recognition (IR) tasks in lightweight augmented reality (AR) applications, such as those running on mobile devices. It focuses on four popular architectures: Convolutional Neural Networks (CNNs), MobileNet, EfficientNet, and ResNet-50. The study uses the HG14 dataset, which is tailored for hand interaction and application control in AR. Its images are captured from a first-person view, making them suited to AR applications and wearable technologies. The methodology involves training each model on the HG14 dataset and evaluating it using validation accuracy together with test metrics: accuracy, precision, recall and F1-score. To optimize model performance, several techniques were applied, including data augmentation, transfer learning and hyperparameter tuning. All architectures were implemented in Python in Jupyter Notebook, with access to an NVIDIA Tesla P100 GPU. Of the architectures examined, MobileNet performed best, owing to its optimization for mobile and embedded applications. Its efficiency can be attributed to depthwise separable convolutions, which allow for high accuracy with low memory requirements. Transfer learning, with MobileNet pre-trained on the ImageNet dataset, generalized well to the HG14 dataset, giving high validation accuracy and high test accuracy, precision, recall, and F1-score. MobileNet’s lightweight architecture also led to faster training, allowing more flexibility to experiment with different parameters. EfficientNet, which uses compound scaling to balance model complexity with efficiency, also proved capable at IR tasks for AR applications.
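The memory saving attributed to depthwise separable convolutions can be illustrated by counting weights. The sketch below is not taken from the study: the function names and the layer sizes (3x3 kernel, 128 input channels, 256 output channels) are hypothetical and chosen only for illustration, and bias terms are ignored.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not from the study).
k, c_in, c_out = 3, 128, 256
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(f"standard: {standard}, separable: {separable}, ratio: {standard / separable:.2f}")
```

For a fixed kernel size the ratio grows with the number of output channels, which is one reason this factorization suits memory-constrained devices.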
ResNet-50 showed comparable performance to both MobileNet and EfficientNet, although its higher computational demands may limit its suitability for real-time AR scenarios. Basic CNN models showed lower performance due to hardware dependencies and the resource-intensive nature of their architecture, which also makes them less suitable for real-time AR scenarios. The study also examined different techniques to improve model performance. Data augmentation yielded mixed results: EfficientNet was able to handle changes to the input data, but the performance of the other architectures was negatively affected by the augmentation patterns. This highlights the need to consider the nature of the dataset when applying techniques to improve model performance. Hyperparameter tuning led to a notable increase in the performance of the base CNN model, but had negligible impact on the other architectures. Overall, transfer learning was the only technique that consistently improved model performance across all four architectures. The MobileNet transfer learning model was found to be the best choice for hand gesture recognition in AR applications. It offers a good balance of accuracy and adaptability, which is important for real-time and lightweight applications. The results provide insight into how to select and optimize neural network architectures for IR tasks in AR contexts, laying the groundwork for future advancements in embedded AR systems. The research framework outlined provides a structured approach to conducting detailed comparative analyses of neural-network-based architectures, making it easier to make informed decisions when developing and deploying AR applications.
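The test metrics named in the summary (accuracy, precision, recall and F1-score) can be computed for a multi-class gesture classifier roughly as follows. This is a minimal sketch, not the study's evaluation code: macro-averaging over classes is an assumption (the summary does not say which averaging was used), and the function name and gesture labels are hypothetical.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (averaging choice is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical gesture labels, for illustration only.
y_true = ["fist", "fist", "palm", "palm"]
y_pred = ["fist", "palm", "palm", "palm"]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when some gestures are rarer or more easily confused than others, since accuracy alone can hide weak per-class performance.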