Hardware-constrained edge deep learning

Neural Networks have become commonplace in our daily lives, powering everything from language models in chatbots to computer vision models in industrial machinery. The unending quest for greater model performance has led to exponential growth in model size. For many devices, especially edge devices, storing or even running these models in a performant manner proves to be a challenge. In this paper, various memory compression methods, centered around post-training quantization, are explored for Large Language Models (LLMs) by comparing accuracy (perplexity) and inference latency (token generation speed). The report concludes that most LLMs can be quantized significantly without an observable loss in accuracy. However, very aggressive quantization (≤ 3 bits) can lead to rambling responses and a significant degradation in user experience. Further work can also be done to explore kernel-level quantization for convolutional neural networks and pseudo-vectorization for embedded use cases.
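The abstract centers on post-training quantization. As a rough illustration only (not the report's actual method or code), the sketch below shows per-tensor symmetric weight quantization in Python/NumPy; the function name `quantize_dequantize`, the bit widths, and the random weight matrix are all illustrative assumptions:

```python
import numpy as np

def quantize_dequantize(weights: np.ndarray, bits: int):
    """Symmetric per-tensor post-training quantization sketch.

    Maps float weights to signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]
    and back, returning the dequantized weights and the scale factor used.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8-bit
    scale = np.max(np.abs(weights)) / qmax          # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    q = q.astype(np.int8 if bits <= 8 else np.int32)
    return q.astype(np.float32) * scale, scale

# Toy weight tensor standing in for a model layer
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

for bits in (8, 4, 3):
    w_hat, _ = quantize_dequantize(w, bits)
    err = np.mean((w - w_hat) ** 2)
    print(f"{bits}-bit quantization, mean squared weight error: {err:.6f}")
```

The reconstruction error grows as the bit width shrinks, which is consistent with the abstract's observation that quality holds up under moderate quantization but degrades sharply at ≤ 3 bits.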


Bibliographic Details
Main Author: Ng, Jia Rui
Other Authors: Weichen Liu
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science
Online Access:https://hdl.handle.net/10356/181190
Institution: Nanyang Technological University
Other Author: Weichen Liu (College of Computing and Data Science, liu@ntu.edu.sg)
Degree: Bachelor's degree
Citation: Ng, J. R. (2024). Hardware-constrained edge deep learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181190
Project code: CCDS24-0111
Format: application/pdf