Hardware-constrained edge deep learning

Neural Networks have become commonplace in our daily lives, powering everything from language models in chatbots to computer vision models in industrial machinery. The unending quest for greater model performance has led to exponential growth in model size. For many devices, especially edge devices, storing or even running these models in a performant manner proves to be a challenge. In this paper, various memory compression methods, centered around post-training quantization, are explored for Large Language Models (LLMs) by comparing accuracy (perplexity) and inference latency (token generation speed). The report concludes that most LLMs can be quantized significantly without an observable loss in accuracy. However, very aggressive quantization (≤ 3 bits) can lead to rambling responses and a significant degradation in user experience. Further work can also be done to explore kernel-level quantization for convolutional neural networks and pseudo-vectorization for embedded use cases.
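The abstract centers on post-training quantization. As a rough illustration only (not the report's actual method or code), the sketch below shows per-tensor symmetric weight quantization in Python/NumPy; the function name `quantize_dequantize`, the bit widths, and the random weight matrix are all illustrative assumptions:

```python
import numpy as np

def quantize_dequantize(weights: np.ndarray, bits: int):
    """Symmetric per-tensor post-training quantization sketch.

    Maps float weights to signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]
    and back, returning the dequantized weights and the scale factor used.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8-bit
    scale = np.max(np.abs(weights)) / qmax          # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    q = q.astype(np.int8 if bits <= 8 else np.int32)
    return q.astype(np.float32) * scale, scale

# Toy weight tensor standing in for a model layer
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

for bits in (8, 4, 3):
    w_hat, _ = quantize_dequantize(w, bits)
    err = np.mean((w - w_hat) ** 2)
    print(f"{bits}-bit quantization, mean squared weight error: {err:.6f}")
```

The reconstruction error grows as the bit width shrinks, which is consistent with the abstract's observation that quality holds up under moderate quantization but degrades sharply at ≤ 3 bits.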


Bibliographic Details
Main Author: Ng, Jia Rui
Other Authors: Weichen Liu
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science
Online Access:https://hdl.handle.net/10356/181190
Institution: Nanyang Technological University
Other Author: Weichen Liu (College of Computing and Data Science, liu@ntu.edu.sg)
Degree: Bachelor's degree
Citation: Ng, J. R. (2024). Hardware-constrained edge deep learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181190
Project code: CCDS24-0111
Format: application/pdf