Training binary neural networks
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2022
Online Access: https://hdl.handle.net/10356/156664
Institution: Nanyang Technological University
Summary: Convolutional Neural Networks (CNNs), or convnets for short, have in recent years achieved results which were previously considered to be purely within the human realm. They are computational models that perform special convolution and pooling operations to detect feature maps from sample images. CNNs are heralded for their high performance and adaptability in many state-of-the-art computing niches today, such as Edge and Cloud Computing, and as such are a prominent tool for machine learning in industry. However, deep CNNs are still very much a developing field. They are extremely demanding in both hardware and software; in particular, their excessive power consumption, memory usage and model size typically make them infeasible for small or embedded devices. Hence, Binary Neural Networks (BNNs) have been proposed to extract maximal performance while reducing the computational footprint.
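As a minimal sketch of the core idea, assuming the standard sign-based binarisation with a straight-through estimator (STE) rather than any particular model from the project, a binarised linear layer in PyTorch might look like this:

```python
import torch
import torch.nn as nn


class BinarizeSTE(torch.autograd.Function):
    """Sign binarisation with a straight-through estimator for the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients straight through, clipped where |x| > 1 (standard STE).
        return grad_output * (x.abs() <= 1).float()


class BinaryLinear(nn.Linear):
    """Linear layer whose weights are binarised to {-1, +1} in the forward pass."""

    def forward(self, x):
        binary_weight = BinarizeSTE.apply(self.weight)
        return nn.functional.linear(x, binary_weight, self.bias)
```

The binarised weights cost one bit each to store, and the full-precision weights are kept only for the gradient updates during training, which is what yields the reduced memory and compute footprint described above.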
An empirical understanding of the established BNN models and tools will greatly aid future development in the machine learning field. To that end, in this paper I aim to study the history and motivations behind BNNs and discuss the optimisations and hyper-parameters of various BNN model implementations. Training BNNs is important because it allows the different implementations and improvements to be benchmarked and compared, and conclusions to be drawn about how different models are affected by different parameter settings. A short literature review of each established BNN model, covering its functionality and contributions, will be presented, followed by a baseline implementation and hyper-parameter training setting in PyTorch. I will study the methodology as well as the innovations of some of the prominent BNN improvements in this industry. Through these experiments, I examine the best parameter settings for training BNNs; these hyper-parameters include the learning rate scheduler, batch size, network architecture and even the initial learning rate, as sketched below. After that, I will compare and contrast the results of my testing and give recommendations for individuals pursuing research into quantised models in the future. These recommendations will take the form of recommended configurations and settings for training BNN models for different objectives.
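For concreteness, a baseline training setup of the kind described above might be configured in PyTorch roughly as follows; the dataset, architecture and specific values are illustrative assumptions (reusing the BinaryLinear sketch above), not the settings used in the project:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Illustrative hyper-parameters; the project's actual settings may differ.
batch_size = 128
initial_lr = 1e-3
epochs = 100

train_loader = DataLoader(
    datasets.CIFAR10("./data", train=True, download=True,
                     transform=transforms.ToTensor()),
    batch_size=batch_size, shuffle=True)

# Placeholder architecture: a small fully-connected BNN with binarised layers.
model = nn.Sequential(
    nn.Flatten(),
    BinaryLinear(3 * 32 * 32, 1024), nn.BatchNorm1d(1024), nn.Hardtanh(),
    BinaryLinear(1024, 10))

optimizer = torch.optim.Adam(model.parameters(), lr=initial_lr)
# Learning-rate scheduler: cosine annealing is one common choice for BNN training.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
criterion = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```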