Parameter-Efficient Convolutional Neural Networks using Wavelet Transforms

Bibliographic Details
Main Authors: Malubay, Arnel L.; De Los Santos, Kurt Anthony C.; Nable, Job A.
Format: text
Published: Archīum Ateneo 2024
Online Access: https://archium.ateneo.edu/mathematics-faculty-pubs/255
https://doi.org/10.1063/5.0192309
Institution: Ateneo De Manila University
Description
Summary: Convolutional Neural Networks (CNNs) are known to perform well on computer vision tasks such as image classification, image segmentation, and object detection. However, one major drawback of CNNs is the large amount of computing and memory resources needed to train them. In this paper, we propose an architectural unit, which we call the Upsampling-Based Wavelet Residual Block (UBWRB), that uses the 2D discrete wavelet transform coupled with upsampling operators and a residual connection to extract features from image data while having fewer trainable parameters than traditional convolutional layers. The discrete wavelet transform is a family of transforms that finds extensive application in signal processing and time-frequency analysis. For this paper, we use the filter-bank implementation of the discrete wavelet transform, which allows it to act like a convolutional layer with fixed kernel weights. We demonstrate the performance and parameter efficiency of CNNs with UBWRBs in the task of image classification by training them on the MNIST, Fashion-MNIST, and CIFAR-10 datasets. Our best-performing models achieve a test accuracy of 99.34% on the MNIST dataset with fewer than 120,000 trainable parameters, and 92.90% and 84.27% on the Fashion-MNIST and CIFAR-10 datasets respectively, each with fewer than 180,000 trainable parameters.
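
The remark that a filter-bank wavelet transform behaves like a convolutional layer with fixed kernel weights can be illustrated with a small sketch. The code below is not the UBWRB from the paper: it assumes the Haar wavelet with orthonormal scaling (the summary does not name the wavelet used), and the function name `haar_dwt2`, the constant `HAAR_KERNELS`, and the sub-band labels are hypothetical choices for illustration. It computes one level of a 2D transform by sliding four fixed 2x2 kernels over the image with stride 2, so the operation has no trainable parameters.

```python
# One level of a 2D Haar DWT written as a stride-2 "convolution" with four
# fixed 2x2 kernels. Assumption: Haar wavelet, orthonormal scaling.
# Sub-band naming (LH vs. HL) and high-pass sign conventions vary by library.
HAAR_KERNELS = {
    "LL": [[0.5, 0.5], [0.5, 0.5]],      # average: coarse approximation
    "LH": [[0.5, 0.5], [-0.5, -0.5]],    # difference between rows
    "HL": [[0.5, -0.5], [0.5, -0.5]],    # difference between columns
    "HH": [[0.5, -0.5], [-0.5, 0.5]],    # diagonal difference
}


def haar_dwt2(image):
    """Return the four half-resolution sub-bands of a 2D list of numbers."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    subbands = {}
    for name, kernel in HAAR_KERNELS.items():
        # Apply the fixed kernel to each non-overlapping 2x2 patch (stride 2).
        subbands[name] = [
            [
                sum(
                    kernel[i][j] * image[r + i][c + j]
                    for i in range(2)
                    for j in range(2)
                )
                for c in range(0, cols, 2)
            ]
            for r in range(0, rows, 2)
        ]
    return subbands


if __name__ == "__main__":
    bands = haar_dwt2([[1, 2], [3, 4]])
    for name, values in bands.items():
        print(name, values)
```

Because the kernels are constants, a layer built this way contributes nothing to the trainable-parameter count; for comparison, a learned 2x2 convolution producing four output channels from one input channel would have 16 weights, excluding biases.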