Parameter-Efficient Convolutional Neural Networks using Wavelet Transforms

Convolutional Neural Networks (CNN's) are known to perform well on computer vision tasks such as image classification, image segmentation, and object detection. However, one major drawback of CNN's is the huge amount of computing and memory resources needed to train them. In this paper, we...

全面介紹

Saved in:
書目詳細資料
Main Authors: Malubay, Arnel L., Santos, Kurt Anthony C.De Los, Nable, Job A
格式: text
出版: Archīum Ateneo 2024
主題:
在線閱讀:https://archium.ateneo.edu/mathematics-faculty-pubs/255
https://doi.org/10.1063/5.0192309
標簽: 添加標簽
沒有標簽, 成為第一個標記此記錄!
實物特徵
總結:Convolutional Neural Networks (CNN's) are known to perform well on computer vision tasks such as image classification, image segmentation, and object detection. However, one major drawback of CNN's is the huge amount of computing and memory resources needed to train them. In this paper, we propose an architectural unit which we call Upsampling-Based Wavelet Residual Block (UBWRB), that utilizes the 2D discrete wavelet transform coupled with upsampling operators and a residual connection to extract features from image data while having relatively fewer trainable parameters as compared to traditional convolutional layers. The discrete wavelet transform is a family of transforms that find extensive applications in signal processing and time-frequency analysis. For this paper, we use the filter-bank implementation of the discrete wavelet transform, allowing it to act in a similar fashion to a convolutional layer with fixed kernel weights. We demonstrate the performance and parameter-efficiency of CNN's with UBWRB's in the task of image classification by training them on the MNIST, Fashion-MNIST, and CIFAR-10 datasets. Our best-performing models achieve a test accuracy of 99.34% on the MNIST dataset while having less than 120,000 trainable parameters, and 92.90% and 84.27% on the Fashion-MNIST and CIFAR-10 datasets respectively, with both having less than 180,000 trainable parameters.