Efficacy of transformers and patch augmentation in boosting stability and performance of multi-illumination white balance task
Main Author:
Other Authors:
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Subjects:
Online Access: https://hdl.handle.net/10356/175206
Institution: Nanyang Technological University
Summary: Color Constancy, the ability to identify colors correctly independently of the illumination conditions, is a desirable quality for many computer vision models. Indeed, it has been demonstrated that image classification, object detection, and image segmentation models perform better on expertly White Balanced images. Thus, many approaches have been proposed to automatically correct the White Balance of images. Recently, there has been a marked interest in using learning-based methods, especially Deep Neural Networks, for carrying out White Balance correction.
In this paper, we propose a new Patch Augmentation Strategy that improves the performance of the model on the CIEDE 2000 metric across all considered datasets. Additionally, the model trained with the Patch Augmentation Strategy achieves better overall performance on the Multi-Illumination task, outperforming the baseline on both the MSE and CIEDE 2000 measures.
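For reference, both measures can be computed with standard tooling; the sketch below uses scikit-image's rgb2lab and deltaE_ciede2000 and is a generic illustration of the two metrics, not the exact evaluation code used in this project.

```python
# Generic sketch of the two measures mentioned above (MSE and mean CIEDE 2000),
# using scikit-image. Illustration only; not the project's evaluation pipeline.
import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000

def evaluate_pair(pred_rgb: np.ndarray, gt_rgb: np.ndarray) -> dict:
    """pred_rgb, gt_rgb: float arrays in [0, 1] with shape (H, W, 3)."""
    mse = float(np.mean((pred_rgb - gt_rgb) ** 2))
    # CIEDE 2000 is defined in CIELAB space, so convert from sRGB first.
    delta_e = deltaE_ciede2000(rgb2lab(pred_rgb), rgb2lab(gt_rgb))
    return {"mse": mse, "ciede2000": float(np.mean(delta_e))}
```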
As a secondary focus, we explore the use of a transformer backbone for enhancing performance on the White Balance task. We find that the Transformer model generates smoother images with fewer patch artifacts than the CNN model. However, the CNN model produces output images with higher color fidelity and achieves better performance on all single-illumination tasks.
Throughout our research, we use an input resolution of 224x224x3 for all our trained models, in the hope that this makes our results more compatible with common downstream models. All of our models have been made publicly available at https://huggingface.co/DChops/White_Balance.
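The released checkpoints can be fetched with the huggingface_hub client, as sketched below; only the download step is shown, since the file layout and loading code depend on the repository's actual contents.

```python
# Sketch: download the released model files from the Hugging Face repository
# referenced above. Loading the checkpoints afterwards depends on the files
# present in the repository, so that step is not shown here.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="DChops/White_Balance")
print("Model files downloaded to:", local_dir)
```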