Machine learning-based image demosaicing



Bibliographic Details
Main Author: Zhou, Huan
Other Authors: Kai-Kuang Ma
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access:https://hdl.handle.net/10356/144887
Institution: Nanyang Technological University
Description
Summary: In this thesis, the research work mainly focuses on exploiting the data-driven convolutional neural network (CNN) methodology to further improve the performance of conventional signal processing approaches. This investigation is motivated by our goal of studying how these two methodologies can work together synergistically to improve the performance of various image processing tasks. In this work, image demosaicing, an indispensable processing step for consumer-grade digital cameras and mobile phones, is chosen as the focused image application. For cost saving, these devices are equipped with only one charge-coupled device (CCD) sensor overlaid with a color filter array, so each pixel records only one of the three color components; image demosaicing is therefore needed to restore the full-color image, and the recovered image is called the demosaicked image. With decades of research on image demosaicing, a plethora of signal processing-based algorithms have been developed. However, it has been observed that their performance has become saturated. Owing to this, we investigate how to harness the potential of a CNN to boost the performance of a signal processing-based method by serving as its post-processing module. Our first contribution is the development of a new CNN structure, called the U*-Net, which is a modified version of the existing U-Net. Our second contribution addresses a fundamental question that has been largely overlooked in the machine learning area: how to construct a reliable and robust training dataset by selecting 'good' samples. For that, a set of formal procedures is developed and exploited to train our proposed U*-Net for the image demosaicing task. The significance of the second contribution is as follows. It is well recognized that one of the main concerns in using the machine learning methodology lies in the establishment of the training dataset used to train the designed network.
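To make the demosaicing problem concrete, the sketch below simulates an RGGB Bayer color filter array and restores the full-color image with naive bilinear interpolation. This only illustrates the problem setup; it is not the thesis's U*-Net or IRI method, and the RGGB layout and function names are illustrative assumptions.

```python
import numpy as np

def conv3x3(img, k):
    """3x3 convolution with reflective border handling (NumPy only)."""
    p = np.pad(img, 1, mode='reflect')
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def bayer_mosaic(rgb):
    """Sample an RGB image through a hypothetical RGGB Bayer pattern.

    Each pixel keeps only one color component, which is why
    demosaicing is needed to restore the full-color image.
    """
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue
    return mosaic

def bilinear_demosaic(mosaic):
    """Fill in the missing color samples by bilinear interpolation."""
    h, w = mosaic.shape
    masks = np.zeros((h, w, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True
    masks[0::2, 1::2, 1] = True
    masks[1::2, 0::2, 1] = True
    masks[1::2, 1::2, 2] = True
    # Standard bilinear kernels for the quarter-density R/B samples
    # and the quincunx-pattern G samples.
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    out = np.zeros((h, w, 3))
    for c, k in zip(range(3), (k_rb, k_g, k_rb)):
        sparse = np.where(masks[:, :, c], mosaic, 0.0)
        out[:, :, c] = conv3x3(sparse, k)
    return out
```

Bilinear interpolation of this kind is the baseline that decades of signal processing research (and, later, CNN-based post-processing) improve upon, since it blurs edges and produces color artifacts in high-frequency regions.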
In the past, such training datasets were constructed in an ad hoc manner. As a result, the data-driven training can easily suffer from insufficient coverage of the addressed problem. Hence, there is a strong need to investigate this fundamental issue, which is quite critical yet has been ignored in the past. To tackle it, the fundamental principle we exploit is the correlation among the three color components (i.e., red, green, and blue), since this is the key attribute used in the iterative residual interpolation (IRI) demosaicking method. It is important to further note that this correlation is measured only on the image's high-frequency regions, which are extracted by a set of high-pass filters along four different directions. The reliability and robustness of the proposed procedures have been verified via the Student's t-test on the demosaicked images. For evaluating the performance of various image demosaicing algorithms, two well-known benchmark test datasets, Kodak and Imax, are also used in this thesis. The training dataset for our proposed U*-Net is established using the suggested data selection procedures. The obtained simulation results are documented in this thesis in terms of subjective evaluation of the demosaicked image quality and two objective measurements, the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). They show that the demosaicked color images are closer to the ground truth with fewer image artifacts, yielding an extra 1.31 dB improvement on the Kodak test dataset and a 1.11 dB gain on the Imax test dataset over what the most advanced signal processing-based method, IRI, has achieved. For less sophisticated methods, such gains (in dB) are even higher; for example, there is a gain of 1.34 dB on Kodak and a gain of 4.26 dB on Imax for gradient-based threshold-free (GBTF) demosaicing.
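The sample-selection idea described above can be sketched as follows: extract the high-frequency content of each color channel with directional high-pass filters, then score a candidate training image by the inter-channel correlation of those responses. The second-difference kernel, the four directions, and the scoring function below are illustrative assumptions; the thesis's exact filters and selection thresholds may differ.

```python
import numpy as np

def directional_highpass(ch):
    """High-frequency responses of one color channel along four
    directions (horizontal, vertical, 45 and 135 degrees), using a
    hypothetical 1-D second-difference kernel [-1, 2, -1]."""
    hor = np.zeros_like(ch)
    hor[:, 1:-1] = -ch[:, :-2] + 2 * ch[:, 1:-1] - ch[:, 2:]
    ver = np.zeros_like(ch)
    ver[1:-1, :] = -ch[:-2, :] + 2 * ch[1:-1, :] - ch[2:, :]
    d45 = np.zeros_like(ch)
    d45[1:-1, 1:-1] = -ch[:-2, 2:] + 2 * ch[1:-1, 1:-1] - ch[2:, :-2]
    d135 = np.zeros_like(ch)
    d135[1:-1, 1:-1] = -ch[:-2, :-2] + 2 * ch[1:-1, 1:-1] - ch[2:, 2:]
    return [hor, ver, d45, d135]

def interchannel_hf_correlation(rgb):
    """Mean absolute Pearson correlation between the high-frequency
    responses of the three color channels over the four directions.
    Under the selection criterion described above, a high score marks
    a 'good' training sample (strong inter-channel correlation)."""
    resp = [directional_highpass(rgb[:, :, c]) for c in range(3)]
    scores = []
    for d in range(4):
        for a in range(3):
            for b in range(a + 1, 3):
                x, y = resp[a][d].ravel(), resp[b][d].ravel()
                if x.std() > 0 and y.std() > 0:
                    scores.append(abs(np.corrcoef(x, y)[0, 1]))
    return float(np.mean(scores)) if scores else 0.0
```

In this sketch, an image whose channels share edge structure scores near 1, while independent channel noise scores near 0, so thresholding the score gives a simple, reproducible selection rule of the kind the proposed procedures formalize.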
Last but not least, it is worth highlighting that our work on training dataset establishment (i.e., the second contribution) can, with some modifications, also benefit other deep learning-based image processing tasks.