Dense prediction and deep learning in complex visual scenes


Bibliographic Details
Main Author: Wang, Yi
Other Authors: Lap-Pui Chau
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2021
Subjects:
Online Access:https://hdl.handle.net/10356/152009
Institution: Nanyang Technological University
id sg-ntu-dr.10356-152009
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Engineering::Electrical and electronic engineering
description Many computer vision applications, such as video surveillance, autonomous driving, and crowd analysis, suffer from the challenging conditions of complex scenes, including haze, underwater environments, extreme lighting, and crowded and small objects. These scenes can degrade the performance of computer vision algorithms or cause them to fail outright, so it is valuable to develop methods that address such complex visual scenes. In this thesis, we follow a unified line of thinking across a series of dense prediction problems spanning low-level to high-level vision, i.e., restoration, detection, and recognition. In the restoration problem, haze and underwater scenes degrade the contrast and color of images due to light scattering and absorption; research on de-scattering or dehazing aims to restore images captured in such scenes. As the first research direction of this thesis, we propose a novel image restoration approach for underwater imagery based on an adaptive attenuation-curve prior (AACP). The prior captures the observation that the pixel values of a clear image can be partitioned into several hundred distinct clusters in RGB space, and that the pixel values in each cluster, after being attenuated by water, are distributed along a curve with a power-function form. The pixel-wise medium transmission can therefore be predicted from a pixel value's position on such a curve. This method is generalizable and can be extended to hazy images. Moreover, exploiting the fact that the ambient light originates from the infinitely distant region of an outdoor image, we propose a new deep learning-based framework that estimates the ambient light by distant region segmentation (DRS). Qualitative and quantitative results show that the proposed methods achieve superior performance in comparison with state-of-the-art methods. In the detection problem, crowded objects present large scale variation and severe occlusion, posing great challenges to object detectors.
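The attenuation-curve idea above can be illustrated with a toy sketch. This is not the thesis's AACP implementation: simple RGB quantization stands in for the several-hundred-cluster partition, and within each cluster the brightest pixel is assumed to be the least attenuated, so the ratio of a pixel's intensity to that maximum serves as a crude per-pixel transmission estimate.

```python
import numpy as np

def estimate_transmission(img, n_bins=8, eps=1e-6):
    """Toy attenuation-curve-style transmission estimate (illustrative only).

    img: HxWx3 float array.  Returns an HxW map of values in [0, 1].
    """
    h, w, _ = img.shape
    flat = img.reshape(-1, 3).astype(np.float64)
    # Coarse cluster label from quantized RGB chromaticity (stand-in for
    # the RGB-space clustering described in the thesis).
    norm = flat / (flat.sum(axis=1, keepdims=True) + eps)
    labels = (norm[:, :2] * n_bins).astype(int)
    keys = labels[:, 0] * n_bins + labels[:, 1]
    intensity = flat.sum(axis=1)
    t = np.empty(len(flat))
    for k in np.unique(keys):
        m = keys == k
        # Brightest pixel in the cluster is treated as unattenuated.
        t[m] = intensity[m] / (intensity[m].max() + eps)
    return t.reshape(h, w)
```

A real system would fit the power-function attenuation curve per cluster rather than using a raw intensity ratio, but the structure — cluster, then position each pixel along its cluster's curve — is the same.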
In addition, current crowd datasets provide only coarse point-level annotations, i.e., human heads are labeled as points, so state-of-the-art object detectors cannot be trivially applied under such point supervision. In our second research direction, we propose a novel self-training approach, termed Crowd-DCNet, that enables a typical object detector trained only with point-level annotations to densely predict the center points and sizes of crowded objects. Specifically, we propose the locally-uniform distribution assumption (LUDA) for initializing pseudo object sizes from point-level supervisory information, a crowdedness-aware loss for regressing object sizes, and a confidence- and order-aware refinement scheme for continuously refining the pseudo object sizes during training. With this self-training approach, the detector's capability is progressively strengthened. Moreover, bypassing object detection, we introduce a compact convolutional neural network (CNN) for object counting in video surveillance, in which a multi-scale density (MSD) regressor predicts coarse- and fine-scale density maps. Comprehensive experimental results on six challenging benchmark datasets show that our approach significantly outperforms state-of-the-art methods on both detection and counting tasks. In the recognition problem, small objects in unconstrained scenes adversely affect the accuracy of automatic recognition systems. Our third research direction focuses on automatic license plate recognition (ALPR) in unconstrained environments, such as oblique views, uneven illumination, and various weather conditions.
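One plausible reading of the pseudo-size initialization step can be sketched as follows: if nearby heads are roughly uniformly spaced, a point's distance to its nearest annotated neighbours bounds its object size. The function name and the exact rule below are illustrative assumptions, not the thesis's actual LUDA formulation.

```python
import numpy as np

def init_pseudo_sizes(points, k=3, default=16.0):
    """Initialize pseudo object sizes from point annotations (sketch).

    Under a locally-uniform spacing assumption, each point's size is
    taken as the mean distance to its k nearest neighbouring points.
    """
    pts = np.asarray(points, dtype=np.float64)
    n = len(pts)
    if n <= 1:
        # No neighbours to measure against; fall back to a fixed size.
        return np.full(n, default)
    # Pairwise Euclidean distances, with self-distance masked out.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    kk = min(k, n - 1)
    nearest = np.sort(d, axis=1)[:, :kk]
    return nearest.mean(axis=1)
```

In a self-training pipeline such initial sizes would then be regressed and refined during training, as the abstract describes.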
Our study yields an effective ALPR design guided by four insights: (1) a resampling-based cascaded framework benefits both speed and accuracy; (2) highly efficient license plate recognition should abandon additional character segmentation and recurrent neural networks (RNNs) in favor of a plain CNN; (3) with a CNN, exploiting vertex information on license plates improves recognition performance; and (4) a weight-sharing character classifier addresses the lack of training images in small-scale datasets. Based on these insights, we propose a real-time, high-performing ALPR approach, termed VSNet. The vertex supervisory information is fully exploited to train a detector (VertexNet) that predicts the geometric shapes of license plates, so that the plates can be rectified and their characters densely predicted by a recognizer (SCR-Net). Moreover, we propose a dynamic regularization method to avoid overfitting and improve the generalization ability of the CNN. Experimental results on two challenging benchmark datasets demonstrate the effectiveness of the proposed method.
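The rectification step implied by vertex prediction is a standard perspective correction: given the four detected plate vertices, solve for the 3x3 homography that maps them onto an axis-aligned rectangle. The sketch below uses the textbook direct linear transform (DLT) and is a generic illustration of this step, not VertexNet's actual implementation.

```python
import numpy as np

def homography_from_vertices(src, dst):
    """Solve the homography H mapping four src points to four dst points.

    Each correspondence (x, y) -> (u, v) contributes two rows to the
    DLT linear system A h = 0; the solution is the right-singular
    vector of A with the smallest singular value.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(A, dtype=np.float64)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so H[2, 2] == 1
```

Warping the plate image with this H (e.g., via a standard perspective-warp routine) yields a fronto-parallel plate on which a plain CNN can read the characters.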
author2 Lap-Pui Chau
format Thesis-Doctor of Philosophy
author Wang, Yi
author_sort Wang, Yi
title Dense prediction and deep learning in complex visual scenes
publisher Nanyang Technological University
publishDate 2021
url https://hdl.handle.net/10356/152009
_version_ 1772826553906888704
spelling sg-ntu-dr.10356-152009 2023-07-04T17:02:00Z Dense prediction and deep learning in complex visual scenes Wang, Yi Lap-Pui Chau School of Electrical and Electronic Engineering Centre for Information Sciences and Systems elpchau@ntu.edu.sg Engineering::Computer science and engineering Engineering::Electrical and electronic engineering Doctor of Philosophy 2021-07-13T03:21:22Z 2021-07-13T03:21:22Z 2021 Thesis-Doctor of Philosophy Wang, Y. (2021). Dense prediction and deep learning in complex visual scenes. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/152009 10.32657/10356/152009 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University