Dense prediction and deep learning in complex visual scenes
Main Author: | Wang, Yi
---|---
Other Authors: | Lap-Pui Chau
Format: | Thesis-Doctor of Philosophy
Language: | English
Published: | Nanyang Technological University, 2021
Subjects: | Engineering::Computer science and engineering; Engineering::Electrical and electronic engineering
Online Access: | https://hdl.handle.net/10356/152009
Institution: | Nanyang Technological University
Description:
Many computer vision applications, such as video surveillance, autonomous driving, and crowd analysis, face the challenging conditions of complex scenes, including haze, underwater environments, extreme lighting, and crowded and small objects. These scenes can degrade the performance of computer vision algorithms or even cause them to fail, so it is important to develop methods that address such complex visual scenes. In this thesis, we adopt a unified dense-prediction perspective on a series of problems spanning low-level to high-level vision, namely restoration, detection, and recognition.
In the restoration problem, haze and underwater scenes degrade the contrast and color of images due to light scattering and absorption; de-scattering or dehazing research aims to restore images captured in such scenes. As the first research direction of this thesis, we propose a novel image restoration approach for underwater imagery based on an adaptive attenuation-curve prior (AACP). The prior captures the observation that the pixel values of a clear image can be partitioned into several hundred distinct clusters in RGB space, and that the pixel values in each cluster, after being attenuated by water, are distributed along a curve with a power-function form. The pixel-wise medium transmission can therefore be predicted from a pixel value's position on such a curve. This method is generalizable and can be extended to hazy images. Moreover, based on the fact that the ambient light originates from the infinitely distant region of an outdoor image, we propose a new deep learning-based framework that estimates the ambient light by distant region segmentation (DRS). Qualitative and quantitative results show that the proposed methods achieve superior performance compared with state-of-the-art methods.
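As background for how the estimated quantities are used, the sketch below inverts the standard scattering image-formation model I = J·t + A·(1 − t) that underlies dehazing and underwater restoration. It is a minimal illustration rather than the thesis implementation: the transmission map and ambient light are assumed to come from AACP-style and DRS-style estimators, and the function name is ours.

```python
import numpy as np

def recover_radiance(hazy, transmission, ambient, t_min=0.1):
    """Invert the scattering model I = J*t + A*(1 - t) to recover J.

    `hazy` is an HxWx3 float image in [0, 1], `transmission` an HxW map
    (e.g., predicted per pixel by an AACP-like prior), and `ambient` a
    length-3 vector (e.g., estimated from the distant region by DRS).
    """
    t = np.clip(transmission, t_min, 1.0)[..., None]   # lower bound avoids division blow-up
    ambient = np.asarray(ambient, dtype=float).reshape(1, 1, 3)
    radiance = (hazy - ambient) / t + ambient           # J = (I - A) / t + A
    return np.clip(radiance, 0.0, 1.0)
```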
In the detection problem, crowded objects exhibit large scale variation and severe occlusion, posing great challenges to object detectors. In addition, current crowd datasets provide only coarse point-level annotations (human heads are labeled as points), so state-of-the-art object detectors cannot be trivially trained with such point supervision. In our second research direction, we propose a novel self-training approach, termed Crowd-DCNet, that enables a typical object detector trained only with point-level annotations to densely predict the center points and sizes of crowded objects. Specifically, we propose the locally-uniform distribution assumption (LUDA) for initializing pseudo object sizes from point-level supervision, a crowdedness-aware loss for regressing object sizes, and a confidence- and order-aware refinement scheme that continuously refines the pseudo object sizes during training. With this self-training approach, the detector's ability is progressively boosted. Moreover, bypassing object detection, we introduce a compact convolutional neural network (CNN) for object counting in video surveillance, in which a multi-scale density (MSD) regressor predicts coarse- and fine-scale density maps. Comprehensive experimental results on six challenging benchmark datasets show that our approach significantly outperforms state-of-the-art methods on both detection and counting tasks.
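The abstract does not spell out the exact LUDA initialization rule, so the sketch below uses a common proxy consistent with the stated assumption: each head point's pseudo size is taken from the mean distance to its k nearest neighboring points, on the premise that nearby people are roughly uniformly spaced. The function name, the choice of k, and the minimum size are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def init_pseudo_sizes(points, k=3, min_size=4.0):
    """Initialize a pseudo size (in pixels) for each annotated head point.

    Under a locally-uniform spacing assumption, the distance to the k
    nearest neighbors is a rough proxy for local object scale.
    `points` is an Nx2 array of (x, y) head annotations.
    """
    points = np.asarray(points, dtype=float)
    if len(points) <= 1:
        return np.full(len(points), min_size)
    tree = cKDTree(points)
    # Query k+1 neighbors because the nearest neighbor of a point is itself.
    dists, _ = tree.query(points, k=min(k + 1, len(points)))
    sizes = dists[:, 1:].mean(axis=1)                  # drop the self-distance column
    return np.maximum(sizes, min_size)
```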
In the recognition problem, small objects in unconstrained scenes adversely affect the accuracy of automatic recognition systems. Our third research direction focuses on automatic license plate recognition (ALPR) in unconstrained environments with oblique views, uneven illumination, and varied weather conditions. Our study yields an ALPR design built on four insights: (1) a resampling-based cascaded framework benefits both speed and accuracy; (2) highly efficient license plate recognition should forgo separate character segmentation and recurrent neural networks (RNNs) in favor of a plain CNN; (3) with a CNN, exploiting vertex information on license plates improves recognition performance; and (4) a weight-sharing character classifier addresses the shortage of training images in small-scale datasets. Based on these insights, we propose a real-time, high-performing ALPR approach, termed VSNet. Vertex supervisory information is fully exploited to train a detector (VertexNet) that predicts the geometric shapes of license plates, so that the plates can be rectified and their characters densely predicted by a recognizer (SCR-Net). Moreover, we propose a dynamic regularization method to avoid overfitting and improve the generalization ability of the CNN. Experimental results on two challenging benchmark datasets demonstrate the effectiveness of the proposed method.
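To illustrate the role of the predicted vertices, the sketch below shows a standard way a license plate can be rectified from four corner points with a perspective warp before being passed to a character recognizer. The corner ordering, output size, and function names are assumptions for illustration; the thesis's exact rectification pipeline may differ.

```python
import cv2
import numpy as np

def rectify_plate(image, vertices, out_w=192, out_h=64):
    """Warp a license plate region to a fronto-parallel view.

    `vertices` are four (x, y) corners predicted by a VertexNet-style
    detector, assumed ordered top-left, top-right, bottom-right, bottom-left.
    The rectified crop can then be fed to a character recognizer.
    """
    src = np.asarray(vertices, dtype=np.float32)
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]], dtype=np.float32)
    H = cv2.getPerspectiveTransform(src, dst)   # homography mapping plate corners to a rectangle
    return cv2.warpPerspective(image, H, (out_w, out_h))
```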
Citation: | Wang, Y. (2021). Dense prediction and deep learning in complex visual scenes. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/152009
---|---
DOI: | 10.32657/10356/152009
Supervisor: | Lap-Pui Chau (elpchau@ntu.edu.sg)
School: | School of Electrical and Electronic Engineering, Centre for Information Sciences and Systems
License: | Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)