Discovering class-specific visual patterns for visual recognition
Saved in:
Main Author:
Other Authors:
Format: Theses and Dissertations
Language: English
Published: 2017
Subjects:
Online Access: http://hdl.handle.net/10356/72462
Institution: Nanyang Technological University
Summary: Similar to frequent patterns in data mining, a visual pattern is a recurring composition of visual content in images or videos, such as repetitive texture regions, common objects across images, or similar actions across videos. Such visual patterns capture the recurrent nature of visual data and can represent its essence. Finding such patterns is therefore critical to image and video data analysis.
In spite of recent successes in unsupervised mining of representative visual patterns from unlabeled visual data, the patterns mined without supervision are often not discriminative enough to distinguish among different classes in visual recognition tasks. One natural way to overcome this limitation is to leverage supervised learning and discover class-specific visual patterns, which is the focus of this thesis. In particular, we target visual patterns of the following structures: (1) class-specific local spatial patterns, e.g., local texture structures that help differentiate object images; (2) class-specific spatial layout patterns, e.g., spatial grid patterns that help differentiate scene images; (3) class-specific visual patterns of compositional structures, e.g., conjunction (AND) and disjunction (OR) forms of individual visual features that help differentiate scene images and action videos.
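To make the compositional AND/OR structure concrete, the toy sketch below treats each image as a set of active visual features and evaluates conjunction and disjunction patterns over them. This is purely illustrative; the function names and set-based representation are assumptions, not the thesis's implementation.

```python
# Hypothetical illustration of AND/OR compositions of binary feature
# activations; names and representation are assumed for illustration.

def conjunction(active_sets, items):
    """AND pattern: fires only if every feature in `items` is active."""
    return [all(f in s for f in items) for s in active_sets]

def disjunction(active_sets, items):
    """OR pattern: fires if any feature in `items` is active."""
    return [any(f in s for f in items) for s in active_sets]

# Toy "images", each represented by its set of active visual features.
samples = [{"a", "b"}, {"a"}, {"b", "c"}]
print(conjunction(samples, ["a", "b"]))  # [True, False, False]
print(disjunction(samples, ["a", "b"]))  # [True, True, True]
```

A class-specific AND pattern of this kind can fire on a class where its constituent features co-occur while staying silent on classes where they appear only in isolation.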
To discover the above-mentioned class-specific visual patterns, this thesis is composed of the following technical works. In the first work, we propose to mine mid-level visual phrases from low-level visual primitives, e.g., local image patches or regions, by leveraging the local spatial context of visual primitives, multi-feature fusion of visual primitives, and weakly-supervised image label information. In the second work, we propose to discover class-specific spatial layouts for each scene category by formulating an l1-regularized max-margin optimization problem. In the third work, we propose a novel branch-and-bound-based co-occurrence pattern mining algorithm that directly mines both optimal conjunctions (AND) and disjunctions (OR) of individual features, at arbitrary orders, simultaneously and with minimum classification error for a boosting algorithm. Similar to the third work, the fourth work aims to discover high-order AND/OR patterns of skeleton features from a depth camera for action recognition. We also propose to integrate the discovered AND/OR patterns into an attention LSTM model for temporal modeling to improve action recognition performance.
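A minimal sketch of the kind of l1-regularized max-margin objective mentioned for the second work: a hinge loss plus an l1 penalty, minimized by subgradient descent so that uninformative weights are driven toward zero. The toy grid-cell features, hyperparameters, and function name are assumptions for illustration, not the thesis's actual formulation.

```python
# Minimize sum_i max(0, 1 - y_i * w.x_i) + lam * ||w||_1
# by plain subgradient descent; a sketch, not a tuned solver.

def train_sparse_svm(X, y, lam=0.1, lr=0.01, epochs=200):
    d = len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # hinge-loss subgradient is active
                for j in range(d):
                    grad[j] -= yi * xi[j]
        for j in range(d):
            # l1 subgradient: lam * sign(w_j)
            grad[j] += lam * (1 if w[j] > 0 else -1 if w[j] < 0 else 0)
            w[j] -= lr * grad[j]
    return w

# Toy data: only the first "grid cell" is discriminative; the l1
# penalty should keep the second weight small relative to the first.
X = [[1.0, 0.2], [0.9, -0.1], [-1.0, 0.1], [-0.8, -0.2]]
y = [1, 1, -1, -1]
w = train_sparse_svm(X, y)
```

The resulting sparse weight vector can be read as a class-specific layout: cells with non-zero weight are the grid locations that matter for that scene category.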
Compared with unsupervised visual pattern discovery, which usually separates the steps of pattern discovery and classification, our methods jointly learn visual pattern discovery and visual recognition. Also, unlike conventional visual recognition, which emphasizes classification performance alone, our class-specific visual patterns aim more at capturing the essence of different visual classes, so that we can not only recognize the visual classes but also explain and understand why they are different, thanks to the discovered class-specific visual patterns.