Discovering class-specific visual patterns for visual recognition
| Main Author: | Weng, Chaoqun |
|---|---|
| Other Authors: | Yuan Junsong |
| Format: | Theses and Dissertations |
| Language: | English |
| Published: | 2017 |
| Subjects: | DRNTU::Engineering::Electrical and electronic engineering |
| Online Access: | http://hdl.handle.net/10356/72462 |
| DOI: | 10.32657/10356/72462 |
| Degree: | Doctor of Philosophy (EEE) |
| Institution: | Nanyang Technological University |
Description:
Similar to frequent patterns in data mining, a visual pattern refers to a recurring composition of visual content in images or videos, such as repetitive texture regions, common objects across images, or similar actions across videos. Such visual patterns capture the recurrent nature of visual data and can represent its essence. Finding such visual patterns is critical to image and video data analysis.
Despite recent successes in the unsupervised mining of representative visual patterns from unlabeled visual data, for visual recognition tasks the patterns mined without supervision are often not discriminative enough to distinguish among different classes. A natural way to overcome this limitation is to leverage supervised learning and discover class-specific visual patterns, which is the focus of this thesis. In particular, we target visual patterns of three different structures: (1) class-specific local spatial patterns, e.g., local texture structures that help differentiate object images of different classes; (2) class-specific spatial layout patterns, e.g., spatial grid patterns that help differentiate scene images; and (3) class-specific compositional patterns, e.g., conjunction (AND) and disjunction (OR) forms of individual visual features that help differentiate scene images and action videos.
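As a schematic illustration of the third structure (the notation here is ours, not taken from the thesis), an AND/OR pattern over binary feature activations $f_j(x) \in \{0, 1\}$ could take the form

$$
P(x) \;=\; \bigl(f_2(x) \wedge f_5(x)\bigr) \;\vee\; \bigl(f_1(x) \wedge f_9(x) \wedge f_{12}(x)\bigr),
$$

which fires when either conjunction of features is active; such composite predicates can be far more class-discriminative than any individual feature on its own.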
To discover the above class-specific visual patterns, this thesis comprises the following technical works. In the first work, we propose to mine mid-level visual phrases from low-level visual primitives, e.g., local image patches or regions, by leveraging the local spatial context of the primitives, multi-feature fusion of the primitives, and weakly supervised image-label information. In the second work, we propose to discover a class-specific spatial layout for each scene category by casting the selection as an ℓ1-regularized max-margin optimization problem (sketched after this paragraph). In the third work, we propose a novel branch-and-bound co-occurrence pattern mining algorithm that directly mines optimal conjunctions (AND) and disjunctions (OR) of individual features, at arbitrary orders and simultaneously, with minimum classification error, as weak learners for a boosting algorithm (also sketched below). Similar to the third work, in the fourth work we discover high-order AND/OR patterns of skeleton features from a depth camera for action recognition, and we integrate the discovered AND/OR patterns into an attention LSTM model for temporal modeling to further improve recognition performance.
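To make the second and third works concrete, here are two illustrative sketches; the symbols, variable names, and simplifications are ours, not the thesis's exact formulations. The layout discovery of the second work can be read as an ℓ1-regularized max-margin problem

$$
\min_{\mathbf{w},\, b}\;\; \lambda \lVert \mathbf{w} \rVert_1 \;+\; \sum_{i=1}^{N} \max\bigl(0,\; 1 - y_i\,(\mathbf{w}^{\top} \boldsymbol{\phi}(x_i) + b)\bigr),
$$

where $\boldsymbol{\phi}(x_i)$ is assumed to stack the responses of candidate spatial grid cells for image $x_i$; the ℓ1 penalty drives most cell weights to zero, and the surviving nonzero weights define the class-specific layout. For the third work, the following toy Python sketch mines an order-2 AND pattern as a boosting weak learner, using exhaustive search with a simple admissible pruning bound; the actual algorithm is a branch-and-bound search over AND/OR patterns of arbitrary order, which this simplification does not reproduce.

```python
import numpy as np

def mine_and_pattern(X, y, w):
    """Find the order-2 AND pattern (j, k) with minimum weighted error.

    X : (N, D) binary feature-activation matrix
    y : (N,) labels in {-1, +1}
    w : (N,) boosting sample weights (summing to 1)
    The weak learner is h(x) = +1 if x[j] and x[k] are both active, else -1.
    """
    D = X.shape[1]
    best_pair, best_err = None, np.inf
    for j in range(D):
        # Admissible bound: any AND pattern containing feature j misses
        # every positive sample on which feature j is inactive, so its
        # weighted error is at least the total weight of those samples.
        missed_pos = np.sum(w[(y == 1) & (X[:, j] == 0)])
        if missed_pos >= best_err:
            continue  # prune: no pair containing j can beat the incumbent
        for k in range(j + 1, D):
            fires = (X[:, j] == 1) & (X[:, k] == 1)
            pred = np.where(fires, 1, -1)
            err = np.sum(w[pred != y])
            if err < best_err:
                best_pair, best_err = (j, k), err
    return best_pair, best_err

# Toy usage: class +1 is defined by features 0 AND 3 co-occurring.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10))
y = np.where((X[:, 0] == 1) & (X[:, 3] == 1), 1, -1)
w = np.full(200, 1.0 / 200)
print(mine_and_pattern(X, y, w))  # expect ((0, 3), 0.0)
```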
Compared with unsupervised visual pattern discovery, which usually separates pattern discovery from classification, our methods learn visual pattern discovery and visual recognition jointly. Moreover, unlike conventional visual recognition, which emphasizes classification performance alone, our class-specific visual patterns aim to capture the essence of the different visual classes, so that we can not only recognize the classes but also explain and understand why they differ, thanks to the discovered class-specific visual patterns.