Discovering class-specific visual patterns for visual recognition

Similar to frequent patterns in data mining, a visual pattern is a recurring composition of visual content in images or videos, such as repetitive texture regions, common objects among images, or similar actions among videos. Such visual patterns capture the recurrent nature of visual data...

Bibliographic Details
Main Author: Weng, Chaoqun
Other Authors: Yuan Junsong
Format: Theses and Dissertations
Language: English
Published: 2017
Subjects:
Online Access: http://hdl.handle.net/10356/72462
Institution: Nanyang Technological University
id sg-ntu-dr.10356-72462
record_format dspace
spelling sg-ntu-dr.10356-72462 2023-07-04T17:09:52Z Discovering class-specific visual patterns for visual recognition Weng, Chaoqun Yuan Junsong School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Doctor of Philosophy (EEE) 2017-07-26T01:57:41Z 2017-07-26T01:57:41Z 2017 Thesis Weng, C. (2017). Discovering class-specific visual patterns for visual recognition. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/72462 10.32657/10356/72462 en 123 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering
spellingShingle DRNTU::Engineering::Electrical and electronic engineering
Weng, Chaoqun
Discovering class-specific visual patterns for visual recognition
description Similar to frequent patterns in data mining, a visual pattern is a recurring composition of visual content in images or videos, such as repetitive texture regions, common objects among images, or similar actions among videos. Such visual patterns capture the recurrent nature of visual data and can represent its essence. Finding such visual patterns is critical to image and video data analysis. Despite the recent successes of unsupervised mining of representative visual patterns in unlabeled visual data, patterns mined without supervision are often not discriminative enough to distinguish among different classes in visual recognition tasks. One natural way to overcome this limitation is to leverage supervised learning and discover class-specific visual patterns, which is the focus of this thesis. In particular, we aim to discover visual patterns of the following structures: (1) class-specific local spatial patterns, e.g., local texture structures that can help differentiate object images; (2) class-specific spatial layout patterns, e.g., spatial grid patterns that can help differentiate scene images; (3) class-specific visual patterns of compositional structure, e.g., conjunction (AND) and disjunction (OR) forms of individual visual features that can help differentiate scene images and action videos. To discover these class-specific visual patterns, this thesis comprises the following technical works. In the first work, we propose to mine mid-level visual phrases from low-level visual primitives, e.g., local image patches or regions, by leveraging the local spatial context of visual primitives, multi-feature fusion of visual primitives, and weakly supervised image label information. In the second work, we propose to discover class-specific spatial layouts for each scene category by casting the task as an l1-regularized max-margin optimization problem. In the third work, we propose a novel branch-and-bound based co-occurrence pattern mining algorithm that directly mines both optimal conjunctions (AND) and disjunctions (OR) of individual features at arbitrary orders simultaneously, with minimum classification error, for a boosting algorithm. Similarly, in the fourth work we aim to discover high-order AND/OR patterns of skeleton features from a depth camera for action recognition. We also propose to integrate the discovered AND/OR patterns into an attention LSTM model for temporal modeling to improve action recognition performance. Compared with unsupervised visual pattern discovery, which usually separates the steps of pattern discovery and classification, our method provides joint learning of visual pattern discovery and visual recognition. Also, unlike conventional visual recognition, which focuses purely on classification performance, our class-specific visual patterns aim to capture the essence of different visual classes, so that we can not only recognize the visual classes but also explain and understand why they differ, thanks to the discovered class-specific visual patterns.
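Illustrative note (not part of the catalog record): the idea of scoring AND/OR combinations of individual features by weighted classification error, as a boosting round would, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's algorithm: it uses plain exhaustive search rather than branch-and-bound, it assumes binary features and 0/1 labels, and the names `weighted_error` and `best_and_or_pattern` and the toy data are hypothetical.

```python
from itertools import combinations


def weighted_error(pred, labels, weights):
    """Sum of the weights of misclassified samples (predictions and labels are 0/1)."""
    return sum(w for p, y, w in zip(pred, labels, weights) if p != y)


def best_and_or_pattern(X, labels, weights, max_order=2):
    """Exhaustively score AND/OR patterns over binary feature columns.

    X is a list of rows of 0/1 features. Returns (error, kind, feature_indices)
    for the pattern with the lowest weighted error. This is brute force, an
    assumption made for brevity; it is NOT a branch-and-bound search.
    """
    n_features = len(X[0])
    best = None
    for order in range(1, max_order + 1):
        for idx in combinations(range(n_features), order):
            # AND fires when all selected features are on, OR when any is on.
            for kind, combine in (("AND", all), ("OR", any)):
                pred = [int(combine(row[j] for j in idx)) for row in X]
                err = weighted_error(pred, labels, weights)
                if best is None or err < best[0]:
                    best = (err, kind, idx)
    return best


# Hypothetical toy data: the label is 1 exactly when features 0 and 2 are both on.
X = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
]
labels = [1, 0, 0, 1, 0]
weights = [0.2] * 5  # uniform sample weights, as in a first boosting round

error, kind, features = best_and_or_pattern(X, labels, weights)
print(kind, features, error)
```

On this toy data no single feature separates the two classes, while the conjunction of features 0 and 2 does, which is the kind of co-occurrence pattern the abstract describes. Exhaustive search grows combinatorially with pattern order, which is presumably why the thesis turns to branch-and-bound.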
author2 Yuan Junsong
author_facet Yuan Junsong
Weng, Chaoqun
format Theses and Dissertations
author Weng, Chaoqun
author_sort Weng, Chaoqun
title Discovering class-specific visual patterns for visual recognition
title_short Discovering class-specific visual patterns for visual recognition
title_full Discovering class-specific visual patterns for visual recognition
title_fullStr Discovering class-specific visual patterns for visual recognition
title_full_unstemmed Discovering class-specific visual patterns for visual recognition
title_sort discovering class-specific visual patterns for visual recognition
publishDate 2017
url http://hdl.handle.net/10356/72462
_version_ 1772826090291593216