Discovering visual patterns for image data analysis

Bibliographic Details
Main Author: Wang, Hongxing
Other Authors: Tan Yap Peng
Format: Theses and Dissertations
Language: English
Published: 2015
Subjects:
Online Access:https://hdl.handle.net/10356/62511
Institution: Nanyang Technological University
Description
Summary: Discovering meaningful patterns in image data can help us better understand and search visual content that conveys rich information. Unlike conventional pattern mining in transaction or text data, visual patterns usually contain complex spatial structures and exhibit large variations, making image data mining a very challenging task. This thesis is devoted to developing effective methods for visual pattern discovery and makes the following three contributions.

The first contribution is on spatial visual pattern discovery. We focus on composing visual patterns from visual primitives (e.g., local interest points or regions in images). Conventional clustering of visual primitives, e.g., bag-of-words, usually ignores the spatial and feature structure among the primitives and thus cannot discover high-level visual patterns with complex structure. To overcome this problem, we propose to consider both spatial and feature contexts among visual primitives for spatial pattern discovery. We formulate pattern discovery as a regularized k-means clustering problem and propose a self-learning procedure that gradually refines the result until it converges. Our experiments validate that, by discovering both spatial co-occurrence patterns among visual primitives and feature co-occurrence patterns among different types of features, our method can effectively reduce the ambiguities of visual primitives.

The second contribution is on multi-feature clustering for visual pattern discovery. Given a multiple-feature representation of each image, we propose a novel minimax formulation to reach a consensus clustering without requiring a weighting parameter to be specified for fusing the multiple feature modalities. The objective of consensus clustering is to find a universal feature embedding that not only fits each feature modality well but also unifies the different modalities by minimizing the pairwise disagreement between any two of them. Experiments with real image data show the advantages of the proposed multi-feature clustering method over existing methods.

The last contribution is on semi-supervised visual pattern discovery. We study visual pattern classification using collaborative multi-feature fusion in a transductive learning framework, where the labelled data transfer their labels to the unlabelled data. To enable transductive spectral learning, we formulate a new objective function with three terms: the quality of spectral clustering within individual feature types, the label smoothness of data samples in terms of their feature co-occurrence representations, and the fitness to the labels provided by the training data. However, under the transductive learning formulation, the spectral clustering results in different feature types and the formed co-occurrence patterns influence one another, which makes optimizing this objective function challenging. To address this problem, we propose an iterative optimization approach that decouples these factors. During the iterations, the clustering results of individual feature types and the smoothness of the data labelling help each other, leading to better transductive learning and pattern classification results. The effectiveness of the proposed method is validated on both synthetic and real image data.
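
The abstract only outlines the regularized k-means formulation, so the sketch below is an assumed illustration rather than the thesis's exact method: plain k-means over the appearance features of visual primitives, augmented with a spatial-context penalty that discourages a primitive from taking a cluster label that disagrees with its spatial neighbours, and refined iteratively until the assignment stops changing. All names and parameters (regularized_kmeans, lam, n_neighbors) are hypothetical.

```python
import numpy as np

def regularized_kmeans(X, coords, k, lam=0.5, n_neighbors=5, n_iters=20, seed=0):
    """Toy spatially regularized k-means over visual primitives.

    X      : (n, d) appearance features of the visual primitives.
    coords : (n, 2) image-plane positions of the primitives.
    lam    : weight of the assumed spatial-context regularizer that
             encourages spatial neighbours to share a cluster label.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, k, replace=False)].astype(float)

    # Precompute the spatial neighbours of each primitive (excluding itself).
    d_sp = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    neighbors = np.argsort(d_sp, axis=1)[:, 1:n_neighbors + 1]

    labels = np.zeros(n, dtype=int)
    for _ in range(n_iters):
        # Feature term: squared distance to each cluster center.
        d_feat = ((X[:, None] - centers[None]) ** 2).sum(-1)
        # Spatial term: fraction of neighbours currently disagreeing with label c.
        penalty = np.zeros((n, k))
        for i in range(n):
            for c in range(k):
                penalty[i, c] = np.mean(labels[neighbors[i]] != c)
        new_labels = np.argmin(d_feat + lam * penalty, axis=1)
        if np.array_equal(new_labels, labels):
            break                     # the self-refinement has converged
        labels = new_labels
        for c in range(k):            # update cluster centers as in plain k-means
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(0)
    return labels
```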
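The minimax consensus formulation is likewise only summarized above, so the following is a minimal sketch under assumed details: each modality contributes a normalized graph Laplacian, the shared embedding is taken from the bottom eigenvectors of a weighted combination, and the modality weights are updated adversarially (here via a softmax relaxation) so that the worst-fitted modality receives the most weight, which avoids a hand-tuned fusion parameter. The function and parameter names are hypothetical.

```python
import numpy as np

def minimax_consensus_embedding(affinities, dim, n_rounds=10):
    """Toy minimax consensus embedding over several feature modalities.

    affinities : list of (n, n) symmetric affinity matrices, one per modality.
    dim        : dimensionality of the shared (universal) embedding.
    """
    def normalized_laplacian(W):
        d = W.sum(1)
        d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
        return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    laplacians = [normalized_laplacian(W) for W in affinities]
    m = len(laplacians)
    weights = np.full(m, 1.0 / m)         # start with uniform modality weights

    for _ in range(n_rounds):
        # Min step: embedding = bottom eigenvectors of the weighted Laplacian.
        L = sum(w * Lv for w, Lv in zip(weights, laplacians))
        vals, vecs = np.linalg.eigh(L)
        U = vecs[:, :dim]
        # Max step: shift weight toward the modality the embedding fits worst
        # (a softmax relaxation of the adversarial maximum).
        costs = np.array([np.trace(U.T @ Lv @ U) for Lv in laplacians])
        weights = np.exp(costs - costs.max())
        weights /= weights.sum()
    return U
```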
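For the transductive setting, the thesis couples per-feature spectral clustering, co-occurrence-based label smoothness, and fitness to the training labels in one objective and optimizes it iteratively; the abstract does not spell that algorithm out, so the sketch below substitutes standard graph label propagation on a naively averaged multi-feature graph to convey the smoothness-versus-fitness trade-off. Treat it as an assumed simplification with hypothetical names, not the thesis's method.

```python
import numpy as np

def transductive_label_propagation(affinities, y, labelled_mask,
                                   alpha=0.9, n_iters=50):
    """Toy transductive labelling over multiple feature graphs.

    affinities    : list of (n, n) affinity matrices, one per feature type.
    y             : (n,) integer labels; entries for unlabelled samples are ignored.
    labelled_mask : (n,) boolean mask marking the labelled (training) samples.
    """
    W = sum(affinities) / len(affinities)               # naive multi-feature fusion
    d = W.sum(1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # symmetrically normalized affinity

    n_classes = int(y[labelled_mask].max()) + 1
    Y = np.zeros((len(y), n_classes))
    Y[labelled_mask, y[labelled_mask]] = 1.0            # one-hot training labels

    F = Y.copy()
    for _ in range(n_iters):
        # Balance smoothness on the graph against fitness to the given labels.
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F.argmax(1)                                  # predicted labels for all samples
```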