Recurrent affine transform encoder for image representation
This paper proposes a Recurrent Affine Transform Encoder (RATE) that can be used for image representation learning. We propose a learning architecture that enables a CNN encoder to learn the affine transform parameters of images. The proposed learning architecture decomposes an affine transform matrix into two transform matrices and learns them jointly in a self-supervised manner. The proposed RATE is trained on unlabeled image data without any ground truth and recurrently infers the affine transform parameters of input images. The inferred parameters can be used to represent images in canonical form, greatly reducing image variation under affine transforms such as rotation, scaling, and translation. Unlike the spatial transformer network, the proposed RATE does not need to be embedded into other networks and trained with the aid of other learning objectives. We show that the proposed RATE learns the affine transform parameters of images and achieves strong image representation results in terms of invariance to translation, scaling, and rotation. We also show that incorporating RATE into an existing classification model improves classification performance and robustness to distortion.
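The abstract does not spell out the exact matrix decomposition or warping procedure, so the following is only an illustrative sketch, assuming a standard 2×3 affine parameterization split into a rotation–scale matrix and a translation vector, and using PyTorch's affine_grid/grid_sample (as in spatial transformer networks) to map an input image to canonical form from a predicted parameter vector. Names such as rate_encoder and canonicalize are hypothetical and not from the paper.

```python
# Illustrative sketch (not the authors' implementation): given a predicted
# affine parameter vector theta = (angle, log_scale, tx, ty), build the
# 2x3 affine matrix from a rotation-scale part and a translation part, then
# warp the image toward canonical form with affine_grid / grid_sample.
import torch
import torch.nn.functional as F

def build_affine(theta):
    """theta: (B, 4) -> (B, 2, 3) affine matrices (assumed parameterization)."""
    angle, log_scale, tx, ty = theta.unbind(dim=1)
    s = torch.exp(log_scale)
    cos, sin = torch.cos(angle), torch.sin(angle)
    # Rotation-scale matrix A with shape (B, 2, 2)
    A = torch.stack([torch.stack([s * cos, -s * sin], dim=1),
                     torch.stack([s * sin,  s * cos], dim=1)], dim=1)
    # Translation vector t with shape (B, 2, 1)
    t = torch.stack([tx, ty], dim=1).unsqueeze(-1)
    return torch.cat([A, t], dim=2)  # (B, 2, 3)

def canonicalize(image, theta):
    """Warp `image` (B, C, H, W) to canonical form using predicted theta."""
    M = build_affine(theta)
    grid = F.affine_grid(M, image.size(), align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)

# Usage (hypothetical encoder predicting the affine parameters recurrently):
#   theta = rate_encoder(image)
#   canonical = canonicalize(image, theta)
```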
Main Authors: Liu, Letao; Jiang, Xudong; Saerbeck, Martin; Dauwels, Justin
Other Authors: School of Electrical and Electronic Engineering
Format: Journal Article
Language: English
Published: IEEE Access, vol. 10, pp. 18653-18666, 2022
Subjects: Engineering::Electrical and electronic engineering; Canonical Image Base; Self-Supervised Learning
Online Access: https://hdl.handle.net/10356/164994
DOI: 10.1109/ACCESS.2022.3150340
ISSN: 2169-3536
Institution: Nanyang Technological University
Collection: DR-NTU
Citation: Liu, L., Jiang, X., Saerbeck, M. & Dauwels, J. (2022). Recurrent affine transform encoder for image representation. IEEE Access, 10, 18653-18666. https://dx.doi.org/10.1109/ACCESS.2022.3150340
Funding: This work was supported in part by the Singapore Economic Development Board (EDB) Industrial Postgraduate Program under Grant S17-1298-IPP-II.
Rights: © 2022 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/). Published version.