Self-supervised learning disentangled group representation as feature
A good visual representation is an inference map from observations (images) to features (vectors) that faithfully reflects the hidden modularized generative factors (semantics). In this paper, we formulate the notion of a “good” representation from a group-theoretic view using Higgins’ definition of disentangled representation [38], and show that existing Self-Supervised Learning (SSL) only disentangles simple augmentation features such as rotation and colorization, and is thus unable to modularize the remaining semantics. To break this limitation, we propose an iterative SSL algorithm: Iterative Partition-based Invariant Risk Minimization (IP-IRM), which successfully grounds the abstract semantics and the group acting on them in concrete contrastive learning. At each iteration, IP-IRM first partitions the training samples into two subsets that correspond to an entangled group element. Then, it minimizes a subset-invariant contrastive loss, where the invariance guarantees the disentanglement of the group element. We prove that IP-IRM converges to a fully disentangled representation and show its effectiveness on various benchmarks.
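The iteration described in the abstract lends itself to a compact sketch. The snippet below is a minimal, illustrative rendering of the minimization step, assuming a SimCLR-style InfoNCE contrastive loss and an IRMv1-style dummy-scale gradient penalty applied per subset; the function names (`info_nce`, `irm_penalty`, `ip_irm_loss`) and the toy usage are hypothetical and do not come from the authors' released code. The partition-discovery step is noted only in a comment.

```python
# Minimal sketch of the IP-IRM minimization step (illustrative assumptions,
# not the authors' implementation). Features z1, z2 are assumed to come from
# two augmented views of the same batch, encoded by the network being trained.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, scale=None, temperature=0.5):
    """SimCLR-style InfoNCE over a batch of positive pairs.
    `scale` is the IRMv1 dummy multiplier (treated as 1.0 when omitted)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = (z1 @ z2.t()) / temperature                  # pairwise similarities
    if scale is not None:
        logits = scale * logits
    labels = torch.arange(z1.size(0), device=z1.device)   # positives on diagonal
    return F.cross_entropy(logits, labels)

def irm_penalty(z1, z2):
    """IRMv1-style penalty: squared gradient of the loss w.r.t. a dummy
    scale fixed at 1.0; it vanishes when the loss is stationary in the scale."""
    scale = torch.tensor(1.0, requires_grad=True)
    (grad,) = torch.autograd.grad(info_nce(z1, z2, scale), scale,
                                  create_graph=True)
    return grad.pow(2)

def ip_irm_loss(z1, z2, partitions, lam=1.0):
    """Subset-invariant contrastive loss over all discovered partitions.
    Each partition is a boolean mask splitting the batch into two subsets;
    in IP-IRM the next partition is found by *maximizing* this same penalty."""
    total = info_nce(z1, z2)                  # plain contrastive term
    for mask in partitions:
        for subset in (mask, ~mask):
            if subset.sum() > 1:              # need >1 sample for contrast
                total = total + info_nce(z1[subset], z2[subset]) \
                              + lam * irm_penalty(z1[subset], z2[subset])
    return total

# Toy usage with random features and one hand-made partition.
z1 = torch.randn(8, 16, requires_grad=True)
z2 = torch.randn(8, 16)
partitions = [torch.tensor([True, False] * 4)]
ip_irm_loss(z1, z2, partitions).backward()
```

Under this sketch, training alternates between updating the encoder with `ip_irm_loss` and appending a new two-subset partition that maximizes the per-subset penalty, which is the sense in which the algorithm is iterative.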
Main Authors: WANG, Tan; YUE, Zhongqi; HUANG, Jianqiang; SUN, Qianru; ZHANG, Hanwang
Format: text (application/pdf)
Language: English
Published: Institutional Knowledge at Singapore Management University, 2021
Subjects: Databases and Information Systems; Graphics and Human Computer Interfaces
Collection: Research Collection School Of Computing and Information Systems (InK@SMU, SMU Libraries)
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Online Access: https://ink.library.smu.edu.sg/sis_research/6227
https://ink.library.smu.edu.sg/context/sis_research/article/7230/viewcontent/NIPS2021_Causal_SSL_final.pdf
Institution: Singapore Management University