Disentangled image representation: from affine transforms to facial attributes

Deep learning has shown unprecedented performance on computer vision tasks in recent years. One of its foundations is large datasets with human annotations, yet such datasets have inherent drawbacks. First, human annotation is expensive, especially for tasks such as segmentation. Second, the annotations themselves may be incorrect, owing to the subjective nature of the problem. Last but not least, if we wish an algorithm to evolve in real-world scenarios, it is not feasible to keep annotating all surrounding objects in real time. To better deploy deep learning in real-world scenarios, we want to train with minimal human annotation, for example in an unsupervised or self-supervised manner. Specifically, we tackle this problem from the perspective of generative models and disentangled representation: with generative models, the outputs of the model can be visualized; with disentangled representation, the different attributes learned by the model can be separated. Combining the two provides a pathway to aligning the visualized attributes with human intuition. To learn the disentangled representation in an unsupervised or self-supervised manner, we approach the problem through contrastive learning and inductive bias. With contrastive learning, we produce more data samples by transforming the original data and comparing the differences between them. With inductive bias, we formulate a meaningful relationship between the transformed and original data sample pairs. In this thesis, we demonstrate the effectiveness of inductive biases such as affine transforms and facial attributes. In summary, the thesis contributes to disentangled image representation, which provides a pathway to understanding the output of a generative model in a more vivid manner by visualizing the results and aligning them with human intuition.
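
To make the pair-based idea concrete, below is a minimal, illustrative sketch (not code from the thesis) of self-supervised learning with an affine-transform inductive bias: a random affine transform generates an (original, transformed) pair, and a small encoder is trained to regress the transform parameters from the pair, so no human annotation is required. The PairEncoder module, the make_pair helper, and all hyperparameters are assumptions made for illustration only.

import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

class PairEncoder(nn.Module):
    # Encodes an (original, transformed) pair stacked on the channel axis and
    # regresses the four affine parameters (angle, tx, ty, scale).
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 4)

    def forward(self, x, x_t):
        return self.head(self.features(torch.cat([x, x_t], dim=1)))

def make_pair(img):
    # Sample a random affine transform and apply it; the sampled parameters
    # become the self-supervised regression target.
    angle = float(torch.empty(1).uniform_(-30.0, 30.0))
    tx, ty = (int(torch.randint(-4, 5, (1,))) for _ in range(2))
    scale = float(torch.empty(1).uniform_(0.8, 1.2))
    img_t = TF.affine(img, angle=angle, translate=[tx, ty], scale=scale, shear=[0.0])
    return img_t, torch.tensor([angle, float(tx), float(ty), scale])

# One training step on a batch of unlabeled 1-channel 32x32 images.
model = PairEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(8, 1, 32, 32)
pairs = [make_pair(x[i]) for i in range(x.size(0))]
x_t = torch.stack([p[0] for p in pairs])
target = torch.stack([p[1] for p in pairs])
loss = nn.functional.mse_loss(model(x, x_t), target)
opt.zero_grad(); loss.backward(); opt.step()

In the thesis's generative setting, a decoder would consume the disentangled factors rather than a plain regression head, but the construction of (original, transformed) pairs follows the same idea.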

Bibliographic Details
Main Author: Liu, Letao
Other Authors: Jiang Xudong
School: School of Electrical and Electronic Engineering
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access: https://hdl.handle.net/10356/166053
DOI: 10.32657/10356/166053
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Citation: Liu, L. (2023). Disentangled image representation: from affine transforms to facial attributes. Doctoral thesis, Nanyang Technological University, Singapore.
Institution: Nanyang Technological University