Domain adaptation for video action recognition
Main Author: Wang, Xiyu
Other Authors: Mao Kezhi
School: School of Electrical and Electronic Engineering
Format: Thesis-Master by Research
Degree: Master of Engineering
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Online Access: https://hdl.handle.net/10356/172273
DOI: 10.32657/10356/172273
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Citation: Wang, X. (2023). Domain adaptation for video action recognition. Master's thesis, Nanyang Technological University, Singapore.
Institution: Nanyang Technological University
Description:
Humans can effortlessly learn from a specific data distribution and generalize well to new situations without excessive supervision. In contrast, deep learning models often struggle to achieve similar generalization. This is primarily because deep models are trained with algorithms that minimize empirical risk on the training data and assume that test data follow the same distribution as the training data. However, significant domain shifts between the training (source) and testing (target) data can occur, causing deep models to generalize poorly on target domains and necessitating additional supervision for adaptation.
To address this, Video-based Unsupervised Domain Adaptation (VUDA) has been proposed as a cost-efficient approach for transferring video action recognition models from a labeled source domain to an unlabeled target domain. Nonetheless, VUDA relies on strong assumptions, such as identical label spaces and a fixed target domain, which may not hold in real-world applications. Consequently, this thesis aims to remove these assumptions and broaden the applicability of video adaptation methods, focusing on two settings that conventional VUDA methods do not handle: partial domain adaptation (adapting from a source domain with many classes to a target domain with fewer classes) and continual domain adaptation (adapting to continuously changing target domains).
For partial domain adaptation, this thesis proposes the Multi-modality Cluster-calibrated partial Adversarial Network (MCAN), which combines a multi-modal network that extracts robust features with a novel cluster-based calibration method that refines the estimated target class distribution, effectively filtering out irrelevant source classes. For continual domain adaptation, this thesis first defines the problem of continual video domain adaptation and then proposes the Confidence-Attentive network with geneRalization-enhanced self-knowledge disTillation (CART). CART leverages attentive learning and a novel data-generalization-enhanced self-knowledge distillation to preserve knowledge learned on previously seen target domains while adapting to newly encountered ones, ultimately providing a single model that performs well on all seen target domains at minimal cost.
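To make the class-filtering idea behind partial domain adaptation concrete, the sketch below shows a generic class-weighted adversarial alignment step in PyTorch. It is an illustrative sketch of the general technique only, not the thesis's MCAN: the function names, tensor shapes, and weighting scheme are hypothetical placeholders, and MCAN's multi-modal features and cluster-based calibration are not modeled here.

```python
# Illustrative sketch of class-weighted adversarial partial domain adaptation.
# NOT the thesis's MCAN: all names and shapes here are hypothetical placeholders.
import torch
import torch.nn.functional as F


def estimate_class_weights(target_logits: torch.Tensor) -> torch.Tensor:
    """Estimate how likely each source class is to appear in the target domain
    by averaging softmax predictions over a batch of unlabeled target clips."""
    probs = F.softmax(target_logits, dim=1)     # (batch, num_source_classes)
    weights = probs.mean(dim=0)                 # (num_source_classes,)
    return (weights / weights.max()).detach()   # normalize to [0, 1]


def partial_da_losses(src_cls_logits, src_labels,
                      src_domain_logits, tgt_domain_logits, class_weights):
    """Supervised loss on labeled source clips plus a class-weighted adversarial
    domain loss: source samples from classes that appear unlikely to exist in
    the target domain receive small weights and barely influence alignment."""
    cls_loss = F.cross_entropy(src_cls_logits, src_labels)

    sample_w = class_weights[src_labels]        # per-sample weight from its class
    src_dom_loss = F.binary_cross_entropy_with_logits(
        src_domain_logits.squeeze(1), torch.ones_like(sample_w), weight=sample_w)

    tgt_logits_flat = tgt_domain_logits.squeeze(1)
    tgt_dom_loss = F.binary_cross_entropy_with_logits(
        tgt_logits_flat, torch.zeros_like(tgt_logits_flat))

    return cls_loss, src_dom_loss + tgt_dom_loss
```

In MCAN itself, the class-distribution estimate is reportedly calibrated with clustering over multi-modal video features rather than taken directly from averaged predictions; the sketch only conveys the weighting mechanism that down-weights source-only classes.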
This thesis evaluates the proposed partial and continual video domain adaptation methods on both existing benchmarks and benchmarks newly constructed for this thesis. The results demonstrate significant performance improvements for MCAN and CART, with MCAN showing particularly strong gains when domain shifts are substantial and CART showing a superior ability to preserve previously learned knowledge. In conclusion, the findings on partial and continual domain adaptation broaden the applicability of video domain adaptation methods, making them more general and cost-efficient.