Domain adaptation for video action recognition

Humans can effortlessly learn from a specific data distribution and generalize well to new situations without excessive supervision. In contrast, deep learning models often struggle to achieve similar generalization, primarily because they are trained to minimize empirical risk on the training data under the assumption that test data share the same distribution. However, significant domain shifts between training (source) and testing (target) data can occur, causing deep models to generalize poorly on target domains and necessitating additional supervision for adaptation. To address this, Video-based Unsupervised Domain Adaptation (VUDA) has been proposed as a cost-efficient approach for transferring video action recognition models from a labeled source domain to an unlabeled target domain. Nonetheless, VUDA relies on strong assumptions, such as identical label spaces and a fixed target domain, which may not hold in real-world applications. This thesis therefore aims to relax these assumptions and broaden the applicability of video adaptation methods, focusing on two shortcomings of conventional VUDA: partial domain adaptation (adapting from a source domain with many classes to a target domain with fewer classes) and continual domain adaptation (adapting to continuously changing target domains).

For partial domain adaptation, the thesis proposes the Multi-modality Cluster-calibrated partial Adversarial Network (MCAN), which combines a multi-modal network for extracting robust features with a novel cluster-based calibration method that refines the estimate of the target class distribution, effectively filtering out irrelevant source classes. To further address real-world challenges in adapting deep video models, the thesis defines the problem of continuous video domain adaptation and proposes the Confidence-Attentive network with geneRalization-enhanced self-knowledge disTillation (CART). CART leverages attentive learning and a novel generalization-enhanced self-knowledge distillation to preserve knowledge learned on previously seen target domains while adapting to newly encountered ones, ultimately providing a single model that performs well across all seen target domains at minimal cost.

Both methods are evaluated on existing benchmarks and on new benchmarks constructed in the thesis. The results demonstrate significant performance improvements: MCAN shows particularly strong gains when domain shifts are substantial, and CART demonstrates a superior ability to preserve learned knowledge. These findings on partial and continuous domain adaptation broaden the applicability of video domain adaptation methods, making them more general and cost-efficient.
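For readers who want a concrete picture of the partial-adaptation idea, the sketch below shows one common way to down-weight source-only classes in an adversarial adapter: predictions on unlabeled target clips are averaged to estimate which source classes actually appear in the target domain, and that estimate reweights the source side of the domain-discriminator loss. This is a minimal PyTorch illustration of the general technique, not MCAN itself; the function names, the averaging heuristic, and the single-discriminator setup are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def estimate_class_weights(target_logits: torch.Tensor) -> torch.Tensor:
    # Average the softmax predictions over all unlabeled target clips.
    # Classes absent from the target domain tend to receive little mass,
    # so this average is a cheap estimate of the target class distribution.
    probs = F.softmax(target_logits, dim=1)   # (N_target, num_source_classes)
    weights = probs.mean(dim=0)
    return weights / weights.max()            # normalize so the max weight is 1

def partial_adversarial_loss(domain_logits_src, domain_logits_tgt,
                             src_labels, class_weights):
    # Source clips from classes that look absent in the target domain get
    # small weights, so they contribute little to domain alignment.
    w = class_weights[src_labels]
    src_loss = F.binary_cross_entropy_with_logits(
        domain_logits_src.squeeze(1),
        torch.ones_like(domain_logits_src.squeeze(1)),   # source = domain 1
        weight=w)
    tgt_loss = F.binary_cross_entropy_with_logits(
        domain_logits_tgt.squeeze(1),
        torch.zeros_like(domain_logits_tgt.squeeze(1)))  # target = domain 0
    return src_loss + tgt_loss
```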

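Similarly, the knowledge-preservation component of continual adaptation can be illustrated with a generic self-knowledge-distillation loss: a frozen snapshot of the model, taken before entering a new target domain, serves as a teacher, and a KL term discourages the adapting model from drifting away from what it learned on earlier domains. Again, this is a hedged sketch of the generic mechanism, not CART's actual design; `temperature`, `lambda_kd`, and the snapshot schedule are illustrative assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def self_distillation_loss(student, teacher, clips, temperature=2.0):
    # The teacher is a frozen copy of the model taken before adapting to a
    # new target domain; matching its softened predictions discourages
    # forgetting of previously seen domains.
    with torch.no_grad():
        teacher_logits = teacher(clips)
    t_probs = F.softmax(teacher_logits / temperature, dim=1)
    s_log_probs = F.log_softmax(student(clips) / temperature, dim=1)
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2

# Sketch of one continual-adaptation step (all names are illustrative):
# teacher = copy.deepcopy(model).eval()  # snapshot before the new domain
# loss = adaptation_loss + lambda_kd * self_distillation_loss(model, teacher, clips)
```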

Bibliographic Details
Main Author: Wang, Xiyu
Other Authors: Mao Kezhi
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University, 2023
Department: School of Electrical and Electronic Engineering
Degree: Master of Engineering
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Online Access: https://hdl.handle.net/10356/172273
DOI: 10.32657/10356/172273
Rights: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Citation: Wang, X. (2023). Domain adaptation for video action recognition. Master's thesis, Nanyang Technological University, Singapore.
Institution: Nanyang Technological University