Backdoor in deep learning: new threats and opportunities

Deep learning has become increasingly popular due to its remarkable ability to learn high-dimensional feature representations. Numerous algorithms and models have been developed to enhance the application of deep learning across various real-world tasks, including image classification, natural langu...

Full description

Saved in:
Bibliographic Details
Main Author: Chen, Kangjie
Other Authors: Zhang Tianwei
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2025
Subjects:
Online Access:https://hdl.handle.net/10356/182221
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Deep learning has become increasingly popular due to its remarkable ability to learn high-dimensional feature representations. Numerous algorithms and models have been developed to enhance the application of deep learning across various real-world tasks, including image classification, natural language processing, and autonomous driving. However, deep learning models are susceptible to backdoor threats, where an attacker manipulates the training process or data to cause incorrect predictions on malicious samples containing specific triggers, while maintaining normal performance on benign samples. With the advancement of deep learning, including evolving training schemes and the need for large-scale training data, new threats in the backdoor domain continue to emerge. Conversely, backdoors can also be leveraged to protect deep learning models, such as through watermarking techniques. In this thesis, we conduct an in-depth investigation into backdoor techniques from three novel perspectives. In the first part of this thesis, we demonstrate that emerging deep learning training schemes can introduce new backdoor risks. Specifically, pre-trained Natural Language Processing (NLP) models can be easily adapted to a variety of downstream language tasks, significantly accelerating the development of language models. However, the pre-trained model becomes a single point of failure for these downstream models. We propose a novel task-agnostic backdoor attack against pre-trained NLP models, wherein the adversary does not need prior information about the downstream tasks when implanting the backdoor into the pre-trained model. Any downstream models transferred from this malicious model will inherit the backdoor, even after extensive transfer learning, revealing the severe vulnerability of pre-trained foundation models to backdoor attacks. In the second part of this thesis, we develop novel backdoor attack methods suited to new threat scenarios. The rapid expansion of deep learning models necessitates large-scale training data, much of which is unlabeled and outsourced to third parties for annotation. To ensure data security, most datasets are read-only for training samples, preventing the addition of input triggers. Consequently, attackers can only achieve data poisoning by uploading malicious annotations. In this practical scenario, all existing data poisoning methods that add triggers to the input are infeasible. Therefore, we propose new backdoor attack methods that involve poisoning only the labels without modifying any input samples. In the third part of this thesis, we utilize the backdoor technique to proactively protect our deep learning models, specifically for intellectual property protection. Considering the complexity of deep learning tasks, generating a well-trained deep learning model requires substantial computational resources, training data, and expertise. Therefore, it is essential to protect these assets and prevent copyright infringement. Inspired by backdoor attacks that can induce specific behaviors in target models through carefully designed samples, several watermarking methods have been proposed to protect the intellectual property of deep learning models. Model owners can train their models to produce unique outputs for certain crafted samples and use these samples for ownership verification. While various extraction techniques have been designed for supervised deep learning models, challenges arise when applying them to deep reinforcement learning models due to differences in model features and scenarios. Therefore, we propose a novel watermarking scheme to protect deep reinforcement learning models from unauthorized distribution. Instead of using spatial watermarks as in conventional deep learning models, we design temporal watermarks that minimize potential impact and damage to the protected deep reinforcement learning model while achieving high-fidelity ownership verification. In summary, this thesis investigates the evolving landscape of backdoor threats during the development of deep learning techniques and the use of backdoors for beneficial purposes in intellectual property protection.