Fingerprinting deep neural networks - a DeepFool approach

A well-trained deep learning classifier is an expensive intellectual property of the model owner. However, recently proposed model extraction attacks and reverse engineering techniques make model theft possible and similar quality deep learning solution reproducible at a low cost. To protect the int...

Full description

Saved in:
Bibliographic Details
Main Authors: Wang, Si, Chang, Chip Hong
Other Authors: School of Electrical and Electronic Engineering
Format: Conference or Workshop Item
Language:English
Published: 2021
Subjects:
Online Access:https://hdl.handle.net/10356/147023
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:A well-trained deep learning classifier is an expensive intellectual property of the model owner. However, recently proposed model extraction attacks and reverse engineering techniques make model theft possible and similar quality deep learning solution reproducible at a low cost. To protect the interest and revenue of the model owner, watermarking on Deep Neural Network (DNN) has been proposed. However, the extra components and computations due to the embedded watermark tend to interfere with the model training process and result in inevitable degradation in classification accuracy. In this paper, we utilize the geometry characteristics inherited in the DeepFool algorithm to extract data points near the classification boundary of the target model for ownership verification. As the fingerprint is extracted after the training process has been completed, the original achievable classification accuracy will not be compromised. This countermeasure is founded on the hypothesis that different models possess different classification boundaries determined solely by the hyperparameters of the DNN and the training it has undergone. Therefore, given a set of fingerprint data points, a pirated model or its post-processed version will produce similar prediction but another originally designed and trained DNN for the same task will produce very different prediction even if they have similar or better classification accuracy. The effectiveness of the proposed Intellectual Property (IP) protection method is validated on the CIFAR-10, CIFAR-100 and ImageNet datasets. The results show a detection rate of 100% and a false positive rate of 0% for each dataset. More importantly, the fingerprint extraction and its runtime are both dataset independent. It is on average ∼130× faster than two state-of-the-art fingerprinting methods.