DeepStellar: Model-based quantitative analysis of stateful deep learning systems

Deep Learning (DL) has achieved tremendous success in many cutting-edge applications. However, the state-of-the-art DL systems still suffer from quality issues. While some recent progress has been made on the analysis of feed-forward DL systems, little study has been done on the Recurrent Neural Net...

Full description

Saved in:
Bibliographic Details
Main Authors: DU, Xiaoning, XIE, Xiaofei, LI, Yi, MA, Lei, LIU, Yang, ZHAO, Jianjun
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2019
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/7068
https://ink.library.smu.edu.sg/context/sis_research/article/8071/viewcontent/3338906.3338954.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
Description
Summary:Deep Learning (DL) has achieved tremendous success in many cutting-edge applications. However, the state-of-the-art DL systems still suffer from quality issues. While some recent progress has been made on the analysis of feed-forward DL systems, little study has been done on the Recurrent Neural Network (RNN)-based stateful DL systems, which are widely used in audio, natural languages and video processing, etc. In this paper, we initiate the very first step towards the quantitative analysis of RNN-based DL systems. We model RNN as an abstract state transition system to characterize its internal behaviors. Based on the abstract model, we design two trace similarity metrics and five coverage criteria which enable the quantitative analysis of RNNs. We further propose two algorithms powered by the quantitative measures for adversarial sample detection and coverage-guided test generation. We evaluate DeepStellar on four RNN-based systems covering image classification and automated speech recognition. The results demonstrate that the abstract model is useful in capturing the internal behaviors of RNNs, and confirm that (1) the similarity metrics could effectively capture the differences between samples even with very small perturbations (achieving 97% accuracy for detecting adversarial samples) and (2) the coverage criteria are useful in revealing erroneous behaviors (generating three times more adversarial samples than random testing and hundreds times more than the unrolling approach).