CrossASR: Efficient differential testing of automatic speech recognition via text-to-speech

Automatic speech recognition (ASR) systems are ubiquitous parts of modern life. They can be found in our smartphones, desktops, and smart home systems. To ensure their correctness in recognizing speech, ASR systems need to be tested. Testing ASR requires test cases in the form of audio files and their transc...

Bibliographic Details
Main Authors:	ASYROFI, Muhammad Hilmi; Ferdian, Thung; LO, David; JIANG, Lingxiao
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2020
Subjects:	Automatic Speech Recognition; Differential Testing; Failure Probability Predictor; Test Case Generation; Text-to-Speech; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/5536 https://ink.library.smu.edu.sg/context/sis_research/article/6539/viewcontent/icsme20crossASR.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-6539
record_format	dspace
spelling	sg-smu-ink.sis_research-6539 2021-05-18T02:06:01Z 2020-10-01T07:00:00Z text application/pdf info:doi/10.1109/ICSME46990.2020.00066 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Automatic Speech Recognition; Differential Testing; Failure Probability Predictor; Test Case Generation; Text-to-Speech; Software Engineering
description	Automatic speech recognition (ASR) systems are ubiquitous parts of modern life. They can be found in our smartphones, desktops, and smart home systems. To ensure their correctness in recognizing speech, ASR systems need to be tested. Testing ASR requires test cases in the form of audio files and their transcribed texts. Building these test cases manually, however, is tedious and time-consuming. To deal with this challenge, we propose CrossASR, an approach that capitalizes on existing Text-To-Speech (TTS) systems to automatically generate test cases for ASR systems. CrossASR is a differential testing solution that compares outputs of multiple ASR systems to uncover erroneous behaviors among ASRs. CrossASR efficiently generates test cases to uncover failures with as few generated tests as possible; it does so by employing a failure probability predictor to pick the texts with the highest likelihood of leading to failed test cases. As a black-box approach, CrossASR can generate test cases for any ASR, including when the ASR model is not available (e.g., when evaluating the reliability of various third-party ASR services). We evaluated CrossASR using 4 TTSes and 4 ASRs on the Europarl corpus. The ASRs evaluated are Deepspeech, Deepspeech2, wav2letter, and wit. Our experiments on 20,000 randomly sampled English texts showed that within an hour, CrossASR can produce, on average over 3 experiments, 130.34, 123.33, 47.33, and 8.66 failed test cases using the Google, ResponsiveVoice, Festival, and Espeak TTSes, respectively. Moreover, when we run CrossASR on the entire 20,000 texts, it can generate 13,572, 13,071, 5,911, and 1,064 failed test cases using the Google, ResponsiveVoice, Festival, and Espeak TTSes, respectively. Based on a manual verification carried out on a statistically representative sample, we found that most samples are actual failed test cases (audio understandable to humans but not transcribed properly by an ASR), demonstrating that CrossASR is highly reliable in determining failed test cases. We also make the source code for CrossASR and evaluation data available at https://github.com/soarsmu/CrossASR.
format	text
author	ASYROFI, Muhammad Hilmi; Ferdian, Thung; LO, David; JIANG, Lingxiao
author_facet	ASYROFI, Muhammad Hilmi; Ferdian, Thung; LO, David; JIANG, Lingxiao
author_sort	ASYROFI, Muhammad Hilmi
title	CrossASR: Efficient differential testing of automatic speech recognition via text-to-speech
title_sort	crossasr: efficient differential testing of automatic speech recognition via text-to-speech
publisher	Institutional Knowledge at Singapore Management University
publishDate	2020
url	https://ink.library.smu.edu.sg/sis_research/5536 https://ink.library.smu.edu.sg/context/sis_research/article/6539/viewcontent/icsme20crossASR.pdf
_version_	1770575503244132352
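
The description in this record outlines a differential testing loop: synthesize audio from a text, transcribe it with several ASRs, and compare the outputs. The following is a minimal Python sketch of the comparison and text-ranking steps only, not the authors' implementation. It assumes a text counts as a failed test case for an ASR only when at least one other ASR transcribes it correctly, and that a text no ASR transcribes correctly is treated as indeterminable. The names normalize, classify, rank_texts, and predict_failure are hypothetical, and the TTS and ASR calls are replaced by hard-coded strings.

```python
import re
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so formatting differences are ignored."""
    return " ".join(re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split())


def classify(text: str, transcriptions: Dict[str, str]) -> Dict[str, str]:
    """Label each ASR's output for one synthesized text.

    Assumption: if no ASR reproduces the text, the audio itself may be
    unintelligible, so every ASR is labeled "indeterminable".
    """
    expected = normalize(text)
    correct = {name: normalize(out) == expected
               for name, out in transcriptions.items()}
    if not any(correct.values()):
        return {name: "indeterminable" for name in transcriptions}
    return {name: "success" if ok else "failed"
            for name, ok in correct.items()}


def rank_texts(texts: List[str],
               predict_failure: Callable[[str], float]) -> List[str]:
    """Order texts so those with the highest predicted failure score come first."""
    return sorted(texts, key=predict_failure, reverse=True)


if __name__ == "__main__":
    # Hard-coded strings stand in for real ASR outputs on TTS-generated audio.
    outputs = {"asr_a": "hello world", "asr_b": "hollow world"}
    print(classify("Hello, world!", outputs))
    # A toy predictor (word count) stands in for a trained failure predictor.
    print(rank_texts(["a b", "a b c d", "a"], lambda t: float(len(t.split()))))
```

In this sketch the predictor is any function that maps a text to a score; a trained classifier could be substituted without changing the ranking step.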


Similar Items