Non-reference speech quality assessment based on deep learning

Bibliographic Details
Main Author:	Fang, Xuhui
Other Authors:	Tan Yap Peng
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Electrical and electronic engineering
Online Access:	https://hdl.handle.net/10356/164956
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-164956
record_format	dspace
spelling	sg-ntu-dr.10356-164956; 2023-07-04T16:08:07Z; Non-reference speech quality assessment based on deep learning; Fang, Xuhui; Tan Yap Peng, School of Electrical and Electronic Engineering (EYPTan@ntu.edu.sg); Engineering::Electrical and electronic engineering; Master of Science (Communications Engineering); 2023-03-03T02:07:06Z; 2023; Thesis-Master by Coursework; Fang, X. (2023). Non-reference speech quality assessment based on deep learning. Master's thesis, Nanyang Technological University, Singapore.; https://hdl.handle.net/10356/164956; en; application/pdf; Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering
description	Speech quality assessment is an important technique in speech processing and is widely used in mobile communications, the Internet, public safety, digital entertainment, consumer electronics, and other fields. Early assessment was purely subjective, which demands substantial human effort, annotated data, and time, so objective methods gradually became popular. Reference-based speech quality assessment models require the clean, undistorted source signal, which is often difficult to obtain in practice. As a result, non-reference speech quality assessment has received increasing attention, especially in recent years. Many researchers have applied deep learning to non-reference speech quality assessment and achieved major advances. However, existing deep learning-based methods still suffer from limitations such as insufficient accuracy and a large number of parameters. To address these limitations, this dissertation studies deep learning-based non-reference speech quality assessment. The main contributions are summarized below.
(1) To address the limited accuracy of existing speech quality assessment, this dissertation proposes improvements from multiple perspectives. A BiLSTM (Bidirectional Long Short-Term Memory) network models temporal dependencies, exploiting its ability to learn speech context in both directions. On this basis, a Squeeze-and-Excitation (SE) module is added: it learns the correlations between the channels of the feature map, derives a per-channel attention weight, and uses these weights to recalibrate the feature map. In addition, a custom loss function based on the signal loss ratio is used to improve model fitting and further improve assessment performance. Experimental results show the effectiveness of this method.
(2) To address the large number of parameters in existing speech quality assessment models, we propose a low-complexity method based on depthwise separable residual convolution and a Bidirectional Gated Recurrent Unit (BiGRU): the SE-DSResBGRU-NRSQA model\cite{CNN41}. The model reduces the parameter count by using a BiGRU and depthwise separable convolution. Its convolutional part follows the structure of a residual network (ResNet), whose shortcut (direct mapping) connections reuse shallow feature information to improve assessment performance. SE modules are added on this basis to learn the importance of different channels, making more effective use of the input information. Experimental results show that the proposed method achieves good speech quality assessment performance with a relatively small number of parameters.
author2	Tan Yap Peng
format	Thesis-Master by Coursework
author	Fang, Xuhui
title	Non-reference speech quality assessment based on deep learning
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/164956
_version_	1772827189571485696
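The description above names two building blocks of the SE-DSResBGRU-NRSQA model: a depthwise separable residual convolution and a Squeeze-and-Excitation (SE) module. The NumPy sketch below is a minimal, hypothetical illustration of how such a block could be composed; it is not the thesis code. It assumes 1-D convolution over the time frames of a (channels, frames) feature map, odd kernel size with "same" padding, ReLU activations, an SE reduction ratio `r`, omitted biases, and equal input and output channel counts so that the identity shortcut applies. All function names are illustrative.

```python
import numpy as np


def depthwise_separable_conv1d(x, dw_kernels, pw_weights):
    """'Same'-padded, stride-1 depthwise separable 1-D convolution (biases omitted).

    x: (C_in, T) feature map; dw_kernels: (C_in, K) with odd K; pw_weights: (C_out, C_in).
    """
    c_in, t = x.shape
    k = dw_kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    dw = np.empty((c_in, t))
    # Depthwise step: each channel is filtered by its own kernel (cross-correlation,
    # as in common deep learning frameworks); channels are not mixed here.
    for c in range(c_in):
        for i in range(t):
            dw[c, i] = np.dot(xp[c, i:i + k], dw_kernels[c])
    # Pointwise step: a 1x1 convolution, i.e. a matrix product that mixes channels.
    return pw_weights @ dw


def squeeze_excitation(x, w1, w2):
    """SE recalibration of a (C, T) map; w1: (C//r, C), w2: (C, C//r), biases omitted."""
    z = x.mean(axis=1)                   # squeeze: global average over time -> (C,)
    h = np.maximum(w1 @ z, 0.0)          # excitation: bottleneck FC + ReLU -> (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))  # FC + sigmoid -> one weight in (0, 1) per channel
    return x * s[:, None]                # rescale each channel by its weight


def se_ds_res_block(x, dw_kernels, pw_weights, w1, w2):
    """Hypothetical block: DS conv -> ReLU -> SE, plus identity shortcut (needs C_out == C_in)."""
    y = np.maximum(depthwise_separable_conv1d(x, dw_kernels, pw_weights), 0.0)
    y = squeeze_excitation(y, w1, w2)
    return np.maximum(y + x, 0.0)        # shortcut adds the block input back in


rng = np.random.default_rng(0)
C, T, K, r = 8, 50, 3, 4                 # channels, frames, kernel size, SE reduction ratio
x = rng.standard_normal((C, T))
out = se_ds_res_block(
    x,
    rng.standard_normal((C, K)) * 0.1,
    rng.standard_normal((C, C)) * 0.1,
    rng.standard_normal((C // r, C)),
    rng.standard_normal((C, C // r)),
)
print(out.shape)  # (8, 50)
```

The explicit loops favour readability over speed; a framework such as PyTorch would express the depthwise step as a grouped convolution with one group per input channel.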
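The description above attributes the small parameter count of SE-DSResBGRU-NRSQA partly to depthwise separable convolution and to using a BiGRU rather than a BiLSTM. The arithmetic behind such savings can be checked with a short sketch. It assumes 1-D convolution layers with bias and recurrent layers following PyTorch's weight layout (an input matrix, a recurrent matrix, and two bias vectors per gate block). The layer sizes are arbitrary examples, not values from the thesis, and the function names are hypothetical.

```python
def conv1d_params(c_in, c_out, k, bias=True):
    """Weights in a standard 1-D convolution: every output channel sees every input channel."""
    return c_in * c_out * k + (c_out if bias else 0)


def ds_conv1d_params(c_in, c_out, k, bias=True):
    """Depthwise (one k-tap filter per input channel) plus pointwise (1x1) convolution."""
    depthwise = c_in * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def rnn_params(input_size, hidden, gates, bidirectional=True):
    """Recurrent layer weights, assuming PyTorch's layout: per gate block an input matrix,
    a recurrent matrix and two bias vectors. LSTM has 4 gate blocks, GRU has 3."""
    per_direction = gates * (hidden * input_size + hidden * hidden + 2 * hidden)
    return per_direction * (2 if bidirectional else 1)


# Illustrative sizes only; they are not taken from the thesis.
std = conv1d_params(64, 64, 3)
ds = ds_conv1d_params(64, 64, 3)
bilstm = rnn_params(64, 128, gates=4)
bigru = rnn_params(64, 128, gates=3)
print(std, ds)        # 12352 4416
print(bilstm, bigru)  # 198656 148992
```

With these sizes the depthwise separable layer needs about a third of the standard layer's weights, and for equal input and hidden sizes a BiGRU always has exactly three quarters of a BiLSTM's parameters, since it has three gate blocks instead of four.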

Similar Items