A speech enhancement framework using discrete Krawtchouk-Tchebichef Transform
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2018 |
Subjects: | |
Online Access: | http://psasir.upm.edu.my/id/eprint/75679/1/FK%202018%20137%20-%20IR.pdf http://psasir.upm.edu.my/id/eprint/75679/ |
Institution: | Universiti Putra Malaysia |
Summary: | Speech is the primary mode of interaction between humans. During transmission,
speech signals are exposed to conditions such as interference and additive noise, which
corrupt them and produce noisy signals. Robust Speech Enhancement Algorithms (SEA) that
suppress noise without distorting the original signal are therefore necessary. Removing
noise without causing speech distortion is nevertheless a challenging task. In addition,
the annoying artifact that appears after the enhancement process, called Musical Noise
(MN), should be eliminated. Recent SEA approaches aim to improve both speech quality and
intelligibility, because these two attributes are critical for listeners with normal
hearing and for those with hearing impairments. This thesis therefore aims to restore
speech from corrupted signals with minimal MN and the best trade-off between Residual
Noise (RN) and Signal Distortion (SD). First, a new transform based on new
orthogonal polynomials, called the Discrete Krawtchouk–Tchebichef Transform
(DKTT), is presented. DKTT exhibits superior energy compaction and localization
properties, which benefit the noise extraction process. Second, a noise classification
method is adopted to identify the type of additive noise. Three types of optimal
parameters are then determined according to the noise type. The next stage of the
system is the proposed non-linear speech estimator, which is based on the Minimum Mean
Square Error (MMSE) and low-distortion approaches. Its analytical solution is derived
under the assumption that the speech and noise components can be modeled by a
combination of Gamma and Laplacian distributions. This combination is used for the
first time in the developed SEA. A second, linear estimator is then proposed, mainly to
reduce the effects of MN. Finally, the inverse DKTT is applied to recover the clean
signal. To demonstrate the capability of the proposed system, clean speech sentences
are selected from the TIMIT dataset. In addition, eleven noise types are taken from the
NOISEX-92 dataset, together with speech-shaped noise. These noises are the most
dominant in real-world conditions.
Comparative results confirm the improvement in the quality and intelligibility
measures, together with a reduced MN level. The objective measures include Perceptual
Evaluation of Speech Quality (PESQ), Frequency-Weighted Segmental Signal-to-Noise Ratio
(FWSNR), the Coherence Speech Intelligibility Index (CSII), and Short-Time Objective
Intelligibility (STOI). They also include three composite measures: signal distortion
(SIG), background intrusiveness (BAK), and overall quality (OVL). The proposed SEA
improved nearly all of these quality and intelligibility measures for the different
noise types at five signal-to-noise ratio (SNR) levels: −10, −5, 0, 5, and 10 dB. In
white noise, for example, the average absolute improvements over the five SNR levels,
with the corresponding percentage improvements, are 0.37 (17.3%) in PESQ, 0.37 (24.7%)
in OVL, 0.59 dB (7.8%) in FWSNR, and 0.06 (7.7%) in STOI. For cockpit noise, the
improvements in the same order are 0.22 (10.6%), 0.18 (10.5%), 1.5 dB (23.3%), and 0.07
(9.5%). For speech-shaped noise, they are 0.23 (11.3%), 0.17 (9.1%), 2.05 dB (31.6%),
and 0.05 (7.8%). The noise classification accuracy reached 99.44%. The contributions
of this work are a new transform, new speech and noise models, and new linear and
non-linear estimators with adaptive smoothing parameters for effective noise reduction.
In conclusion, the proposed SEA enhances noisy signals and recovers clean signals with
less RN and SD and a reduced MN level. The improvements in quality and intelligibility
are largest at high noise levels (low SNRs). |
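The summary describes DKTT only as a transform built on new Krawtchouk–Tchebichef orthogonal polynomials. Its exact definition is in the thesis, not in this record. The following is a minimal sketch, assuming only that DKTT is a real orthogonal basis matrix applied to speech frames. It builds orthonormal weighted Krawtchouk and discrete Tchebichef bases by Gram–Schmidt and multiplies them. The composition `hybrid_basis` and every function name here are hypothetical stand-ins, not the thesis's DKTT. The sketch only illustrates why an orthogonal transform can be inverted exactly with its transpose.

```python
import math


def gram_schmidt_basis(n, weight):
    """Rows phi_k(x) = sqrt(w(x)) * p_k(x), k = 0..n-1, orthonormal on x = 0..n-1.

    Modified Gram-Schmidt on the weighted monomials sqrt(w(x)) * t**k, where
    t rescales x to [-1, 1] (same polynomial space, better conditioning).
    """
    sw = [math.sqrt(weight(x)) for x in range(n)]
    t = [2.0 * x / (n - 1) - 1.0 for x in range(n)]
    basis = []
    for k in range(n):
        v = [sw[i] * t[i] ** k for i in range(n)]
        for b in basis:  # remove the component along each earlier basis row
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis


def krawtchouk_basis(n, p=0.5):
    # Weighted Krawtchouk polynomials: orthogonal under the binomial weight.
    def w(x):
        return math.comb(n - 1, x) * p**x * (1 - p) ** (n - 1 - x)
    return gram_schmidt_basis(n, w)


def tchebichef_basis(n):
    # Discrete Tchebichef polynomials: orthogonal under the uniform weight.
    return gram_schmidt_basis(n, lambda x: 1.0)


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def hybrid_basis(n, p=0.5):
    # HYPOTHETICAL DKTT stand-in: a product of orthogonal matrices is orthogonal.
    return matmul(krawtchouk_basis(n, p), tchebichef_basis(n))


def forward(basis, frame):
    # c_k = sum_x R[k][x] * s[x]
    return [sum(r * s for r, s in zip(row, frame)) for row in basis]


def inverse(basis, coeffs):
    # s[x] = sum_k R[k][x] * c_k, because R^-1 = R^T for an orthogonal R
    return [sum(basis[k][x] * coeffs[k] for k in range(len(coeffs)))
            for x in range(len(basis[0]))]


frame = [math.sin(0.4 * x) + 0.1 * x for x in range(8)]
R = hybrid_basis(8)
restored = inverse(R, forward(R, frame))
print(max(abs(a - b) for a, b in zip(frame, restored)))  # round-off level only
```

Gram–Schmidt on monomials is numerically safe only for short frames like the 8-sample example. For realistic frame lengths, such polynomials are normally generated with their three-term recurrence relations instead.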
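The summary states two steps: a classifier first identifies the additive noise type, and three types of optimal parameters are then chosen for that type. It does not name the classifier, its features, or the parameters. The following Python sketch shows only that control flow, assuming a nearest-centroid rule. `CENTROIDS`, the two feature values, `EstimatorParams` and its field names, and every number in `PARAMS_BY_NOISE` are hypothetical placeholders, not values from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EstimatorParams:
    # HYPOTHETICAL names: the summary says three parameter types are tuned per
    # noise type but does not name them or give their values.
    smoothing: float
    gain_floor: float
    shape: float


# PLACEHOLDER values, not taken from the thesis.
PARAMS_BY_NOISE = {
    "white": EstimatorParams(smoothing=0.98, gain_floor=0.05, shape=1.0),
    "cockpit": EstimatorParams(smoothing=0.95, gain_floor=0.08, shape=1.5),
    "speech_shaped": EstimatorParams(smoothing=0.92, gain_floor=0.10, shape=2.0),
}

# HYPOTHETICAL class centroids in a 2-D feature space, for illustration only.
CENTROIDS = {
    "white": (0.9, 0.3),
    "cockpit": (0.4, 0.7),
    "speech_shaped": (0.2, 0.5),
}


def classify_noise(features):
    # Nearest-centroid rule: the class whose centroid is closest wins.
    return min(CENTROIDS, key=lambda label: math.dist(features, CENTROIDS[label]))


def select_params(features):
    # Classify first, then look up the parameter set tuned for that class.
    noise_type = classify_noise(features)
    return noise_type, PARAMS_BY_NOISE[noise_type]


print(select_params((0.85, 0.35)))
```

Keeping the parameters in a table keyed by class label means the estimator code never branches on noise type. Adding a new noise class only requires a new centroid and a new parameter entry.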
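The thesis derives its non-linear MMSE estimator from Gamma and Laplacian models. It then adds a linear estimator with adaptive smoothing to limit MN. Neither formula appears in the summary. As a swapped-in illustration of the same ingredients, the sketch below applies two textbook pieces per transform coefficient:

- the classic Gaussian-model MMSE gain, that is, the Wiener gain ξ/(1+ξ);
- the decision-directed a priori SNR estimate of Ephraim and Malah.

The smoothing factor `alpha` illustrates what a smoothing parameter does. Values close to 1 suppress the frame-to-frame gain fluctuations heard as musical noise, at the cost of slower tracking of speech onsets. The noise variance is assumed to be known, and `enhance_frames` is a hypothetical name.

```python
def enhance_frames(noisy_frames, noise_var, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Wiener gain per coefficient with a decision-directed a priori SNR.

    noisy_frames: frames of real transform coefficients Y_k(l).
    noise_var:    per-coefficient noise variance lambda_k (assumed known here).
    alpha:        smoothing factor; values near 1 reduce musical noise.
    xi_min:       floor on the a priori SNR (about -25 dB).
    """
    n = len(noise_var)
    prev_power = [0.0] * n  # |S_hat_k(l-1)|^2 from the previous frame
    enhanced_frames = []
    for frame in noisy_frames:
        enhanced = []
        for k in range(n):
            gamma = frame[k] ** 2 / noise_var[k]  # a posteriori SNR
            # Decision-directed a priori SNR: smoothed mix of the previous
            # clean estimate and the instantaneous estimate gamma - 1.
            xi = alpha * prev_power[k] / noise_var[k] + (1 - alpha) * max(gamma - 1.0, 0.0)
            xi = max(xi, xi_min)
            gain = xi / (1.0 + xi)  # MMSE gain under Gaussian speech and noise
            enhanced.append(gain * frame[k])
        prev_power = [s * s for s in enhanced]
        enhanced_frames.append(enhanced)
    return enhanced_frames


frames = [[100.0, 0.5, -3.0] for _ in range(10)]
out = enhance_frames(frames, noise_var=[1.0, 1.0, 1.0])
print([round(v, 3) for v in out[-1]])
```

In the example, the high-SNR coefficient passes almost unchanged. The coefficient below the noise level is driven toward zero, and the gain never exceeds 1.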
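The evaluation mixes TIMIT sentences with NOISEX-92 and speech-shaped noise at −10, −5, 0, 5, and 10 dB SNR. A common way to build such mixtures is to scale the noise so that 10·log10(P_s / P_n) equals the target SNR, where P is the mean power over the utterance. This procedure is an assumption here, not taken from the thesis. The function names are illustrative, and the synthetic signals stand in for real recordings.

```python
import math


def power(x):
    # Mean power of a signal (mean of squared samples).
    return sum(v * v for v in x) / len(x)


def snr_db(clean, noise):
    return 10.0 * math.log10(power(clean) / power(noise))


def mix_at_snr(clean, noise, target_db):
    """Return (noisy, scaled_noise) with snr_db(clean, scaled_noise) == target_db."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    # Choose the gain so that P_clean / (scale^2 * P_noise) = 10^(target/10).
    scale = math.sqrt(power(clean) / (power(noise) * 10.0 ** (target_db / 10.0)))
    scaled = [scale * v for v in noise]
    return [c + s for c, s in zip(clean, scaled)], scaled


# Synthetic stand-ins for a speech sentence and a noise recording.
clean = [math.sin(0.05 * i) * (1 + 0.5 * math.sin(0.003 * i)) for i in range(4000)]
noise = [((i * 7919) % 101) / 50.0 - 1.0 for i in range(5000)]

for target in (-10, -5, 0, 5, 10):
    noisy, scaled = mix_at_snr(clean, noise, target)
    print(target, round(snr_db(clean, scaled), 3))
```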