Learning based signal quality assessment for multimedia communications
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2012 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/50753 |
Institution: | Nanyang Technological University |
Summary: | Multimedia content (image, video, speech, audio, graphics and so on) can be degraded by a wide variety of distortions during acquisition, compression, processing, transmission, and reproduction, which generally lead to a loss of perceptual quality. As a result, signal quality assessment is an important component of today’s multimedia communication systems. In this thesis, perceptual quality assessment algorithms are proposed for three important types of multimedia signals: image, video, and speech. Each algorithm involves two crucial stages: (a) feature extraction/detection, and (b) feature pooling.
The first stage calls for the investigation and analysis of appropriate and effective signal features that extract meaningful information and provide a compact representation of the signal with regard to quality. This is crucial because the selected features form the basis of the resultant quality metric. In this thesis, we discuss and provide a detailed analysis of features based on Singular Value Decomposition, the 2D mel-cepstrum, and the phase of the Fourier Transform for visual quality assessment. We analyse the advantages and disadvantages of these features with regard to prediction accuracy and complexity. We also investigate mel filter bank energies as features for evaluating the quality of noise-suppressed speech and justify their effectiveness through theoretical and experimental analysis.
The second stage, on the other hand, requires determining appropriate weights for fusing the features into a single score that accurately reflects human judgement of perceptual quality. We tackle this using machine learning techniques, which have been successfully employed in numerous research areas (for example, in computer vision tasks such as object localization, tracking, and recognition) but have not been adequately explored in the literature on objective quality evaluation. Their major advantage is that they introduce a more systematic pooling methodology, thereby avoiding the unrealistic assumptions imposed by existing pooling methods. In this thesis, we demonstrate that machine learning can be effective in quality assessment if proper signal features are detected. We also provide insights into machine-learning-based feature pooling by analysing the system trained on subjective scores, which quantify human perception.
The proposed algorithms have been validated on a large number of publicly available, subjectively rated databases. We have performed careful experimental analysis (including within-database and cross-database tests) and demonstrated that the proposed schemes overall perform better than several relevant methods. The better alignment with human perception confirms the effectiveness of the algorithms proposed in this thesis. |
---|
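
The two stages described in the summary (feature extraction, then learned feature pooling) can be illustrated with a rough sketch. This is not the thesis's algorithm: the block-wise singular value differences, the ridge-regularised linear pooling, the function names `svd_block_features`, `fit_pooling_weights`, and `predict_quality`, the block size, and the synthetic images and placeholder scores are all assumptions chosen only to make the idea concrete.

```python
import numpy as np

def svd_block_features(ref, dist, block=8):
    """Stage (a), illustrative: mean absolute difference between the singular
    values of co-located blocks in a reference and a distorted image."""
    h, w = ref.shape
    diffs = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            s_ref = np.linalg.svd(ref[i:i + block, j:j + block], compute_uv=False)
            s_dst = np.linalg.svd(dist[i:i + block, j:j + block], compute_uv=False)
            diffs.append(np.abs(s_ref - s_dst))
    # One feature per singular value index, averaged over all blocks.
    return np.asarray(diffs).mean(axis=0)

def fit_pooling_weights(features, scores, ridge=1e-3):
    """Stage (b), illustrative: learn linear pooling weights (plus a bias term)
    from subjective scores by ridge-regularised least squares.
    Note: the bias is regularised too, for simplicity."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ scores)

def predict_quality(features, weights):
    """Pool a feature vector (or a batch of them) into a single score each."""
    X = np.atleast_2d(features)
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    return X @ weights

# Synthetic demonstration only: random "images" and placeholder scores,
# NOT data or results from the thesis.
rng = np.random.default_rng(0)
ref = rng.random((32, 32))
noise_levels = [0.01, 0.05, 0.1, 0.2, 0.4]
feats = np.array([
    svd_block_features(ref, ref + rng.normal(0.0, s, ref.shape))
    for s in noise_levels
])
placeholder_scores = np.array([5.0 - 10.0 * s for s in noise_levels])
weights = fit_pooling_weights(feats, placeholder_scores)
predicted = predict_quality(feats, weights)
```

In practice the placeholder scores would be replaced by subjective ratings from a rated database, and the linear pooling could be swapped for any other regressor; the sketch only shows how extracted features and learned weights fit together.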