The psychoacoustics and synthesis of singing harmony

The human singing voice is a remarkable instrument that compounds an immense amount of expressivity onto a single dimension. Apart from semantics and melody (pitch, duration and dynamics), accent, age, gender and emotion are all carried in the singing voice. While a single singing voice on its own i...

Bibliographic Details
Main Author: Chan, Paul Yaozhu
Other Authors: Chng Eng Siong
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2020
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/142516
Institution: Nanyang Technological University
id sg-ntu-dr.10356-142516
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Chan, Paul Yaozhu
The psychoacoustics and synthesis of singing harmony
description The human singing voice is a remarkable instrument that compounds an immense amount of expressivity onto a single dimension. Apart from semantics and melody (pitch, duration and dynamics), it also carries accent, age, gender and emotion. While a single singing voice is aesthetically pleasing on its own, adding concurrent voices of different pitch is commonly known to produce a pleasing effect far greater than the sum of the effects produced by each contributing voice. This motivates the use of harmony in singing. Unfortunately, accompaniment voices are difficult to sing, even for professional singers. Thankfully, singing synthesis has made it viable for machines to undertake this task.
The overall objective of this thesis is to advance today’s understanding of singing harmony and ultimately to develop novel techniques for its synthetic reproduction. The work is divided into three parts: the first focuses on a psychophysical basis of harmony, the second on the synthesis of the singing voice, and the third combines the first two to focus on the synthesis of harmonized singing.
The first contribution, presented in chapter 2, is an attempt to find a psychoacoustic basis of harmony. Apart from stationary harmony (chords, or sonorities: the aesthetics of a group of concurrent notes at one point in time), it also covers transitional harmony (chord progression, or resolution: the aesthetics of a similar group of notes progressing to another). To explain both, it introduces a theory of harmony based on the notions of interharmonic and subharmonic modulations. Based on this theory, acoustic measures of stationary and transitional harmony are proposed and answers to five fundamental questions of psychoacoustic harmony are presented. Correlations with existing music theory and perception statistics support this contribution for both stationary and transitional harmony.
The second contribution, presented in chapter 3, is in the synthesis of the singing voice. Modern singing synthesis methods are at best capable of word-level runtime synthesis, with only two known methods dedicated to realtime synthesis. This means that most of them are applicable only to offline music production. A large part of the art of music and singing, however, lies in realtime performance. Because both existing realtime singing synthesis methods are bound by a tradeoff between phone coverage and realtime capability, a need remains for one that overcomes it. A novel realtime singing synthesis system, SERAPHIM, is proposed to meet this need. Besides overcoming this tradeoff, SERAPHIM produced voices that listeners preferred over those of other realtime systems in subjective listening tests.
The third contribution, presented in chapter 4, is a novel method for singing harmony synthesis. Current implementations can be classified into pitch-inaccurate rule-based systems, timing-inaccurate inference-based systems, and hybrid systems that trade off between pitch and timing inaccuracies. Existing systems are therefore vulnerable to pitch errors, timing errors, or both in varying degrees of compromise. The challenge was to overcome this compromise and develop a robust technique that is resilient to both pitch and timing errors while producing harmonious accompaniment. Our strategy was to leverage the pitch-accurate inference-based method while eliminating timing inaccuracies through machine synchronization. Spectrograms revealed that harmonized voices produced by this method contain the fewest dissonances of all the methods compared. Subjective listening tests also showed that harmonized voices produced by this method are perceived to be the best sounding, both by vocal experts and by casual listeners.
All in all, the work presented in this thesis advances the psychoacoustic understanding and machine synthesis of singing harmony across one journal paper, three conference papers and three patents.
author2 Chng Eng Siong
author_facet Chng Eng Siong
Chan, Paul Yaozhu
format Thesis-Doctor of Philosophy
author Chan, Paul Yaozhu
author_sort Chan, Paul Yaozhu
title The psychoacoustics and synthesis of singing harmony
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/142516
_version_ 1683494220468322304
spelling sg-ntu-dr.10356-142516 2020-10-28T08:40:29Z
The psychoacoustics and synthesis of singing harmony
Chan, Paul Yaozhu
Chng Eng Siong
School of Computer Science and Engineering
ASESChng@ntu.edu.sg
Engineering::Computer science and engineering
Doctor of Philosophy
2020-06-23T07:02:38Z 2020-06-23T07:02:38Z
2020
Thesis-Doctor of Philosophy
Chan, P. Y. (2020). The psychoacoustics and synthesis of singing harmony. Doctoral thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/142516
10.32657/10356/142516
en
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf
Nanyang Technological University