Filipino text-to-speech system: Tagapagsalita

Bibliographic Details
Main Authors: Aralar, Kevin Romualdo A., Coloso, Paolo Miguel H., Moneda, Jerlyn R.
Format: text
Language: English
Published: Animo Repository 2006
Subjects:
Online Access: https://animorepository.dlsu.edu.ph/etd_bachelors/7656
Institution: De La Salle University
Description
Summary: Although computers can be made to speak like humans, the resulting speech often sounds artificial or synthetic. This task is normally performed by a Text-to-Speech (TTS) system. Few studies have been conducted on implementing TTS systems for Tagalog. In this research, a TTS system specifically designed for the Tagalog number words Isa to Isandaan was developed. The system works in three major stages. First, the diphones present in the words Isa to Isandaan were recorded, cut and denoised using a third-party program specialising in audio processing. Second, the pre-processed signals were compressed using Linear Predictive Coding (LPC): the signals were passed through a reversible filter that extracts the LPC coefficients, per-frame gains and excitation. Finally, these parameters were taken and the filtering was reversed to produce a synthetic version of the original diphones. Using the Synchronous Overlap-Add (SOLA) technique, the reconstructed diphones were concatenated into whole words. In keeping with its purpose, the system was evaluated on intelligibility. Thirty-one persons were asked to rate criteria such as articulation and speed, with 1 being the lowest score and 5 the highest. The mean opinion scores (MOS) of 30 persons averaged 4.30 for listening effort, 4.27 for syllabication, 4.16 for stress, 4.18 for articulation and 4.07 for speed in all significant words for male, and 4.25 for listening effort, 4.29 for syllabication, 4.16 for stress, 4.18 for articulation and 4.14 for speed in all significant words for female. Discrepancies in speech intelligibility and quality are largely attributed to the preprocessing phase of the speech signal and to the subjective perception of the respondent listeners, which depends on prosodic parameters such as pitch, duration and amplitude, as seen from the MOS results for the synthetic utterance of the Tagalog word Isandaan.
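The concatenation step described in the summary can be illustrated with a small sketch. This is not the thesis authors' implementation: it is a simplified join in the spirit of SOLA, assuming that the start of the right-hand segment may be trimmed by a few samples so that it lines up with the tail of the left-hand segment before the two are cross-faded. The function name `sola_join` and the parameters `overlap` and `search` are hypothetical.

```python
import math

def sola_join(left, right, overlap, search):
    """Join `right` onto `left` with a linear cross-fade.

    Assumption: the best alignment is the shift (0..search samples into
    `right`) whose first `overlap` samples correlate most strongly with
    the last `overlap` samples of `left`.
    Returns the joined signal and the shift that was chosen.
    """
    tail = left[-overlap:]
    tail_energy = sum(x * x for x in tail)
    best_shift, best_score = 0, -float("inf")
    for shift in range(search + 1):
        head = right[shift:shift + overlap]
        if len(head) < overlap:
            break  # not enough samples left in `right` to compare
        num = sum(x * y for x, y in zip(tail, head))
        den = math.sqrt(tail_energy * sum(y * y for y in head))
        score = num / den if den > 0 else 0.0  # normalized cross-correlation
        if score > best_score:
            best_shift, best_score = shift, score
    head = right[best_shift:best_shift + overlap]
    # Linear cross-fade: weight moves from the left tail to the right head.
    faded = [tail[i] * (1 - i / overlap) + head[i] * (i / overlap)
             for i in range(overlap)]
    joined = left[:-overlap] + faded + right[best_shift + overlap:]
    return joined, best_shift
```

Searching for the best-correlated shift before cross-fading is what distinguishes this from a plain overlap-add: it reduces phase mismatches at the joint, which would otherwise be audible as clicks or roughness between diphones.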
The Linear Predictive Coding technique is a useful tool for compression, since it can extract the information needed for speech synthesis without affecting the intelligibility of the speech.
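The analysis and resynthesis steps can be sketched for a single frame as follows. This is a minimal illustration, not the method used in the thesis: it assumes the autocorrelation method with the Levinson-Durbin recursion, and the function names, the prediction order and the unit-RMS normalization of the excitation are all assumptions made for the example.

```python
import math

def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order
    from autocorrelation values r[0..order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. silent) frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def lpc_analyze(frame, order):
    """Return (coefficients, per-frame gain, unit-RMS excitation)."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = levinson_durbin(r, order)
    # Inverse (analysis) filter: e[t] = s[t] + sum_j a[j] * s[t - j]
    residual = [frame[t] + sum(a[j] * frame[t - j]
                               for j in range(1, order + 1) if t - j >= 0)
                for t in range(n)]
    gain = math.sqrt(sum(e * e for e in residual) / n) or 1.0
    excitation = [e / gain for e in residual]
    return a, gain, excitation

def lpc_synthesize(a, gain, excitation):
    """Reverse the analysis filter: s[t] = gain * e[t] - sum_j a[j] * s[t - j]."""
    order = len(a) - 1
    out = []
    for t, e in enumerate(excitation):
        past = sum(a[j] * out[t - j] for j in range(1, order + 1) if t - j >= 0)
        out.append(gain * e - past)
    return out
```

With the exact excitation kept, as here, the round trip reproduces the frame up to floating-point error. Compression in an LPC coder comes from storing the excitation in a reduced form (for example a pitch period and a voicing flag), which this sketch does not attempt.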