Speech to text converter for Filipino language using hybrid artificial neural network/Hidden Markov Model

Bibliographic Details
Main Authors: Chan, Aylmer Jason L., Hatulan, Roger John F., Hilario, Apolonio D., Jr., Lim, Johann Kenneth T.
Format: text
Language: English
Published: Animo Repository 2007
Online Access: https://animorepository.dlsu.edu.ph/etd_bachelors/6016
Institution: De La Salle University
Description
Summary: Filipino is at once simple and complex: its semantics and grammar are relatively easy for a person to learn, but a machine needs a moderately complex system to recognize the language. This thesis project stems from the need to make interaction with computers and other machines more convenient for people with disabilities. Speech recognition systems are developing rapidly, which makes this work a significant step for the Filipino language industry. The thesis aims to build a speech recognition system that uses speech processing techniques to recognize certain words spoken in Filipino. The group first uses feature extraction as the front end of the system, then experiments with different training techniques for optimization, using either a feed-forward network trained with backpropagation or self-organizing map (SOM) networks to train the neural networks on the samples. The samples, obtained from the UP Speech Corpus, are segmented by phoneme. In the actual system, input WAV files pass through the speech processing module, which splits the input into frames; the frames are fed to the word segmentation module and then to the feature extraction module. The feature extraction module computes feature vectors that serve as inputs to the neural network. Once the networks have been trained to a specified target, their outputs are used as inputs to the probabilistic Hidden Markov Model (HMM), which predicts the most probable output sequence, in this case the phoneme sequence. A decoder then translates the phoneme sequence into a sequence of letters forming a word, which is looked up in a table to find the best match for the recognized word.
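
The back end described in the summary (network outputs fed to an HMM, the most probable phoneme sequence decoded, then a table lookup) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the network emits one posterior probability per phoneme for each frame, that the HMM has one state per phoneme, and that the most probable state sequence is found with the Viterbi algorithm; the three-phoneme inventory, the probabilities, the vocabulary, and the function names are all hypothetical toy values. A full hybrid system would usually divide the posteriors by phoneme priors before decoding, a step skipped here.

```python
import difflib
import numpy as np

# Hypothetical toy setup: three phoneme states and a tiny lookup vocabulary.
PHONEMES = ["a", "s", "o"]
VOCABULARY = ["aso", "isa", "oo"]

# Assumed HMM parameters (each row of TRANS sums to 1).
INIT = np.array([0.8, 0.1, 0.1])
TRANS = np.array([[0.6, 0.3, 0.1],
                  [0.1, 0.6, 0.3],
                  [0.1, 0.1, 0.8]])

# Stand-in for neural network outputs: one row of phoneme posteriors per frame.
POSTERIORS = np.array([[0.8, 0.1, 0.1],
                       [0.7, 0.2, 0.1],
                       [0.2, 0.7, 0.1],
                       [0.4, 0.5, 0.1],
                       [0.1, 0.2, 0.7],
                       [0.1, 0.1, 0.8]])


def viterbi(log_emissions, log_trans, log_init):
    """Return the most probable state path for a (frames, states) score matrix."""
    n_frames, n_states = log_emissions.shape
    score = log_init + log_emissions[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        # cand[i, j] is the score of being in state i at t-1 and moving to j at t.
        cand = score[:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_states)] + log_emissions[t]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(score))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def collapse(path, phonemes):
    """Merge consecutive frames assigned to the same phoneme into one symbol."""
    out = []
    for state in path:
        if not out or out[-1] != phonemes[state]:
            out.append(phonemes[state])
    return out


def recognize(posteriors):
    """Decode frame posteriors into letters and return the closest vocabulary word."""
    path = viterbi(np.log(posteriors), np.log(TRANS), np.log(INIT))
    letters = "".join(collapse(path, PHONEMES))
    # cutoff=0.0 means the closest entry is returned even for a poor match.
    return difflib.get_close_matches(letters, VOCABULARY, n=1, cutoff=0.0)[0]


print(recognize(POSTERIORS))
```

Working in log probabilities avoids numerical underflow on long utterances, and the closest-match lookup stands in for the "best likely match" step in the summary. Note that collapsing repeated states cannot represent a word with the same phoneme twice in a row; a real system would need per-phoneme sub-states or an explicit boundary model.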