Speech recognition using joint time frequency analysis

Bibliographic Details
Main Authors: Abelgas, Minette G., Pagsibigan, Romel S., Sin, Johannes Paul S., Wu, Jue-Yu T.
Format: text
Language: English
Published: Animo Repository 2002
Online Access: https://animorepository.dlsu.edu.ph/etd_bachelors/14224
Institution: De La Salle University
Description
Summary: Speech is the ultimate interface. As computer telephony continues to gain mainstream appeal, new demands emerge for speech recognition solutions. Many techniques are currently available and proven effective, such as Linear Predictive Coding analysis, which is the popular choice among speech feature extraction techniques. However, new techniques have emerged, such as Joint Time-Frequency analysis, which, as the name implies, examines both the time and frequency elements of a signal. The Gabor Transform is a feature extraction algorithm that performs this kind of analysis. Speech Recognition System Using Joint Time-Frequency Analysis (SR-JTFA) is a discrete isolated-word recognition system designed to recognize ten words, which are predefined in the system's library. It is intended for study purposes and will determine how effective the Gabor Transform is as a feature extraction technique. The user utters a word from the library into a microphone connected to a computer, and the system outputs the word it matches. The results are then tabulated in a confusion matrix to show the efficiency of the system in recognizing the words. The interface was designed using Visual Basic, while Turbo C++ was used to develop the speech processing modules of the system.
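For readers unfamiliar with the technique, the sketch below illustrates what a discrete Gabor transform computes: a short-time Fourier transform taken through a Gaussian window, giving one magnitude spectrum per time frame. It is not the thesis's Turbo C++ implementation. The function name `gabor_transform`, the parameters `window_len`, `hop`, and `sigma`, and the toy 1 kHz test tone are all assumptions made for illustration.

```python
import cmath
import math


def gabor_transform(signal, window_len=64, hop=32, sigma=None):
    """Hypothetical helper: Gaussian-windowed short-time Fourier transform.

    Returns a list of frames; each frame is a list of magnitudes for
    frequency bins 0 .. window_len // 2.
    """
    if sigma is None:
        # Assumed width: the window covers roughly +/- 3 standard deviations.
        sigma = window_len / 6.0
    center = (window_len - 1) / 2.0
    window = [
        math.exp(-0.5 * ((n - center) / sigma) ** 2) for n in range(window_len)
    ]

    frames = []
    # Slide the window along the signal in steps of `hop` samples.
    for start in range(0, len(signal) - window_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(window_len)]
        spectrum = []
        # Direct DFT of the windowed segment (non-negative frequencies only).
        for k in range(window_len // 2 + 1):
            acc = sum(
                segment[n] * cmath.exp(-2j * math.pi * k * n / window_len)
                for n in range(window_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Toy input (illustrative values): a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
features = gabor_transform(tone)
# Strongest frequency bin in the first frame.
peak_bin = max(range(len(features[0])), key=lambda k: features[0][k])
print(len(features), "frames; peak bin", peak_bin)
```

Each frame could serve as a time-localized feature vector for a word matcher. How SR-JTFA actually frames, windows, and compares its features is not stated in this record.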