TIME-DEPENDENT HIDDEN MARKOV MODEL FOR SPEECH RECOGNITION OF INDONESIAN NUMBERS
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/4792 |
Institution: | Institut Teknologi Bandung |

Abstract:
One of the methods used in speech recognition systems is the Hidden Markov Model (HMM). A conventional HMM assumes that speech is a stationary process, so the state transition probabilities are time-invariant. This research proposes a Time-Dependent HMM (TDHMM), which assumes that speech is non-stationary and that the state transition probabilities are time-variant. The TDHMM is based on a reinterpretation of the Context-Dependent HMM applied to each frame: the state transition probability at each frame depends on the previous frame and on the position of that frame.
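One plausible reading of "time-dependent transition probability" is a forward algorithm in which frame t uses its own transition matrix A_t = [a_ij(t)] instead of a single shared matrix. Because each word is linearly warped to a fixed number of frames, the frame index t can stand for "the position of that frame". The sketch below assumes exactly that; the function names and argument layout are hypothetical, and the thesis's actual formulation (including how the observation probabilities b_j(o_t) are obtained) may differ. Passing the same matrix for every frame recovers the conventional time-invariant HMM.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def tdhmm_forward(
    initial: Sequence[float],
    transitions: Sequence[Matrix],
    emissions: Matrix,
) -> float:
    """Return P(O | model) for an HMM whose transition matrix may change per frame.

    initial[i]:           P(q_0 = i)
    transitions[t][i][j]: P(q_t = j | q_{t-1} = i) used when entering frame t;
                          transitions[0] is ignored so indices match frame numbers
    emissions[t][j]:      b_j(o_t), likelihood of the frame-t observation in state j
    """
    n_states = len(initial)
    # alpha[j] = P(o_0 .. o_t, q_t = j); initialise with frame 0.
    alpha = [initial[j] * emissions[0][j] for j in range(n_states)]
    for t in range(1, len(emissions)):
        a_t = transitions[t]  # frame-specific matrix: the "time-dependent" part
        alpha = [
            sum(alpha[i] * a_t[i][j] for i in range(n_states)) * emissions[t][j]
            for j in range(n_states)
        ]
    return sum(alpha)


def conventional_forward(
    initial: Sequence[float], transition: Matrix, emissions: Matrix
) -> float:
    """Conventional (time-invariant) HMM: reuse one matrix at every frame."""
    return tdhmm_forward(initial, [transition] * len(emissions), emissions)
```

For realistic utterance lengths the products of small probabilities underflow, so a practical version would work in log space or rescale `alpha` at each frame.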
The speech recognition system works off-line and uses 12th-order cepstral coefficients derived from 10th-order LPC, with linear time warping; each word is detected and isolated manually. The probability distribution is determined by a weighting function of the distance between the observed cepstral coefficients and each code vector. System performance is evaluated on the recognition of Indonesian numbers spoken as sequences of one to five words. Speech data were recorded from 8 speakers (4 male and 4 female).
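The front end described above can be sketched with the standard LPC-to-cepstrum recursion (turning 10 predictor coefficients into 12 cepstral coefficients) and a simple linear time warping that maps each manually isolated word onto a fixed number of frames. This is an illustrative sketch, not the thesis's code: the function names are hypothetical, the recursion assumes the sign convention H(z) = G / (1 - Σ a_k z^-k) with the gain term c_0 omitted, and nearest-frame resampling is only one way to realise linear warping (interpolating between neighbouring frames is another).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Convert predictor coefficients a_1..a_p into cepstral coefficients c_1..c_n_ceps.

    Assumes H(z) = G / (1 - sum_k a_k z^-k) and omits c_0. Recursion:
        c_m = a_m + sum_{k=1}^{m-1} (k/m) c_k a_{m-k},   1 <= m <= p
        c_m =       sum_{k=m-p}^{m-1} (k/m) c_k a_{m-k}, m > p
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # 1-based indexing; c[0] stays unused
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]  # a[m - k - 1] holds a_{m-k}
        c[m] = acc
    return c[1:]


def linear_time_warp(frames: Sequence[T], n_out: int) -> List[T]:
    """Resample a word's frame sequence to exactly n_out frames by linear index mapping."""
    if not frames or n_out < 1:
        raise ValueError("need at least one input frame and n_out >= 1")
    if n_out == 1:
        return [frames[0]]
    last = len(frames) - 1
    # Output frame i takes the input frame nearest to position i * last / (n_out - 1).
    return [frames[int(i * last / (n_out - 1) + 0.5)] for i in range(n_out)]
```

In this arrangement each word's per-frame LPC vectors would be converted with `lpc_to_cepstrum(a, 12)` and the resulting sequence warped with, say, `linear_time_warp(ceps_frames, 20)`; computing the LPC coefficients themselves (for example via autocorrelation and Levinson-Durbin) is not shown.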
The exponential weighting function gives the best results of the three weighting functions tested; the other two are the inverse and normal functions. The recognition error decreases as the number of frames and the number of code vectors increase. For a single speaker, the single-word recognition error ranges from 1% (24 frames, 6-bit code) to 9% (8 frames, 3-bit code). For multiple speakers, it ranges from 5% (24 frames, 6-bit code) to 33% (8 frames, 3-bit code). The recognition error decreases exponentially as the ratio of the amount of training data to the number of words to be recognized increases. Using the context between words improves the recognition rate for connected words: for example, with 20 frames and a 4-bit code, the recognition rate for five connected words improves from 74% to 93% for a single speaker and from 66% to 92% for multiple speakers.
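A distance-to-probability mapping of the kind compared here can be sketched as follows. The abstract names the three weighting functions (exponential, inverse, normal) but not their formulas, so the exact forms, the Euclidean distance, and the `scale` parameter below are assumptions for illustration only. The code size follows from the bit count: a b-bit code indexes 2^b code vectors, so 3 bits give 8 and 6 bits give 64.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]

# Assumed forms of the three weighting functions; `scale` is a hypothetical
# tuning parameter, not a value taken from the thesis.
WEIGHTINGS: Dict[str, Callable[[float, float], float]] = {
    "exponential": lambda d, scale: math.exp(-d / scale),
    "inverse": lambda d, scale: 1.0 / (d + scale),  # scale also prevents division by zero
    "normal": lambda d, scale: math.exp(-(d * d) / (2.0 * scale * scale)),
}


def euclidean(x: Vector, y: Vector) -> float:
    """Euclidean distance between two cepstral vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def code_probabilities(
    x: Vector,
    codebook: Sequence[Vector],
    kind: str = "exponential",
    scale: float = 1.0,
) -> List[float]:
    """Map distances from x to every code vector into a normalised distribution."""
    weight = WEIGHTINGS[kind]
    w = [weight(euclidean(x, code), scale) for code in codebook]
    total = sum(w)
    if total == 0.0:
        raise ValueError("all weights underflowed; increase `scale`")
    return [wi / total for wi in w]
```

Unlike hard vector quantisation, which assigns each frame to its single nearest code vector, this soft assignment gives every code vector some probability, with closer vectors weighted more heavily.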