Combined articulatory and auditory processing for improved speech recognition

Bibliographic Details
Main Authors: Huang, Guangpu, Er, Meng Joo
Other Authors: School of Electrical and Electronic Engineering
Format: Conference or Workshop Item
Language: English
Published: 2013
Subjects:
Online Access:https://hdl.handle.net/10356/98873
http://hdl.handle.net/10220/12782
Institution: Nanyang Technological University
Description
Summary: In this paper, we examined the feasibility of articulatory phonetic inversion (API) conditioned on auditory qualities for improved speech recognition. We also introduced an efficient data-driven heuristic learning algorithm to capture the articulatory-phonetic features (APFs) of English speech, and we then reported the performance of the combined auditory and articulatory processing methods in the inversion and recognition experiments. First, at the front end, the auditory-based bark-frequency cepstral coefficients (BFCCs) obtained accuracy equivalent to or higher than that of the mel-frequency cepstral coefficients (MFCCs). Second, the APFs significantly altered the phoneme error patterns compared to purely acoustic features, and they showed advantages over the canonical pseudo-articulatory features (PAFs), which are derived manually from phonological rules. These observations support our view that the combined use of auditory and articulatory cues is beneficial for speech pattern classification, and that the proposed neural-based API model is a competitive candidate for profound phoneme recognition, with salient features such as generality and portability.
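As an illustration of the BFCC front end mentioned in the summary, the following Python sketch computes bark-frequency cepstral coefficients by replacing the mel filterbank of a standard MFCC pipeline with triangular filters spaced on the Bark scale. The Traunmüller Bark approximation, frame sizes, and the numbers of filters and coefficients here are illustrative assumptions, not the configuration reported in the paper.

import numpy as np
from scipy.fftpack import dct

def hz_to_bark(f):
    # Traunmüller (1990) approximation of the Bark scale (an assumption here)
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    # Inverse of the Traunmüller approximation
    return 1960.0 * (z + 0.53) / (26.28 - z)

def bark_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the Bark scale
    z_points = np.linspace(hz_to_bark(0.0), hz_to_bark(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * bark_to_hz(z_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def bfcc(signal, sr, n_fft=512, hop=160, n_filters=24, n_ceps=13):
    # Frame the signal, take the power spectrum, apply Bark filters,
    # log-compress, and decorrelate with a DCT (same pipeline shape as MFCC)
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    spectrum = np.abs(np.fft.rfft(frames * np.hamming(n_fft), n_fft)) ** 2
    energies = np.log(spectrum @ bark_filterbank(n_filters, n_fft, sr).T + 1e-10)
    return dct(energies, type=2, axis=1, norm="ortho")[:, :n_ceps]

A typical front-end comparison of the kind described in the summary would compute both BFCC and MFCC features for the same utterances and train identical phoneme classifiers on each feature set.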