Automatic speech transcription from DVD

Bibliographic Details
Main Author: Ranjit Monisha Deva Belley
Other Authors: Soon Ing Yann
Format: Theses and Dissertations
Language: English
Published: 2014
Online Access: http://hdl.handle.net/10356/55245
Institution: Nanyang Technological University
Description
Summary: Speech transcription is the process of converting speech into text, that is, mapping a spoken language onto written symbols. Spoken language is a continuous phenomenon made up of a potentially unlimited number of components. Transcription is an application of speech recognition (SR), the process of translating speech into text; when this is done automatically, it is called automatic speech recognition (ASR). ASR is one of the challenging problems of modern times. A speech recognizer must be trained on transcribed data, and producing that data manually is expensive, so this project attempts to produce it automatically. The subtitles and their timing information are extracted from a DVD, and the speech corresponding to each subtitle is taken from the audio track using that timing. The subtitle timing may not match the speech exactly. Time-domain methods, which compute the signal energy within each subtitle's time interval, are used to overcome this mismatch. The subtitles are extracted by converting their graphical information into text. The corresponding speech signal is segmented using the timing information. The resulting text and audio are then converted into a format suitable for training a speech recognizer. The whole process runs automatically in MATLAB.
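
The energy-based step mentioned in the summary can be illustrated with a short sketch. This record does not give the thesis's actual algorithm, and the original work was done in MATLAB, so the Python below is only an assumed illustration: it computes short-time energy inside one subtitle interval and trims the interval to the frames that carry most of the energy. The function names, the 20 ms frame length, and the 10% threshold are hypothetical choices, not values from the thesis.

```python
def frame_energies(samples, frame_len):
    """Sum of squared amplitudes for each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def trim_to_speech(samples, sample_rate, start_s, end_s,
                   frame_ms=20, threshold_ratio=0.1):
    """Shrink a subtitle interval to the frames with speech-like energy.

    frame_ms and threshold_ratio are illustrative assumptions,
    not parameters taken from the thesis.
    """
    frame_len = max(1, round(sample_rate * frame_ms / 1000))
    first = round(start_s * sample_rate)
    last = round(end_s * sample_rate)
    energies = frame_energies(samples[first:last], frame_len)
    if not energies or max(energies) == 0:
        # Nothing measurable in the interval: keep the subtitle timing.
        return start_s, end_s
    threshold = threshold_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e >= threshold]
    # Convert the first and last active frame back to seconds.
    new_start = (first + active[0] * frame_len) / sample_rate
    new_end = (first + (active[-1] + 1) * frame_len) / sample_rate
    return new_start, new_end
```

A sketch like this can only shorten an interval. Handling subtitles that begin before or end after the speech they label would also require searching outside the given timing.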