FORMATTING RECOGNITION RESULTS OF AUTOMATIC SPEECH RECOGNITION USING STATISTICAL AND DEEP LEARNING BASED APPROACHES



Bibliographic Details
Main Author: Nugroho Hadiwinoto, Patrick
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/47976
Institution: Institut Teknologi Bandung
Description
Summary: Automatic Speech Recognition (ASR) produces a recognition result as its output. This text is usually unpunctuated and uncapitalized, yet recognition results are often used as input to other natural language processing tasks. Formatting recognition results would therefore greatly benefit both human readers and machines. One of the most common approaches treats formatting as a machine translation problem. Nowadays, machine translation itself is mainly divided into two techniques: statistical and deep learning-based. This research adds full stops, commas, and capital letters using both statistical machine translation (SMT) and neural machine translation (NMT). With single sentences as the data unit, the best F-measures for the SMT approach are 22.16% for full stops, 20.69% for commas, and 56.49% for capital letters. The corresponding NMT results are 86.51% for full stops, 54.05% for commas, and 91.01% for capital letters. When simulating real ASR output, which consists of sequences of sentences rather than single sentences, the best NMT results, assuming a 100% accurate ASR, are 42.38% for full stops, 37.56% for commas, and 83.94% for capital letters.
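The summary frames formatting as translation from raw ASR-style text into punctuated, capitalized text, scored by F-measure for each of the three targets. The following is a minimal sketch of the data side of that framing, under stated assumptions. It is not the thesis's implementation, and it does not cover the SMT or NMT models themselves. It shows how a (source, target) training pair might be built by stripping full stops, commas, and capitalization, and how a per-target F-measure might be computed. The function names `parse`, `make_pair`, and `f_measure` are hypothetical. The sketch also assumes token-level scoring in which the system output keeps the reference words in order and changes only their formatting.

```python
from typing import List, Tuple

# Hypothetical label set mirroring the summary's three targets:
# full stops ("."), commas (","), and capital letters ("cap").
MARKS = (".", ",")


def parse(formatted: str) -> List[Tuple[str, bool, str]]:
    """Split formatted text into (lowercased word, is_capitalized, trailing mark).

    Assumption: a mark is attached to the word before it, as in "Bandung,".
    """
    triples = []
    for token in formatted.split():
        mark = ""
        if token and token[-1] in MARKS:
            mark = token[-1]
            token = token[:-1]
        if not token:
            continue  # a bare mark with no word is ignored in this sketch
        triples.append((token.lower(), token[0].isupper(), mark))
    return triples


def make_pair(formatted: str) -> Tuple[str, str]:
    """Build one (source, target) pair; the source imitates unformatted ASR output."""
    source = " ".join(word for word, _, _ in parse(formatted))
    return source, formatted


def f_measure(reference: str, hypothesis: str, target: str) -> float:
    """F-measure for one target: ".", ",", or "cap".

    Assumption: both texts contain the same words in the same order,
    so they can be compared word by word.
    """
    ref, hyp = parse(reference), parse(hypothesis)
    if [w for w, _, _ in ref] != [w for w, _, _ in hyp]:
        raise ValueError("reference and hypothesis word sequences differ")

    def carries_target(triple: Tuple[str, bool, str]) -> bool:
        _, is_cap, mark = triple
        return is_cap if target == "cap" else mark == target

    tp = sum(1 for r, h in zip(ref, hyp) if carries_target(r) and carries_target(h))
    fp = sum(1 for r, h in zip(ref, hyp) if not carries_target(r) and carries_target(h))
    fn = sum(1 for r, h in zip(ref, hyp) if carries_target(r) and not carries_target(h))
    if tp == 0:
        return 0.0  # no correct predictions: treat F-measure as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "I went to Bandung, then came home."  # hypothetical example
    print(make_pair(reference)[0])  # i went to bandung then came home
    hypothesis = "i went to Bandung then came home."  # hypothetical system output
    for t in (".", ",", "cap"):
        print(t, round(f_measure(reference, hypothesis, t), 4))
```

In this example, the full stop is restored correctly, which gives an F-measure of 1.0. The comma is missed, which gives 0.0. Only one of the two capitals is restored, which gives precision 1.0, recall 0.5, and an F-measure of about 0.667. Scoring each target separately matches how the summary reports its results. Real NMT output may add or drop words, and this sketch rejects that case rather than aligning the two texts.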