FORMATTING RECOGNITION RESULTS OF AUTOMATIC SPEECH RECOGNITION USING STATISTICAL AND DEEP LEARNING BASED APPROACHES

Bibliographic Details
Main Author: Nugroho Hadiwinoto, Patrick
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/47976
Institution: Institut Teknologi Bandung
id id-itb.:47976
timestamp 2020-06-25T01:05:46Z
keywords recognition result of ASR, transcript formatting, punctuation prediction, statistical machine translation, neural machine translation
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Automatic Speech Recognition (ASR) produces a recognition result as its output. This text is usually unpunctuated and uncapitalized. Recognition results are often used as input to other natural language processing tasks, so formatting them would greatly benefit both humans and machines. One of the most common approaches is based on machine translation. Machine translation today falls mainly into two groups of techniques: statistical and deep learning-based. This research aims to add full stops, commas, and capital letters using both statistical machine translation (SMT) and neural machine translation (NMT). With single sentences as the unit of data, the best F-measures for the SMT approach are 22.16% for full stops, 20.69% for commas, and 56.49% for capital letters. The NMT results are 86.51% for full stops, 54.05% for commas, and 91.01% for capital letters. When simulating real ASR output, which consists of sequences of sentences rather than single sentences, the best results for a 100% accurate ASR with the NMT approach are 42.38% for full stops, 37.56% for commas, and 83.94% for capital letters.
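The translation-based framing in the description can be illustrated with a minimal sketch. It assumes the source side of each training pair is derived from formatted text by removing full stops and commas and lowercasing, and that each class is scored with the standard F-measure (the harmonic mean of precision and recall). The record does not describe the thesis's actual preprocessing or scoring code, so the function names `to_asr_style` and `f_measure` are hypothetical and the example sentence is made up.

```python
import re


def to_asr_style(formatted: str) -> str:
    """Imitate raw ASR output: drop full stops and commas, then lowercase.

    Simplification (assumption): every '.' and ',' is treated as punctuation,
    so decimals such as '3.5' would also be affected.
    """
    no_punct = re.sub(r"[.,]", "", formatted)
    return " ".join(no_punct.lower().split())


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# One parallel pair: the "translation" goes from source to target.
target = "Good morning, everyone. The meeting starts at nine."
source = to_asr_style(target)
```

Under this framing, an SMT or NMT system would be trained on many such (source, target) pairs, and `f_measure` would be computed separately for full stops, commas, and capital letters from counts of correct, spurious, and missed marks.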