FORMATTING RECOGNITION RESULTS OF AUTOMATIC SPEECH RECOGNITION USING STATISTICAL AND DEEP LEARNING BASED APPROACHES
Automatic Speech Recognition (ASR) produces a recognition result as its output. This text is usually unpunctuated and uncapitalized. Recognition results are often used as input to other natural language processing tasks. Formatting the recognition result would greatly benefit both humans and...
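To make the "unpunctuated and not capitalized" input concrete, here is a minimal sketch of how a formatted sentence could be reduced to ASR-like text, giving a source/target pair for a translation-style formatter. The function names `strip_formatting` and `make_pair` and the sample sentence are hypothetical, and the record does not describe the thesis's actual preprocessing, so this is only an illustration under those assumptions.

```python
import re
from typing import Tuple


def strip_formatting(text: str) -> str:
    """Mimic raw ASR output: drop full stops and commas, then lowercase.

    Simplification (assumption): every '.' and ',' is treated as punctuation,
    so decimals such as '3.5' would be mangled by this sketch.
    """
    no_punct = re.sub(r"[.,]", "", text)
    # Collapse any whitespace runs left behind by the removal.
    return " ".join(no_punct.lower().split())


def make_pair(formatted: str) -> Tuple[str, str]:
    """Return (unformatted source, formatted target) for one training example."""
    return strip_formatting(formatted), formatted


source, target = make_pair("Hello, my name is Budi. I live in Bandung.")
print(source)  # hello my name is budi i live in bandung
print(target)  # Hello, my name is Budi. I live in Bandung.
```

A formatter, whether statistical or neural, would then be trained to map the first string of each pair back to the second.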
Main Author: | Nugroho Hadiwinoto, Patrick |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/47976 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:47976 |
---|---|
timestamp |
2020-06-25T01:05:46Z |
keywords |
recognition result of ASR, transcript formatting, punctuation prediction, statistical machine translation, neural machine translation |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Automatic Speech Recognition (ASR) produces a recognition result as its output. This text is usually unpunctuated and uncapitalized. Recognition results are often used as input to other natural language processing tasks. Formatting the recognition result would greatly benefit both humans and machines. One of the most common approaches is based on machine translation, which today falls mainly into two groups of techniques: statistical and deep learning-based. This research aims to add full stops, commas, and capital letters using both statistical machine translation (SMT) and neural machine translation (NMT). With single sentences as the data unit, the best F-measures for the SMT approach are 22.16% for full stops, 20.69% for commas, and 56.49% for capital letters. The corresponding NMT results are 86.51% for full stops, 54.05% for commas, and 91.01% for capital letters. When simulating real ASR output, which consists of sequences of sentences rather than single sentences, the best results for a 100% accurate ASR with the NMT approach are 42.38% for full stops, 37.56% for commas, and 83.94% for capital letters.
|
format |
Theses |
author |
Nugroho Hadiwinoto, Patrick |
author_facet |
Nugroho Hadiwinoto, Patrick |
author_sort |
Nugroho Hadiwinoto, Patrick |
title |
FORMATTING RECOGNITION RESULTS OF AUTOMATIC SPEECH RECOGNITION USING STATISTICAL AND DEEP LEARNING BASED APPROACHES |
url |
https://digilib.itb.ac.id/gdl/view/47976 |
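The abstract reports a separate F-measure for full stops, commas, and capital letters. The following is a minimal sketch of how such per-class scores could be computed from a reference sentence and a system output. It assumes both texts contain the same words in the same order (as in the simulated 100% accurate ASR setting), and the helper names `to_labels`, `f_measure`, and `score` are hypothetical. The thesis's actual evaluation protocol is not given in this record and may differ.

```python
from typing import Dict, List, Tuple


def to_labels(formatted: str) -> List[Tuple[str, bool]]:
    """Map each token to (trailing punctuation or '', starts with a capital)."""
    labels = []
    for token in formatted.split():
        punct = token[-1] if token[-1] in ".," else ""
        word = token.rstrip(".,")
        labels.append((punct, word[:1].isupper()))
    return labels


def f_measure(reference: List[bool], predicted: List[bool]) -> float:
    """Harmonic mean of precision and recall over aligned binary labels."""
    pairs = list(zip(reference, predicted))
    tp = sum(r and p for r, p in pairs)
    fp = sum(p and not r for r, p in pairs)
    fn = sum(r and not p for r, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def score(reference: str, hypothesis: str) -> Dict[str, float]:
    """Per-class F-measure; assumes both texts have identical word sequences."""
    ref, hyp = to_labels(reference), to_labels(hypothesis)
    return {
        "full_stop": f_measure([p == "." for p, _ in ref], [p == "." for p, _ in hyp]),
        "comma": f_measure([p == "," for p, _ in ref], [p == "," for p, _ in hyp]),
        "capital": f_measure([c for _, c in ref], [c for _, c in hyp]),
    }


scores = score(
    "Hello, my name is Budi. I live in Bandung.",
    "Hello my name is Budi. I live in bandung.",
)
print(scores)
```

In this toy example the hypothesis recovers both full stops, misses the only comma, and misses one of four capital letters, so the comma score is zero while the capital-letter score is limited by recall.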