FORMATTING RECOGNITION RESULTS OF AUTOMATIC SPEECH RECOGNITION USING STATISTICAL AND DEEP LEARNING BASED APPROACHES

An Automatic Speech Recognition (ASR) system produces a recognition result as its output. This text is usually unpunctuated and uncapitalized. Recognition results are often used as input to other natural language processing tasks. Formatting recognition results would greatly benefit both humans and...

Bibliographic Details
Main Author:	Nugroho Hadiwinoto, Patrick
Format:	Theses
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/47976
Institution:	Institut Teknologi Bandung

id	id-itb.:47976
spelling	id-itb.:47976; 2020-06-25T01:05:46Z; Nugroho Hadiwinoto, Patrick; Indonesia; Theses; recognition result of ASR, transcript formatting, punctuation prediction, statistical machine translation, neural machine translation; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/47976
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	An Automatic Speech Recognition (ASR) system produces a recognition result as its output. This text is usually unpunctuated and uncapitalized. Recognition results are often used as input to other natural language processing tasks, so formatting them would greatly benefit both humans and machines. One of the most common approaches to this formatting is based on machine translation, which today falls mainly into two groups of techniques: statistical and deep learning-based. This research aims to add full stops, commas, and capital letters using both statistical machine translation (SMT) and neural machine translation (NMT). With single sentences as the data unit, the best F-measures for the SMT approach are 22.16% for full stops, 20.69% for commas, and 56.49% for capital letters. The corresponding NMT results are 86.51% for full stops, 54.05% for commas, and 91.01% for capital letters. When simulating real ASR output, which consists of sentence sequences rather than single sentences, the best results with the NMT approach, assuming a 100% accurate ASR, are 42.38% for full stops, 37.56% for commas, and 83.94% for capital letters.
format	Theses
author	Nugroho Hadiwinoto, Patrick
spellingShingle	Nugroho Hadiwinoto, Patrick FORMATTING RECOGNITION RESULTS OF AUTOMATIC SPEECH RECOGNITION USING STATISTICAL AND DEEP LEARNING BASED APPROACHES
author_facet	Nugroho Hadiwinoto, Patrick
author_sort	Nugroho Hadiwinoto, Patrick
title	FORMATTING RECOGNITION RESULTS OF AUTOMATIC SPEECH RECOGNITION USING STATISTICAL AND DEEP LEARNING BASED APPROACHES
title_short	FORMATTING RECOGNITION RESULTS OF AUTOMATIC SPEECH RECOGNITION USING STATISTICAL AND DEEP LEARNING BASED APPROACHES
title_full	FORMATTING RECOGNITION RESULTS OF AUTOMATIC SPEECH RECOGNITION USING STATISTICAL AND DEEP LEARNING BASED APPROACHES
title_fullStr	FORMATTING RECOGNITION RESULTS OF AUTOMATIC SPEECH RECOGNITION USING STATISTICAL AND DEEP LEARNING BASED APPROACHES
title_full_unstemmed	FORMATTING RECOGNITION RESULTS OF AUTOMATIC SPEECH RECOGNITION USING STATISTICAL AND DEEP LEARNING BASED APPROACHES
title_sort	formatting recognition results of automatic speech recognition using statistical and deep learning based approaches
url	https://digilib.itb.ac.id/gdl/view/47976
_version_	1822927789826768896
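
The description above frames formatting as translation from unformatted text to formatted text and reports a per-class F-measure. The sketch below illustrates that general idea only; it is not the thesis's code. The function names, the English example sentence, and the token-level scoring scheme are all assumptions made for illustration, and the record does not state how the thesis actually built its data or computed its scores.

```python
import re

CLASSES = ("full_stop", "comma", "capital")


def strip_formatting(sentence: str) -> str:
    """Imitate raw ASR output: drop full stops and commas, lowercase everything.

    Simplification: a "." inside a number (e.g. "22.16") is also removed.
    """
    no_punct = re.sub(r"[.,]", " ", sentence)
    return " ".join(no_punct.lower().split())


def make_pair(sentence: str) -> tuple[str, str]:
    """Build one (source, target) training pair for a translation-style model."""
    return strip_formatting(sentence), sentence


def token_labels(text: str) -> list[set[str]]:
    """Label each whitespace token with the formatting events attached to it."""
    labels = []
    for tok in text.split():
        tags = set()
        if tok.endswith("."):
            tags.add("full_stop")
        if tok.endswith(","):
            tags.add("comma")
        if tok[:1].isupper():
            tags.add("capital")
        labels.append(tags)
    return labels


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score(reference: str, hypothesis: str) -> dict[str, float]:
    """Per-class F-measure, assuming both texts contain the same word sequence."""
    ref, hyp = token_labels(reference), token_labels(hypothesis)
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have the same token count")
    results = {}
    for cls in CLASSES:
        tp = sum(cls in r and cls in h for r, h in zip(ref, hyp))
        fp = sum(cls not in r and cls in h for r, h in zip(ref, hyp))
        fn = sum(cls in r and cls not in h for r, h in zip(ref, hyp))
        results[cls] = f_measure(tp, fp, fn)
    return results


# Illustrative usage with a made-up sentence and a made-up model output.
reference = "Hello, my name is Patrick. I study at ITB."
hypothesis = "Hello my name is Patrick. i study at ITB."
source, target = make_pair(reference)
scores = score(reference, hypothesis)
```

Scoring per class in this way mirrors how the description reports separate figures for full stops, commas, and capital letters, although the exact matching criteria used in the thesis may differ.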
