Applying machine learning to human speech for image interpretations

Speech Recognition has become prevalent over the years because it enables faster information search, communication and transcription than typing on a keyboard. It was predicted that about half of all searches would employ Speech Recognition by 2020. With the growing trend of big data, data analytic...

Bibliographic Details
Main Author:	Woon, Yee Gin
Other Authors:	Ji-Jon Sit
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2021
Subjects:	Engineering::Electrical and electronic engineering
Online Access:	https://hdl.handle.net/10356/149221
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-149221
record_format	dspace
spelling	sg-ntu-dr.10356-149221 2023-07-07T17:54:27Z Applying machine learning to human speech for image interpretations Woon, Yee Gin Ji-Jon Sit School of Electrical and Electronic Engineering jijon@ntu.edu.sg Engineering::Electrical and electronic engineering Bachelor of Engineering (Electrical and Electronic Engineering) 2021-05-28T07:30:54Z 2021-05-28T07:30:54Z 2021 Final Year Project (FYP) Woon, Y. G. (2021). Applying machine learning to human speech for image interpretations. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/149221 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering
spellingShingle	Engineering::Electrical and electronic engineering Woon, Yee Gin Applying machine learning to human speech for image interpretations
description	Speech Recognition has become prevalent over the years because it enables faster information search, communication and transcription than typing on a keyboard. It was predicted that about half of all searches would employ Speech Recognition by 2020. With the growing trend of big data, data analytics and data science in the field of Machine Learning, the accuracy and precision of audio recognition have vastly improved. Technologies are available to support related applications, such as the voice assistants Google Assistant, Amazon Alexa, Apple Siri and Microsoft Cortana. Open-source Speech-to-Text recognition APIs have enabled development in wider areas such as education, customer support and even daily texting. SpeechArt is a product that adapts an existing Speech Recognition technology, DeepSpeech, to create artistic images from transcribed text. The acoustic and language models of an open-source DeepSpeech release would be utilised. The transcribed text generated by DeepSpeech would be parsed by an NLP model. Key words would be selected from the user's audio input and then sent to an image search model, which returns internet images with artistic effects in real time. The purpose of the project is to build an application that converts speech to images, which is useful for visual learning in education and can be extended to artistic uses, for example portraying a new design work.
author2	Ji-Jon Sit
author_facet	Ji-Jon Sit Woon, Yee Gin
format	Final Year Project
author	Woon, Yee Gin
author_sort	Woon, Yee Gin
title	Applying machine learning to human speech for image interpretations
title_short	Applying machine learning to human speech for image interpretations
title_full	Applying machine learning to human speech for image interpretations
title_fullStr	Applying machine learning to human speech for image interpretations
title_full_unstemmed	Applying machine learning to human speech for image interpretations
title_sort	applying machine learning to human speech for image interpretations
publisher	Nanyang Technological University
publishDate	2021
url	https://hdl.handle.net/10356/149221
_version_	1772825185525694464

Similar Items