Seeing sounds : sound visualization and speech to text conversion using laptop and microphone array

A large population of people suffers from hearing loss and the inconveniences caused by a lack of auditory sense. Different types of hearing aids are available in the market, but due to the discomfort associated with prolonged use and the stigma of being recognized as handicapped, they are not used often or are abandoned after several years. Therefore, this project aims to explore a way of using mixed reality and sound localization to highlight the existing sound sources in a live video stream, helping the hearing-impaired identify sounds. Speech-to-text conversion is also implemented to support conversation. In this project, two approaches were explored to map the sound sources onto the video stream: mathematical formulas and machine learning. For the mathematical-formula approach, the pseudo-inverse and dot product were used to find the best-fit equation relating sound-source coordinates to image coordinates; this equation was then used to map each sound source onto the video stream. The machine learning approach achieved better mapping accuracy over a wider range of distances than the mathematical approach, but its prediction speed was too slow for use in this project. Overall, sound tracking and speech-to-text conversion were successfully achieved to a certain extent. A future improvement could be using a smartphone camera as the platform, for better mobility and convenience.
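As a rough illustration of the mathematical-formula approach the abstract describes (a best-fit equation found with the pseudo-inverse, then applied with a dot product to map sound-source coordinates to image coordinates), a minimal NumPy sketch might look like the following. The calibration pairs, the homogeneous bias column, and all variable names are illustrative assumptions, not details taken from the report.

    import numpy as np

    # Hypothetical calibration data: each row pairs a sound-source
    # coordinate from the microphone array (here azimuth, elevation,
    # plus a constant bias term) with the pixel where that source
    # appeared in the video frame. The numbers are made up.
    S = np.array([[-30.0,  0.0, 1.0],
                  [  0.0,  5.0, 1.0],
                  [ 30.0,  0.0, 1.0],
                  [ 15.0, -5.0, 1.0]])   # N x 3 source coordinates
    P = np.array([[120.0, 240.0],
                  [320.0, 200.0],
                  [520.0, 240.0],
                  [420.0, 280.0]])       # N x 2 image coordinates

    # Best-fit linear map: solve S @ W ~= P in the least-squares sense
    # via the Moore-Penrose pseudo-inverse.
    W = np.linalg.pinv(S) @ P

    # Mapping a new sound-source estimate onto the frame is then just a
    # dot product with the fitted weights.
    new_source = np.array([10.0, 2.0, 1.0])
    x, y = new_source @ W
    print(x, y)   # approximate pixel position to highlight

The bias column lets the fit absorb a constant offset between the microphone array's coordinate frame and the camera's; the abstract commits only to a linear best fit via the pseudo-inverse, so any richer model (e.g. a homography) would be a different design choice.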

Bibliographic Details
Main Author: Wu, Haoran
Other Authors: Cham Tat Jen (ASTJCham@ntu.edu.sg)
School: School of Computer Science and Engineering
Laboratory: Multimedia and Interacting Computing Lab
Format: Final Year Project (FYP)
Degree: Bachelor of Engineering (Computer Science)
Project code: SCSE19-0596
Language: English
Published: Nanyang Technological University, 2020
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/145083