Seeing sounds : sound visualization and speech to text conversion using laptop and microphone array

A large population of people suffers from hearing loss and the inconveniences caused by a lack of auditory sense. Different types of hearing aids are available in the market, but due to the discomfort associated with prolonged use and the stigma of being recognized as handicapped, they are not used often or are abandoned after several years. Therefore, this project aims to explore a way of using mixed reality and sound localization to highlight the existing sound sources in a live video stream, helping the hearing-impaired identify sounds. Speech-to-text conversion is also implemented to support conversation. In this project, two approaches were explored to map the sound sources onto the video stream: mathematical formulas and machine learning. For the mathematical-formula approach, the pseudo-inverse and dot product were used to find the best-fit equation relating sound-source coordinates to image coordinates; this equation was then used to map each sound source onto the video stream. The machine learning approach achieved better mapping accuracy over a wider range of distances than the mathematical approach, but its prediction speed was too slow for use in this project. Overall, sound tracking and speech-to-text conversion were successfully achieved to a certain extent. A future improvement could be using a smartphone camera as the platform, for better mobility and convenience.
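As a rough illustration of the mathematical-formula approach the abstract describes (a best-fit equation found with the pseudo-inverse, then applied with a dot product to map sound-source coordinates to image coordinates), a minimal NumPy sketch might look like the following. The calibration pairs, the homogeneous bias column, and all variable names are illustrative assumptions, not details taken from the report.

    import numpy as np

    # Hypothetical calibration data: each row pairs a sound-source
    # coordinate from the microphone array (here azimuth, elevation,
    # plus a constant bias term) with the pixel where that source
    # appeared in the video frame. The numbers are made up.
    S = np.array([[-30.0,  0.0, 1.0],
                  [  0.0,  5.0, 1.0],
                  [ 30.0,  0.0, 1.0],
                  [ 15.0, -5.0, 1.0]])   # N x 3 source coordinates
    P = np.array([[120.0, 240.0],
                  [320.0, 200.0],
                  [520.0, 240.0],
                  [420.0, 280.0]])       # N x 2 image coordinates

    # Best-fit linear map: solve S @ W ~= P in the least-squares sense
    # via the Moore-Penrose pseudo-inverse.
    W = np.linalg.pinv(S) @ P

    # Mapping a new sound-source estimate onto the frame is then just a
    # dot product with the fitted weights.
    new_source = np.array([10.0, 2.0, 1.0])
    x, y = new_source @ W
    print(x, y)   # approximate pixel position to highlight

The bias column lets the fit absorb a constant offset between the microphone array's coordinate frame and the camera's; the abstract commits only to a linear best fit via the pseudo-inverse, so any richer model (e.g. a homography) would be a different design choice.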

Bibliographic Details
Main Author: Wu, Haoran
Other Authors: Cham Tat Jen (ASTJCham@ntu.edu.sg)
School: School of Computer Science and Engineering
Laboratory: Multimedia and Interacting Computing Lab
Format: Final Year Project (FYP)
Degree: Bachelor of Engineering (Computer Science)
Project code: SCSE19-0596
Language: English
Published: Nanyang Technological University, 2020
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/145083