Seeing sounds : sound visualization and speech to text conversion using laptop and microphone array
A large population suffers from hearing loss and the inconveniences caused by a lack of auditory sense. Different types of hearing aids are available on the market, but because of the discomfort associated with prolonged use and the stigma of being recognized as a handicapped person, the...
Saved in:
Main Author: | Wu, Haoran |
---|---|
Other Authors: | Cham Tat Jen |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/145083 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-145083 |
---|---|
record_format | dspace |
spelling | Wu, Haoran; Cham Tat Jen (ASTJCham@ntu.edu.sg); School of Computer Science and Engineering; Multimedia and Interacting Computing Lab; Engineering::Computer science and engineering; Bachelor of Engineering (Computer Science); Final Year Project (FYP); SCSE19-0596; deposited 2020-12-10T05:37:27Z; published 2020; en; application/pdf; https://hdl.handle.net/10356/145083; Nanyang Technological University |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Engineering::Computer science and engineering |
description | A large population suffers from hearing loss and the inconveniences caused by a lack of auditory sense. Different types of hearing aids are available on the market, but because of the discomfort associated with prolonged use and the stigma of being recognized as handicapped, they are used infrequently or abandoned after several years. This project therefore explores the use of mixed reality and sound localization to highlight sound sources in a live video stream, helping hearing-impaired users identify sounds; speech-to-text conversion is also implemented to support conversation. Two approaches to mapping sound sources onto the video stream were explored: mathematical formulas and machine learning. In the mathematical approach, the pseudo-inverse and dot product were used to find a best-fit equation relating sound-source coordinates to image coordinates, and this equation was then used to map each sound source onto the video stream. The machine-learning approach achieved better mapping accuracy over a wider range of distances than the mathematical approach, but its prediction speed was too slow to be used in this project. Overall, sound tracking and speech-to-text conversion were achieved to a certain extent. A future improvement could be to use a smartphone camera as the platform, for better mobility and convenience. (Illustrative code sketches of the coordinate-mapping and speech-to-text steps follow this record.) |
author2 | Cham Tat Jen |
format | Final Year Project |
author | Wu, Haoran |
title | Seeing sounds : sound visualization and speech to text conversion using laptop and microphone array |
publisher | Nanyang Technological University |
publishDate | 2020 |
url | https://hdl.handle.net/10356/145083 |
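The description above names the pseudo-inverse and a dot product as the tools for fitting and applying the sound-to-image mapping. Below is a minimal sketch of that idea, assuming a linear map from microphone-array direction-of-arrival angles to pixel coordinates; the coordinate conventions, calibration values, and the added offset term are illustrative assumptions, not details taken from the report.

```python
import numpy as np

# Calibration pairs: sound-source coordinates reported by the microphone
# array (assumed here to be azimuth/elevation angles in degrees) and the
# pixel coordinates where each source appeared in the video frame.
# All values are made up for illustration.
sound_coords = np.array([[-30.0,  0.0],
                         [  0.0,  0.0],
                         [ 30.0,  5.0],
                         [ 45.0, 10.0]])
pixel_coords = np.array([[160.0, 240.0],
                         [320.0, 240.0],
                         [480.0, 210.0],
                         [560.0, 180.0]])

# Append a constant 1 to each row so the fit includes an offset term.
A = np.hstack([sound_coords, np.ones((len(sound_coords), 1))])

# Best-fit linear mapping via the Moore-Penrose pseudo-inverse:
# W minimizes ||A @ W - pixel_coords||^2 in the least-squares sense.
W = np.linalg.pinv(A) @ pixel_coords

# Map a new sound source onto the image with a dot product.
new_source = np.array([15.0, 2.0, 1.0])
u, v = new_source @ W
print(f"predicted pixel: ({u:.1f}, {v:.1f})")
```

In practice the calibration pairs would come from sounds played at known positions; the machine-learning alternative the description compares against would fit a regressor to the same pairs, trading prediction speed for accuracy over a wider range of distances.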
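The record also states that speech-to-text conversion was implemented, without naming the engine. As one common possibility (an assumption, not the project's confirmed method), the Python SpeechRecognition package can transcribe audio captured from a microphone:

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

# Capture one utterance from the default microphone.
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
    audio = recognizer.listen(source)

# Send the audio to Google's free web API for transcription.
try:
    text = recognizer.recognize_google(audio)
    print("Transcript:", text)
except sr.UnknownValueError:
    print("Speech was unintelligible.")
except sr.RequestError as err:
    print("API request failed:", err)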