Small-scale model for classification of single acoustic events

Bibliographic Details
Main Author: Ng, Matthew Tiong Ming
Other Authors: Gan Woon Seng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Online Access: https://hdl.handle.net/10356/157540
Institution: Nanyang Technological University
Description
Summary: AI plays an essential role in enabling the awareness of intelligent machines and has recently attracted considerable attention. One application of AI is audio classification. The most popular types of audio classification are speech recognition and music classification. Both have enjoyed great success and have several widely used, reliable real-life applications, such as Apple’s virtual assistant Siri for the former and Shazam for the latter. Urban sound tagging (UST), on the other hand, is less developed and less popular than these audio classification types. UST can benefit AI-powered smart devices such as iPhones and robots by allowing them to better understand their environment through the classification of sound scenes and to recommend actions to users accordingly [1]. This shows the potential of UST, as it can improve the context understanding of AI models and may solve such issues. A UST system is a trained machine learning model that can differentiate and classify different types of acoustic events. Acoustic events of daily urban life, such as wedding parties, vehicle noise, and keyboard typing, can give relevant cues about human presence and activity in a scenario. With such acoustic information, UST can make connections and allow analysts to find actionable insights from the generated audio descriptions. Using DCASE 2019 Task 5 as a benchmark, the project looks into improving the baseline model, both by raising its performance and by reducing its size.