Music analysis and similarities measure
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2011 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/46455 |
Institution: | Nanyang Technological University |
Summary: | As data become widely available in digital form, maintaining their integrity and organizing and searching them become harder. This is especially true of digital media, which users can generate and reproduce easily. This project studies current audio analysis techniques and machine learning algorithms in order to develop a Java application that helps users manage their audio files. The application automatically classifies .mp3 audio files into music genres and offers other functions based on audio similarity. Experiments are also conducted to find out how the accuracy of these tasks can be improved. This report documents the design of the application, its functions beyond classification, and the experiments carried out on it.
Each audio file consists of a sequence of audio samples taken at a fixed sampling rate, measured in hertz. The developed application, “JClassifier”, first segments the samples into analysis frames and computes a set of features from them. The extracted features include Mel Frequency Cepstral Coefficients, spectral dimensional features, Linear Predictive Coding and Method of Moments, which together form a ‘signature’ for each audio file. Each feature describes a specific aspect of the sound, such as pitch, melody, beat or timbre. The signatures are then classified by an ensemble of commonly used classifiers: Support Vector Machines, K-Nearest Neighbour and Artificial Neural Networks.
The system was trained on the labelled GTZAN dataset, which consists of 1000 music pieces divided into 10 genres. Parameters such as the window size, overlap ratio and processing segment length of the audio stream, as well as the combination of extracted features, were varied to find out whether these factors, coupled with an ensemble classifier, can improve the classification results. The experimental results show improved performance under different parameter setups and better classification results when an ensemble of classifiers is used. |
---|
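The two steps the summary describes, segmenting samples into analysis frames and combining classifier outputs, can be illustrated with a minimal Java sketch. It is not taken from JClassifier: the class and method names (`PipelineSketch`, `frames`, `majorityVote`) are hypothetical, and the summary does not say how the ensemble combines its members, so simple majority voting is assumed here purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and the voting rule are assumptions, not JClassifier's code.
public class PipelineSketch {

    /**
     * Splits samples into frames of windowSize samples. Consecutive frames
     * share roughly overlapRatio of their samples (0 = no overlap).
     */
    public static List<double[]> frames(double[] samples, int windowSize, double overlapRatio) {
        if (windowSize <= 0 || overlapRatio < 0.0 || overlapRatio >= 1.0) {
            throw new IllegalArgumentException("need windowSize > 0 and 0 <= overlapRatio < 1");
        }
        // Hop size: how far the window advances between frames (at least one sample).
        int hop = Math.max(1, (int) Math.round(windowSize * (1.0 - overlapRatio)));
        List<double[]> out = new ArrayList<>();
        // Only full frames are kept; a trailing partial frame is dropped.
        for (int start = 0; start + windowSize <= samples.length; start += hop) {
            double[] frame = new double[windowSize];
            System.arraycopy(samples, start, frame, 0, windowSize);
            out.add(frame);
        }
        return out;
    }

    /**
     * Assumed combination rule: the label predicted by the most classifiers wins.
     * On a tie, the label that first reached the winning count is returned.
     */
    public static String majorityVote(List<String> predictions) {
        if (predictions.isEmpty()) {
            throw new IllegalArgumentException("no predictions to combine");
        }
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : predictions) {
            int c = counts.merge(label, 1, Integer::sum);
            if (c > bestCount) {
                best = label;
                bestCount = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] samples = new double[10];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = i;
        }
        // Window of 4 samples with 50% overlap, so the window advances 2 samples at a time.
        List<double[]> framed = frames(samples, 4, 0.5);
        System.out.println(framed.size() + " frames");
        // One predicted genre per ensemble member, e.g. SVM, k-NN and neural network.
        System.out.println(majorityVote(Arrays.asList("rock", "jazz", "rock")));
    }
}
```

In a setup like the one described, the window size and overlap ratio passed to `frames` would be the parameters varied in the experiments, and each frame would feed a feature extractor before classification.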