Query-by-singing based music retrieval
| Main Author: | |
|---|---|
| Other Authors: | |
| Format: | Final Year Project |
| Language: | English |
| Published: | 2010 |
| Subjects: | |
| Online Access: | http://hdl.handle.net/10356/38814 |
| Institution: | Nanyang Technological University |

Summary:

Imagine a situation where you have heard a new song over the radio or in a music shop, liked it a lot, and recorded it with your mobile phone. Now imagine another situation where you remember part of a song but have forgotten its title. In both situations, neither the song title nor the artist's name is known, yet you would like to retrieve the song. However, the most common method for retrieving a song currently requires the song title or artist name as input. Hence, a query-by-singing based music retrieval system is proposed as a solution to the scenarios above. The system is implemented in MATLAB and is built on a music database of 500 songs. Experiments are conducted to test the accuracy and efficiency of the completed system.

The process of implementing the music retrieval system involves segmenting the songs into sentences based on time-coded lyrics. Each sentence is further divided into fixed-length overlapping frames for feature extraction. Mel-Frequency Cepstral Coefficients (MFCCs) are extracted as the features from the music database and clustered using the K-means algorithm. The resulting cluster assignments are then processed with a Bag-of-Words model to form a histogram for each sentence of a song. A Random Projection Tree (RP-Tree) is used to index the histograms for efficient retrieval.
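
The general shape of this feature-extraction and training pipeline can be illustrated with a short sketch. The project itself is implemented in MATLAB; the Python fragment below, using the assumed libraries librosa and scikit-learn, only shows the idea of per-frame MFCCs, a K-means codebook, and one Bag-of-Words histogram per sentence. All parameter values are arbitrary placeholders, and the RP-Tree indexing step is not shown.

```python
# Illustrative sketch only, not the project's MATLAB implementation.
# Library choices (librosa, scikit-learn) and parameters (13 MFCCs,
# 512-sample frames, 500-word codebook) are assumptions for the example.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def sentence_mfccs(path, sr=16000, n_mfcc=13):
    """Load one sentence-level clip and return its per-frame MFCC vectors."""
    y, _ = librosa.load(path, sr=sr)
    # Fixed-length overlapping frames: 512-sample windows with 50% overlap.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=512, hop_length=256)
    return mfcc.T  # shape: (num_frames, n_mfcc)

def train_codebook(all_frames, n_words=500, seed=0):
    """Cluster MFCC frames pooled from the whole database into a codebook."""
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(all_frames)

def bow_histogram(frames, codebook):
    """Assign each frame to its nearest cluster and count occurrences."""
    words = codebook.predict(frames)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)  # normalise so sentences are comparable
```

Each sentence in the database would then be represented by one such histogram, and it is these histograms that the RP-Tree indexes.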

Two types of queries are collected for the experiments: recorded playback and a user singing a sentence of a song. The same process of audio segmentation, feature extraction and data training is applied to each query. The system compares a query against the music database and retrieves the results closest to the query, ranked by the Chi-square distance between the query and the database histograms.
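
A minimal sketch of the matching step is given below, again in Python rather than the project's MATLAB. The abstract does not state which Chi-square variant is used, so the common symmetric form is assumed, and the function names are hypothetical.

```python
# Hedged sketch: symmetric Chi-square distance between Bag-of-Words
# histograms, used to rank database sentences against a query.
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    """Symmetric Chi-square distance between two histograms."""
    h1, h2 = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def rank_database(query_hist, db_hists):
    """Return database sentence indices ordered from closest to farthest."""
    dists = np.array([chi_square_distance(query_hist, h) for h in db_hists])
    return np.argsort(dists)
```

In the actual system the RP-Tree index would presumably restrict this comparison to a small candidate set rather than scanning every histogram, which is consistent with the efficiency gain reported below.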

The analysis of the results shows that the system's accuracy for the recorded type of query is reasonably high, but it did not perform as well for the singing type of query. The efficiency of the system was greatly improved by the implementation of the RP-Tree.