Semantics extraction from multimedia data

Driven by significant advances in computer vision technology and related applications such as unmanned aerial vehicles (UAVs) and robotics, scene classification has become an active and challenging research problem. Scene classification means categorizing images into different categories accor...

Bibliographic Details
Main Author:	Wang, Xiao Meng
Other Authors:	Mao Kezhi
Format:	Final Year Project
Language:	English
Published:	2016
Subjects:	DRNTU::Engineering
Online Access:	http://hdl.handle.net/10356/67931
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-67931
record_format	dspace
spelling	sg-ntu-dr.10356-67931 2023-07-07T17:19:05Z Semantics extraction from multimedia data Wang, Xiao Meng Mao Kezhi School of Electrical and Electronic Engineering DRNTU::Engineering Bachelor of Engineering 2016-05-23T07:32:25Z 2016-05-23T07:32:25Z 2016 Final Year Project (FYP) http://hdl.handle.net/10356/67931 en Nanyang Technological University 76 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering
spellingShingle	DRNTU::Engineering Wang, Xiao Meng Semantics extraction from multimedia data
description	Driven by significant advances in computer vision technology and related applications such as unmanned aerial vehicles (UAVs) and robotics, scene classification has become an active and challenging research problem. Scene classification means assigning images to categories according to their physical or semantic properties. Most existing scene classification algorithms are based on low-level features and have achieved success in their own fields. However, because of the large uncertainties in the scenes to be classified, it is very hard for a scene classification technique to achieve high accuracy. In this thesis, a new scene classification technique is proposed that combines image and audio features. The technique involves three stages. The first stage is image feature extraction, where the main feature is the Scale-Invariant Feature Transform (SIFT). The second stage is audio feature extraction, where the features are Mel-Frequency Cepstral Coefficients (MFCCs). The third stage is fusion and classification, which combines the features extracted from the image and the audio and classifies the scenes using a Support Vector Machine (SVM). Several experiments were carried out to test the performance of the technique. Based on the test results, the accuracy in classifying different sports activities reaches up to 85%.
author2	Mao Kezhi
author_facet	Mao Kezhi Wang, Xiao Meng
format	Final Year Project
author	Wang, Xiao Meng
author_sort	Wang, Xiao Meng
title	Semantics extraction from multimedia data
title_short	Semantics extraction from multimedia data
title_full	Semantics extraction from multimedia data
title_fullStr	Semantics extraction from multimedia data
title_full_unstemmed	Semantics extraction from multimedia data
title_sort	semantics extraction from multimedia data
publishDate	2016
url	http://hdl.handle.net/10356/67931
_version_	1772825292177408000
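
Illustrative sketch (not part of the catalogued record): the description field outlines a three-stage pipeline of image features, audio features, then fusion and SVM classification. The Python below is a minimal sketch of the third stage only, under stated assumptions. The record does not say how the features are fused or which SVM kernel is used; concatenation of L2-normalized vectors ("early fusion") and a linear SVM trained by subgradient descent are assumptions made here for illustration. The feature values, the two-class setup, and all function names are hypothetical placeholders standing in for SIFT-based histograms and MFCC summaries that would normally come from dedicated libraries.

```python
import math

# Hypothetical, hand-made feature vectors (NOT data from the thesis).
# Image side: stand-ins for 3-bin visual-word histograms built from SIFT descriptors.
# Audio side: stand-ins for 2 averaged MFCC values per clip.
IMAGE_FEATS = [[5, 1, 0], [4, 2, 0], [0, 1, 5], [1, 0, 4]]
AUDIO_FEATS = [[2.0, -1.0], [1.8, -0.9], [-1.5, 2.0], [-1.2, 1.7]]
LABELS = [1, 1, -1, -1]  # two hypothetical scene classes


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(image_feat, audio_feat):
    """Early fusion (an assumption): normalize each modality, then concatenate."""
    return l2_normalize(image_feat) + l2_normalize(audio_feat)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(samples, labels, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM (hinge loss plus L2 penalty) fitted by per-sample subgradient steps.

    Labels must be +1 or -1. Returns the weight vector and the bias.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(w, x) + b)
            # Gradient of the L2 penalty shrinks the weights on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge loss contributes only when the sample is inside the margin.
            if margin < 1:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    fused = [fuse(i, a) for i, a in zip(IMAGE_FEATS, AUDIO_FEATS)]
    w, b = train_linear_svm(fused, LABELS)
    # Classify a new, equally hypothetical image/audio pair.
    print(predict(w, b, fuse([6, 1, 1], [2.2, -1.1])))
```

Normalizing each modality before concatenation keeps one feature type from dominating the other purely because of its numeric scale; whether the thesis does this is not stated in the record.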

Similar Items