Multimodal knowledge-based analysis in multimedia event detection

Multimedia Event Detection (MED) is a multimedia retrieval task with the goal of finding videos of a particular event in a large-scale Internet video archive, given example videos and text descriptions. We focus on multimodal knowledge-based analysis in MED, where we utilize meaningful and semantic features such as Automatic Speech Recognition (ASR) transcripts, acoustic concept indexing (i.e. 42 acoustic concepts), and visual semantic indexing (i.e. 346 visual concepts) to characterize videos in the archive. We study two scenarios, in which we either do or do not use the provided example videos. In the former, we propose a novel Adaptive Semantic Similarity (ASS) to measure textual similarity between the ASR transcripts of videos. We also incorporate acoustic concept indexing and classification to retrieve test videos, especially those with too few spoken words. In the latter 'ad-hoc' scenario, where we do not have any example video, we use only the event kit description to retrieve test videos through their ASR transcripts and visual semantics. We also propose an event-specific fusion scheme to combine textual and visual retrieval outputs. Our results show the effectiveness of the proposed ASS and acoustic concept indexing methods and their complementary role. We also conduct a set of experiments to assess the proposed framework in the 'ad-hoc' scenario.
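The textual channel of this kind of retrieval rests on scoring the similarity between ASR transcripts. The record does not give the internals of the paper's Adaptive Semantic Similarity, so the sketch below shows only a simple TF-IDF cosine baseline over toy transcripts; all function names and the example transcripts are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency per term
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (f / len(doc)) * idf[t] for t, f in tf.items()})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy ASR transcripts (hypothetical examples)
t1 = "the dog jumped over the fence at the park".split()
t2 = "a dog ran in the park near the fence".split()
t3 = "stock markets closed higher on friday".split()
v1, v2, v3 = tfidf_vectors([t1, t2, t3])
print(cosine(v1, v2) > cosine(v1, v3))  # related transcripts score higher
```

A semantic measure such as ASS would go beyond exact token overlap (e.g. by relating semantically similar words), which matters for short, noisy ASR output; the cosine baseline above only credits shared vocabulary.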

Full description

Bibliographic Details
Main Authors: Younessian, Ehsan; Mitamura, Teruko; Hauptmann, Alexander
Other Authors: School of Computer Engineering
Format: Conference or Workshop Item
Language:English
Published: 2013
Online Access:https://hdl.handle.net/10356/84248
http://hdl.handle.net/10220/12649
Institution: Nanyang Technological University
id sg-ntu-dr.10356-84248
record_format dspace
spelling sg-ntu-dr.10356-84248 2020-05-28T07:18:16Z
conference International Conference on Multimedia Retrieval (2nd : 2012)
date accessioned 2013-07-31T07:25:33Z; available 2019-12-06T15:41:16Z; issued 2012
type Conference Paper
citation Younessian, E., Mitamura, T., & Hauptmann, A. (2012). Multimodal knowledge-based analysis in multimedia event detection. Proceedings of the 2nd ACM International Conference on Multimedia Retrieval - ICMR '12.
url https://hdl.handle.net/10356/84248
url http://hdl.handle.net/10220/12649
doi 10.1145/2324796.2324855
language en
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
description Multimedia Event Detection (MED) is a multimedia retrieval task with the goal of finding videos of a particular event in a large-scale Internet video archive, given example videos and text descriptions. We focus on multimodal knowledge-based analysis in MED, where we utilize meaningful and semantic features such as Automatic Speech Recognition (ASR) transcripts, acoustic concept indexing (i.e. 42 acoustic concepts), and visual semantic indexing (i.e. 346 visual concepts) to characterize videos in the archive. We study two scenarios, in which we either do or do not use the provided example videos. In the former, we propose a novel Adaptive Semantic Similarity (ASS) to measure textual similarity between the ASR transcripts of videos. We also incorporate acoustic concept indexing and classification to retrieve test videos, especially those with too few spoken words. In the latter 'ad-hoc' scenario, where we do not have any example video, we use only the event kit description to retrieve test videos through their ASR transcripts and visual semantics. We also propose an event-specific fusion scheme to combine textual and visual retrieval outputs. Our results show the effectiveness of the proposed ASS and acoustic concept indexing methods and their complementary role. We also conduct a set of experiments to assess the proposed framework in the 'ad-hoc' scenario.
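The event-specific fusion mentioned in the description combines textual and visual retrieval outputs. The record does not specify the fusion formula, so the following is a minimal late-fusion sketch assuming min-max score normalization and a per-event mixing weight `alpha`; the weight value, video IDs, and scores are all hypothetical.

```python
def fuse_scores(text_scores, visual_scores, alpha):
    """Event-specific late fusion: weighted sum of min-max-normalized
    per-modality retrieval scores. alpha is tuned per event."""
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero for constant scores
        return {vid: (s - lo) / span for vid, s in scores.items()}

    t, v = normalize(text_scores), normalize(visual_scores)
    vids = t.keys() | v.keys()
    return {vid: alpha * t.get(vid, 0.0) + (1 - alpha) * v.get(vid, 0.0)
            for vid in vids}

# Hypothetical retrieval scores for three test videos
text_scores = {"vid1": 0.9, "vid2": 0.2, "vid3": 0.5}
visual_scores = {"vid1": 0.1, "vid2": 0.8, "vid3": 0.7}
# A speech-heavy event might weight the textual channel more heavily.
fused = fuse_scores(text_scores, visual_scores, alpha=0.7)
ranked = sorted(fused, key=fused.get, reverse=True)
print(ranked[0])  # → vid1
```

Making `alpha` event-specific reflects the intuition in the abstract: some events are best characterized by what is said (ASR), others by what is seen (visual concepts).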
author2 School of Computer Engineering
author_facet School of Computer Engineering
Younessian, Ehsan.
Mitamura, Teruko.
Hauptmann, Alexander.
format Conference or Workshop Item
author Younessian, Ehsan.
Mitamura, Teruko.
Hauptmann, Alexander.
author_sort Younessian, Ehsan.
title Multimodal knowledge-based analysis in multimedia event detection
publishDate 2013
url https://hdl.handle.net/10356/84248
http://hdl.handle.net/10220/12649
_version_ 1681057124517412864