Multimedia indexing and retrieval with lucene and concept annotations processing

Growing interest in automated speech recognition has opened the way for a range of new applications in recent years. Speech from seminars, conferences and lectures can now be transcribed into text automatically. Once the speech data is retrieved from the multimedia lect...

Bibliographic Details
Main Author: Kyaw, Zin Tun
Other Authors: Chng Eng Siong
Format: Final Year Project
Language: English
Published: 2015
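Subjects: DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications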
Online Access: http://hdl.handle.net/10356/64750
Institution: Nanyang Technological University
id sg-ntu-dr.10356-64750
record_format dspace
spelling sg-ntu-dr.10356-64750 2023-03-03T20:37:27Z Multimedia indexing and retrieval with lucene and concept annotations processing Kyaw, Zin Tun Chng Eng Siong School of Computer Engineering Emerging Research Lab DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications Bachelor of Engineering (Computer Engineering) 2015-06-02T07:28:49Z 2015-06-02T07:28:49Z 2015 2015 Final Year Project (FYP) http://hdl.handle.net/10356/64750 en Nanyang Technological University 62 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
spellingShingle DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
Kyaw, Zin Tun
Multimedia indexing and retrieval with lucene and concept annotations processing
description Growing interest in automated speech recognition has opened the way for a range of new applications in recent years. Speech from seminars, conferences and lectures can now be transcribed into text automatically. Once the speech data has been extracted from multimedia lecture files as transcriptions, it must be processed further for display in a human-friendly format. This calls for content-based search over lecture videos and a streaming web interface to enhance the experience of higher education and research study. The goal of this project is to develop a client-server web interface (LECTS SEARCH) that lets users view lecture videos and search their content for keywords or concepts at the same time. A single collection of raw speech data can contain up to millions of words, so storing and retrieving the relevant data can be challenging; an efficient indexing mechanism is therefore required. This thesis focuses on archiving and retrieving the speech data by building an inverted index on keywords so that the data is readily available for further uses such as keyword search. It also covers the storage and retrieval of concept-keywords using a tree-structured representation, Extensible Markup Language (XML). Inverted indexing is one of the widely used multimedia indexing techniques; it identifies the unique terms within the sentences of the documents. Each unique term can then be used to look up the documents that contain it, which greatly improves the speed of information retrieval. Currently, the indexed speech data come primarily from MIT lectures in the Aerospace (27 MB, ~330,616 words) and Signal Processing (12 MB, ~150,280 words) domains. In the experiments, keyword searches within each domain took 0.4 to 0.9 seconds. However, retrieval time increases when a keyword search is run across multiple collections.
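The inverted-indexing idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's Lucene-based implementation: it is a plain Python stand-in, and the function names, the segment identifiers and the sample transcript text are all hypothetical. It maps each unique term to the set of transcript segments that contain it, then answers a keyword query by intersecting those sets.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(segments):
    """Map each unique term to the set of segment ids that contain it."""
    index = defaultdict(set)
    for seg_id, text in segments.items():
        for term in tokenize(text):
            index[term].add(seg_id)
    return index


def search(index, query):
    """Return ids of segments containing every query term (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the first term's postings and narrow with each later term.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical transcript segments, keyed by "lecture:start-time".
segments = {
    "aero-01:00:12": "the lift on a wing depends on the angle of attack",
    "aero-01:03:40": "drag increases with the square of velocity",
    "sigproc-02:01:05": "the Fourier transform maps a signal to the frequency domain",
}

index = build_inverted_index(segments)
hits = search(index, "angle attack")
```

Because a lookup touches only the postings of the query terms rather than every transcript, search cost depends mainly on how many segments contain those terms, which is the speed-up the abstract attributes to inverted indexing.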
author2 Chng Eng Siong
author_facet Chng Eng Siong
Kyaw, Zin Tun
format Final Year Project
author Kyaw, Zin Tun
author_sort Kyaw, Zin Tun
title Multimedia indexing and retrieval with lucene and concept annotations processing
title_short Multimedia indexing and retrieval with lucene and concept annotations processing
title_full Multimedia indexing and retrieval with lucene and concept annotations processing
title_fullStr Multimedia indexing and retrieval with lucene and concept annotations processing
title_full_unstemmed Multimedia indexing and retrieval with lucene and concept annotations processing
title_sort multimedia indexing and retrieval with lucene and concept annotations processing
publishDate 2015
url http://hdl.handle.net/10356/64750
_version_ 1759855880079147008