A multimedia transcription system

With the advent of computing, a huge amount of data is being created every day. Most of the data are unstructured or semi-structured and need to be processed in order to derive meaning. For multimedia data (audio and video), a textual representation is often desirable, and there are two ways to obta...

Bibliographic Details
Main Author:	Nguyen, Huy Anh
Other Authors:	Chng Eng Siong
Format:	Final Year Project
Language:	English
Published:	2018
Subjects:	DRNTU::Engineering
Online Access:	http://hdl.handle.net/10356/73077
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-73077
record_format	dspace
spelling	sg-ntu-dr.10356-73077 2023-03-03T20:34:50Z A multimedia transcription system Nguyen, Huy Anh Chng Eng Siong School of Computer Science and Engineering DRNTU::Engineering Bachelor of Engineering (Computer Science) 2018-01-02T05:56:44Z 2018-01-02T05:56:44Z 2018 Final Year Project (FYP) http://hdl.handle.net/10356/73077 en Nanyang Technological University 54 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering
spellingShingle	DRNTU::Engineering Nguyen, Huy Anh A multimedia transcription system
description	With the advent of computing, a huge amount of data is being created every day. Most of the data are unstructured or semi-structured and need to be processed in order to derive meaning. For multimedia data (audio and video), a textual representation is often desirable, and there are two ways to obtain such a representation: transcription and captioning. Both processes are well-defined pipelines of multiple components. However, each component has many existing implementations with different input and output formats, which makes them difficult to integrate into a pipeline. The pipeline itself is also difficult to maintain, since any change or upgrade to a component can break it. Furthermore, as the pipeline changes there is no mechanism to keep track of output versions, a capability that is important for research purposes. This project proposes an integrated processing system that performs transcription and captioning on a wide range of audio and video inputs: single-file audio or video as well as multi-channel audio recordings. The project aims to design a system architecture that allows for modularity and extensibility, keeps track of different component and output versions, and performs robustly under many scenarios. The project incorporates Python ports of existing modules from various efforts of the Speech and Language Research Group in the School of Computer Science and Engineering, as well as new Python modules, to realize the processing pipeline: transcription, captioning, and visualization of transcripts and captions. The project is evaluated on existing audio recordings of talk shows (Singapore's 93.8FM), video recordings (Singapore Parliament proceedings), and multi-channel recordings (a four-person conversation on the Singapore Army). The system meets all of the requirements, which demonstrates its usefulness.
author2	Chng Eng Siong
author_facet	Chng Eng Siong Nguyen, Huy Anh
format	Final Year Project
author	Nguyen, Huy Anh
author_sort	Nguyen, Huy Anh
title	A multimedia transcription system
title_sort	multimedia transcription system
publishDate	2018
url	http://hdl.handle.net/10356/73077
_version_	1759858378818977792
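
The description above mentions a modular pipeline whose components can be swapped or upgraded while component and output versions stay traceable. The following is only a minimal sketch of that general idea, not the project's actual code: the names `Component`, `Pipeline`, `fake_segmenter`, and `fake_recognizer` are hypothetical, and the hash-based version signature is an assumption made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Component:
    """One pipeline stage; name and version identify the implementation used."""
    name: str
    version: str
    run: Callable[[Any], Any]


class Pipeline:
    """Runs components in order and records which versions produced the output."""

    def __init__(self, components: List[Component]) -> None:
        self.components = list(components)

    def signature(self) -> str:
        """Short hash of the ordered (name, version) pairs (hypothetical scheme)."""
        manifest = [(c.name, c.version) for c in self.components]
        digest = hashlib.sha256(json.dumps(manifest).encode("utf-8"))
        return digest.hexdigest()[:12]

    def process(self, data: Any) -> Dict[str, Any]:
        """Feed data through every stage and attach version metadata."""
        for component in self.components:
            data = component.run(data)
        return {
            "output": data,
            "components": {c.name: c.version for c in self.components},
            "pipeline_version": self.signature(),
        }


# Placeholder stages standing in for real segmentation and recognition modules.
def fake_segmenter(audio: List[float]) -> List[List[float]]:
    # Split the signal into chunks of at most two samples.
    return [audio[i:i + 2] for i in range(0, len(audio), 2)]


def fake_recognizer(segments: List[List[float]]) -> List[str]:
    # Emit one dummy "transcript" line per segment.
    return [f"segment of {len(s)} samples" for s in segments]


pipeline = Pipeline([
    Component("segmenter", "0.1", fake_segmenter),
    Component("recognizer", "0.1", fake_recognizer),
])
result = pipeline.process([0.1, 0.2, 0.3])
print(result["components"], result["pipeline_version"])
```

Because every stage shares the same single-argument interface, replacing one component does not require changes to the others, and bumping any component's version changes the pipeline signature stored with each output.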


Similar Items