A multimedia transcription system
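With the advent of computing, a huge amount of data is being created every day. Most of these data are unstructured or semi-structured and need to be processed in order to derive meaning. For multimedia data (audio and video), a textual representation is often desirable, and there are two ways to obtain such a representation: transcription and captioning. Both processes are well-defined pipelines of multiple components. For each component, however, there are many existing implementations, each with its own input and output formats, which makes them difficult to integrate into a pipeline. The pipeline itself is difficult to maintain, since any change or upgrade to a component can potentially break it. Furthermore, as the pipeline changes, there is no mechanism to keep track of output versions, a capability that is important for research purposes. This project proposes an integrated processing system that performs transcription and captioning on a wide range of audio and video inputs: single-file audio and video as well as multi-channel audio recordings. The project aims to design a system architecture that allows for modularity and extensibility, keeps track of different component and output versions, and performs robustly under many scenarios. The project incorporates Python ports of existing modules from various efforts of the Speech and Language Research Group in the School of Computer Science and Engineering, as well as new Python modules that realize the processing pipeline: transcription, captioning, and visualization of transcripts and captions. The project was evaluated on existing audio recordings of talk shows (Singapore's 93.8FM), video recordings (Singapore Parliament proceedings) and multi-channel recordings (a four-person conversation about the Singapore Army). The system meets all of its requirements, demonstrating the usefulness of the project.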
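The abstract describes transcription and captioning as pipelines of components whose implementations use different input and output formats, and it calls for output versions to be tracked as components change. This record does not say how the project implements either requirement, so the Python sketch below is a hypothetical illustration only, not the author's design. It assumes each stage is wrapped to read and return a common dictionary, and it derives an output version from the names and versions of all stages, so upgrading one stage yields a new version instead of overwriting earlier results. `Component`, `Pipeline`, `segment`, `transcribe` and the version strings are invented names, and the two stages are stubs rather than real speech processing.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical common representation: every stage reads and returns a plain dict,
# so components with different native I/O formats are wrapped to fit this shape.
Record = Dict[str, object]


@dataclass
class Component:
    """One pipeline stage (hypothetical): a name, a version and a dict -> dict function."""
    name: str
    version: str
    run: Callable[[Record], Record]


@dataclass
class Pipeline:
    components: List[Component]
    # Outputs keyed by (input id, pipeline signature); can be shared across versions.
    store: Dict[Tuple[str, str], Record] = field(default_factory=dict)

    def signature(self) -> str:
        """Short hash of every stage's name and version.

        Upgrading any stage changes the signature, so its results are stored as a
        new output version rather than overwriting earlier ones.
        """
        spec = json.dumps([[c.name, c.version] for c in self.components])
        return hashlib.sha256(spec.encode("utf-8")).hexdigest()[:12]

    def process(self, input_id: str, data: Record) -> Tuple[str, Record]:
        """Run all stages in order, reusing the stored result for the same version."""
        sig = self.signature()
        key = (input_id, sig)
        if key not in self.store:
            for component in self.components:
                data = component.run(data)
            self.store[key] = data
        return sig, self.store[key]


# Stub stages standing in for real segmentation and speech recognition.
def segment(data: Record) -> Record:
    return {**data, "segments": [(0.0, 2.5), (2.5, 5.0)]}


def transcribe(data: Record) -> Record:
    return {**data, "text": [f"<utterance {i}>" for i, _ in enumerate(data["segments"])]}


if __name__ == "__main__":
    shared: Dict[Tuple[str, str], Record] = {}
    v1 = Pipeline([Component("segmenter", "1.0", segment), Component("asr", "1.0", transcribe)], shared)
    v2 = Pipeline([Component("segmenter", "1.0", segment), Component("asr", "1.1", transcribe)], shared)
    sig1, out1 = v1.process("clip-001", {"audio": "clip-001.wav"})
    sig2, out2 = v2.process("clip-001", {"audio": "clip-001.wav"})
    print(sig1 != sig2, len(shared), out2["text"])
```

In this sketch, bumping the stub `asr` stage from `1.0` to `1.1` produces a second signature, so both outputs for `clip-001` stay side by side in the shared store, while rerunning an unchanged pipeline simply reuses its stored result.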
Main Author: | Nguyen, Huy Anh |
---|---|
Other Authors: | Chng Eng Siong |
Format: | Final Year Project |
Language: | English |
Published: | 2018 |
Subjects: | DRNTU::Engineering |
Online Access: | http://hdl.handle.net/10356/73077 |
Institution: | Nanyang Technological University |

id | sg-ntu-dr.10356-73077 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-73077 2023-03-03T20:34:50Z A multimedia transcription system Nguyen, Huy Anh Chng Eng Siong School of Computer Science and Engineering DRNTU::Engineering Bachelor of Engineering (Computer Science) 2018-01-02T05:56:44Z 2018-01-02T05:56:44Z 2018 Final Year Project (FYP) http://hdl.handle.net/10356/73077 en Nanyang Technological University 54 p. application/pdf |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | DRNTU::Engineering |
description | With the advent of computing, a huge amount of data is being created every day. Most of these data are unstructured or semi-structured and need to be processed in order to derive meaning. For multimedia data (audio and video), a textual representation is often desirable, and there are two ways to obtain such a representation: transcription and captioning. Both processes are well-defined pipelines of multiple components. For each component, however, there are many existing implementations, each with its own input and output formats, which makes them difficult to integrate into a pipeline. The pipeline itself is difficult to maintain, since any change or upgrade to a component can potentially break it. Furthermore, as the pipeline changes, there is no mechanism to keep track of output versions, a capability that is important for research purposes. This project proposes an integrated processing system that performs transcription and captioning on a wide range of audio and video inputs: single-file audio and video as well as multi-channel audio recordings. The project aims to design a system architecture that allows for modularity and extensibility, keeps track of different component and output versions, and performs robustly under many scenarios. The project incorporates Python ports of existing modules from various efforts of the Speech and Language Research Group in the School of Computer Science and Engineering, as well as new Python modules that realize the processing pipeline: transcription, captioning, and visualization of transcripts and captions. The project was evaluated on existing audio recordings of talk shows (Singapore's 93.8FM), video recordings (Singapore Parliament proceedings) and multi-channel recordings (a four-person conversation about the Singapore Army). The system meets all of its requirements, demonstrating the usefulness of the project. |
author2 | Chng Eng Siong |
format | Final Year Project |
author | Nguyen, Huy Anh |
title | A multimedia transcription system |
publishDate | 2018 |
url | http://hdl.handle.net/10356/73077 |