Deep learning methods for weakly supervised video temporal action localization

Deep Learning (DL) based methods for analysing dynamic visual data have been a vital part of emerging technologies. Video- and image-based recommendation systems, smart surveillance capabilities, and smart sensors are a few examples of technologies catalysed by DL. However...

Full description

Bibliographic Details
Main Author: Adipraja Widjaja, Sergi
Other Authors: Wen Bihan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access: https://hdl.handle.net/10356/139935
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-139935
record_format dspace
spelling sg-ntu-dr.10356-139935 2023-07-07T18:47:30Z Deep learning methods for weakly supervised video temporal action localization Adipraja Widjaja, Sergi Wen Bihan School of Electrical and Electronic Engineering Institute of High Performance Computing (IHPC) A*STAR bihan.wen@ntu.edu.sg Engineering::Computer science and engineering Engineering::Electrical and electronic engineering Deep Learning (DL) based methods for analysing dynamic visual data have been a vital part of emerging technologies. Video- and image-based recommendation systems, smart surveillance capabilities, and smart sensors are a few examples of technologies catalysed by DL. However, a growing concern is the increasingly complex annotation requirements of DL-based tasks. One such task is video temporal action localization, which requires both classifying and temporally locating action instances in an untrimmed video. Building an effective temporal action localization model requires not only video datasets with action labels but also comprehensive temporal annotations. Unfortunately, this does not reflect how video information is presented on the web, where simple video tags may serve as action labels. Hence, weakly-supervised methods for temporal action localization have quickly gained traction owing to their minimal annotation requirement: only action class labels are needed for training. In this project, a weakly-supervised temporal action localization method is proposed and developed by aggregating and combining the merits of neural network modules from past research works. The theoretical basis and design rationale of the different neural network components are discussed and justified. Alongside this, the effectiveness of different neural network architectures for the weakly-supervised temporal action localization task is studied. A comprehensive ablation study compares the different modules proposed by past works on weakly-supervised temporal action localization. Bachelor of Engineering (Electrical and Electronic Engineering) 2020-05-22T09:15:51Z 2020-05-22T09:15:51Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/139935 en A3274-191 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Engineering::Electrical and electronic engineering
spellingShingle Engineering::Computer science and engineering
Engineering::Electrical and electronic engineering
Adipraja Widjaja, Sergi
Deep learning methods for weakly supervised video temporal action localization
description Deep Learning (DL) based methods for analysing dynamic visual data have been a vital part of emerging technologies. Video- and image-based recommendation systems, smart surveillance capabilities, and smart sensors are a few examples of technologies catalysed by DL. However, a growing concern is the increasingly complex annotation requirements of DL-based tasks. One such task is video temporal action localization, which requires both classifying and temporally locating action instances in an untrimmed video. Building an effective temporal action localization model requires not only video datasets with action labels but also comprehensive temporal annotations. Unfortunately, this does not reflect how video information is presented on the web, where simple video tags may serve as action labels. Hence, weakly-supervised methods for temporal action localization have quickly gained traction owing to their minimal annotation requirement: only action class labels are needed for training. In this project, a weakly-supervised temporal action localization method is proposed and developed by aggregating and combining the merits of neural network modules from past research works. The theoretical basis and design rationale of the different neural network components are discussed and justified. Alongside this, the effectiveness of different neural network architectures for the weakly-supervised temporal action localization task is studied. A comprehensive ablation study compares the different modules proposed by past works on weakly-supervised temporal action localization.
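The weakly-supervised setting described in the abstract is commonly realised as multiple-instance learning over pre-extracted snippet features: a per-snippet class activation sequence is pooled into a video-level score so the model can be trained from video tags alone, and at inference the sequence is thresholded to recover temporal segments. The following PyTorch sketch illustrates that general recipe only; all names, dimensions, the attention-based pooling choice, and the 0.64 s-per-snippet duration are illustrative assumptions, not the project's actual implementation.

```python
# Minimal sketch of attention-based multiple-instance learning for
# weakly-supervised temporal action localization (in the spirit of
# UntrimmedNets/STPN-style methods). Names and sizes are assumptions.
import torch
import torch.nn as nn

class WeakTALSketch(nn.Module):
    def __init__(self, feat_dim=2048, num_classes=20):
        super().__init__()
        # 1D conv embedding over pre-extracted snippet features
        self.embed = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=3, padding=1), nn.ReLU())
        self.attention = nn.Conv1d(512, 1, kernel_size=1)             # snippet importance
        self.classifier = nn.Conv1d(512, num_classes, kernel_size=1)  # per-snippet logits

    def forward(self, x):                          # x: (batch, feat_dim, T)
        h = self.embed(x)
        attn = torch.sigmoid(self.attention(h))    # (batch, 1, T)
        cas = self.classifier(h)                   # class activation sequence (batch, C, T)
        # Attention-weighted temporal pooling -> video-level logits,
        # trainable from video-level tags alone.
        video_logits = (cas * attn).sum(-1) / attn.sum(-1).clamp(min=1e-6)
        return video_logits, cas, attn

def localize(cas, attn, cls, thresh=0.5, sec_per_snippet=0.64):
    """Threshold one class's activation sequence and merge consecutive
    above-threshold snippets into (start_sec, end_sec) proposals."""
    scores = (torch.sigmoid(cas[0, cls]) * attn[0, 0]).tolist()
    segments, start = [], None
    for t, s in enumerate(scores):
        if s >= thresh and start is None:
            start = t
        elif s < thresh and start is not None:
            segments.append((start * sec_per_snippet, t * sec_per_snippet))
            start = None
    if start is not None:
        segments.append((start * sec_per_snippet, len(scores) * sec_per_snippet))
    return segments

# Training step using only video-level tags (multi-label BCE):
model = WeakTALSketch()
feats = torch.randn(4, 2048, 100)            # 4 videos x 100 snippet features each
tags = torch.zeros(4, 20); tags[:, 3] = 1.0  # video-level action labels only
logits, cas, attn = model(feats)
loss = nn.BCEWithLogitsLoss()(logits, tags)
loss.backward()
print(localize(cas.detach(), attn.detach(), cls=3))  # (start, end) proposals in seconds
```

In practice, methods of this family add further refinements, such as sparsity or co-activity losses, multi-threshold proposal generation, and non-maximum suppression over segments; these are exactly the kinds of modules the ablation study mentioned above would compare.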
author2 Wen Bihan
author_facet Wen Bihan
Adipraja Widjaja, Sergi
format Final Year Project
author Adipraja Widjaja, Sergi
author_sort Adipraja Widjaja, Sergi
title Deep learning methods for weakly supervised video temporal action localization
title_short Deep learning methods for weakly supervised video temporal action localization
title_full Deep learning methods for weakly supervised video temporal action localization
title_fullStr Deep learning methods for weakly supervised video temporal action localization
title_full_unstemmed Deep learning methods for weakly supervised video temporal action localization
title_sort deep learning methods for weakly supervised video temporal action localization
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/139935
_version_ 1772828291939434496