Deep learning methods for weakly supervised video temporal action localization

Bibliographic Details
Main Author: Adipraja Widjaja, Sergi
Other Authors: Wen Bihan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Online Access: https://hdl.handle.net/10356/139935
Institution: Nanyang Technological University
Description
Summary: Deep learning (DL) based methods for analysing dynamic graphical data have become a vital part of emerging technologies. Video and image-based recommendation systems, smart capabilities in surveillance technologies, and smart sensors are a few examples of technologies catalysed by DL. However, a growing concern is the increasingly complex annotation requirements of different DL-based tasks. One such task is video temporal action localization, which requires a multi-step approach to classifying and locating action instances in an untrimmed video. Building an effective video temporal action localization model requires not only video datasets with action labels but also more comprehensive temporal annotations. Unfortunately, this does not reflect how video information is presented on the web, where simple video tags may be used as action labels. Hence, weakly-supervised methods for temporal action localization have quickly gained traction due to their minimal annotation requirement: only action class labels are needed for training. In this project, a weakly-supervised temporal action localization method is proposed and developed by aggregating and combining the merits of neural network modules from past research works. The theoretical basis for the design rationale of the different neural network components is discussed and justified. In addition, we study the effectiveness of different neural network architectures for the weakly-supervised temporal action localization task. A comprehensive ablation study is conducted to compare the modules proposed by past works on weakly-supervised temporal action localization.