Deep learning methods for weakly supervised video temporal action localization
Deep learning (DL) based methods for analysing dynamic graphical data have become a vital part of emerging technologies. Video and image-based recommendation systems, smart surveillance capabilities, and smart sensors are a few examples of technologies catalysed by DL. Howe...
Main Author: | Adipraja Widjaja, Sergi |
---|---|
Other Authors: | Wen Bihan |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University 2020 |
Subjects: | Engineering::Computer science and engineering; Engineering::Electrical and electronic engineering |
Online Access: | https://hdl.handle.net/10356/139935 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-139935 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-139935 2023-07-07T18:47:30Z Deep learning methods for weakly supervised video temporal action localization Adipraja Widjaja, Sergi Wen Bihan School of Electrical and Electronic Engineering Institute of High Performance Computing (IHPC) A*STAR bihan.wen@ntu.edu.sg Engineering::Computer science and engineering Engineering::Electrical and electronic engineering Deep learning (DL) based methods for analysing dynamic graphical data have become a vital part of emerging technologies. Video and image-based recommendation systems, smart surveillance capabilities, and smart sensors are a few examples of technologies catalysed by DL. However, a growing concern is the increasingly complex annotation requirements of different DL-based tasks. One such task that we want to highlight is video temporal action localization, which requires a multi-step approach to classifying and locating action instances in an untrimmed video. Building an effective video temporal action localization model requires more comprehensive temporal annotation in addition to action labels. Unfortunately, this does not reflect how video information is presented on the web, where simple video tags may be used as action labels. Hence, weakly-supervised methods for temporal action localization quickly gained traction due to their minimal annotation requirement, in which only action class labels are needed for training. In this project, a weakly-supervised temporal action localization method is proposed and developed by combining the merits of neural network modules from past research works. The theoretical basis and design rationale of the different neural network components are discussed and justified. In addition, we study the effectiveness of different neural network architectures for the weakly-supervised temporal action localization task. A comprehensive ablation study is conducted to compare the modules proposed by past works on weakly-supervised temporal action localization. Bachelor of Engineering (Electrical and Electronic Engineering) 2020-05-22T09:15:51Z 2020-05-22T09:15:51Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/139935 en A3274-191 application/pdf Nanyang Technological University |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Engineering::Computer science and engineering Engineering::Electrical and electronic engineering |
spellingShingle | Engineering::Computer science and engineering Engineering::Electrical and electronic engineering Adipraja Widjaja, Sergi Deep learning methods for weakly supervised video temporal action localization |
description | Deep learning (DL) based methods for analysing dynamic graphical data have become a vital part of emerging technologies. Video and image-based recommendation systems, smart surveillance capabilities, and smart sensors are a few examples of technologies catalysed by DL. However, a growing concern is the increasingly complex annotation requirements of different DL-based tasks. One such task that we want to highlight is video temporal action localization, which requires a multi-step approach to classifying and locating action instances in an untrimmed video. Building an effective video temporal action localization model requires more comprehensive temporal annotation in addition to action labels. Unfortunately, this does not reflect how video information is presented on the web, where simple video tags may be used as action labels. Hence, weakly-supervised methods for temporal action localization quickly gained traction due to their minimal annotation requirement, in which only action class labels are needed for training. In this project, a weakly-supervised temporal action localization method is proposed and developed by combining the merits of neural network modules from past research works. The theoretical basis and design rationale of the different neural network components are discussed and justified. In addition, we study the effectiveness of different neural network architectures for the weakly-supervised temporal action localization task. A comprehensive ablation study is conducted to compare the modules proposed by past works on weakly-supervised temporal action localization. |
author2 | Wen Bihan |
author_facet | Wen Bihan Adipraja Widjaja, Sergi |
format | Final Year Project |
author | Adipraja Widjaja, Sergi |
author_sort | Adipraja Widjaja, Sergi |
title | Deep learning methods for weakly supervised video temporal action localization |
title_short | Deep learning methods for weakly supervised video temporal action localization |
title_full | Deep learning methods for weakly supervised video temporal action localization |
title_fullStr | Deep learning methods for weakly supervised video temporal action localization |
title_full_unstemmed | Deep learning methods for weakly supervised video temporal action localization |
title_sort | deep learning methods for weakly supervised video temporal action localization |
publisher | Nanyang Technological University |
publishDate | 2020 |
url | https://hdl.handle.net/10356/139935 |
_version_ | 1772828291939434496 |