Interpreting models for video action recognition

Action recognition is the task of identifying human actions in videos. This has been a long-standing challenge in computer vision. Earlier methods relied on hand-crafted features and traditional machine learning algorithms to solve this task. In the past decade or so, deep learning has replaced these early methods.

Bibliographic Details
Main Author: Daniel Wijaya
Other Authors: Chen Change Loy
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access:https://hdl.handle.net/10356/148367
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-148367
record_format dspace
spelling sg-ntu-dr.10356-1483672021-05-01T12:43:59Z Interpreting models for video action recognition Daniel Wijaya Chen Change Loy School of Computer Science and Engineering Multimedia and Interacting Computing Lab (MICL) Dr. Davide Moltisanti ccloy@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Action recognition is the task of identifying human actions in videos. This has been a long-standing challenge in computer vision. Earlier methods relied on hand-crafted features and traditional machine learning algorithms to solve this task. In the past decade or so, deep learning has replaced these early methods. Traditional machine learning models such as decision trees are easier to interpret than complex deep neural networks. Deep learning gained popularity in the early 2010s thanks to its strong performance on complex tasks such as action recognition; however, the complex inner workings of deep neural networks make them far harder to interpret. In this project, a study is conducted to interpret deep neural networks for action recognition. To this end, we perform network dissection on a model trained on the UCF-101 [1] dataset. The focus is placed on systematically identifying the semantics of individual hidden units within the model, and then on understanding each unit's role based on the visual concepts it captures. Specifically, the change in the network's accuracy in classifying each action is analyzed when a unit is eliminated, to determine the importance of each unit for each action. The impact on the network's accuracy of removing important and irrelevant units for each class is then discussed.
We find that the network relies on salient objects or cues to classify actions; for example, in our experiment, the network relies on surrounding objects such as a carpet to detect the BabyCrawling action. Bachelor of Engineering (Computer Science) 2021-05-01T12:43:59Z 2021-05-01T12:43:59Z 2021 Final Year Project (FYP) Daniel Wijaya (2021). Interpreting models for video action recognition. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/148367 https://hdl.handle.net/10356/148367 en SCSE20-0402 application/pdf Nanyang Technological University
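The unit-ablation analysis described in the abstract — eliminating one hidden unit at a time and measuring the change in per-class accuracy — can be sketched in miniature. The snippet below is an illustrative toy, not the project's actual UCF-101 model: it uses a small randomly weighted two-layer network, zeroing a hidden unit's activations stands in for "removing" it, and the per-class accuracy drop serves as the unit's importance score.

```python
import numpy as np

def forward(X, W1, W2, ablate_unit=None):
    """Two-layer ReLU network; optionally zero out one hidden unit."""
    H = np.maximum(0.0, X @ W1)        # hidden activations
    if ablate_unit is not None:
        H = H.copy()
        H[:, ablate_unit] = 0.0        # "remove" the unit
    return (H @ W2).argmax(axis=1)     # predicted class per clip

def per_class_accuracy(preds, labels, n_classes):
    accs = []
    for c in range(n_classes):
        mask = labels == c
        accs.append((preds[mask] == c).mean() if mask.any() else np.nan)
    return np.array(accs)

rng = np.random.default_rng(0)
n, d, h, k = 200, 8, 16, 4             # samples, input dim, hidden units, classes
X = rng.normal(size=(n, d))
W1 = rng.normal(size=(d, h))
W2 = rng.normal(size=(h, k))

# Treat the intact network's own predictions as ground truth, so the
# baseline per-class accuracy is 1.0 and any ablation can only hurt it.
labels = forward(X, W1, W2)
base = per_class_accuracy(forward(X, W1, W2), labels, k)

# importance[u, c] = accuracy drop on class c when unit u is eliminated
importance = np.zeros((h, k))
for u in range(h):
    acc = per_class_accuracy(forward(X, W1, W2, ablate_unit=u), labels, k)
    importance[u] = base - acc
```

Large entries in a unit's row mark classes that depend on that unit; near-zero rows are irrelevant units whose removal barely affects accuracy, mirroring the important-versus-irrelevant comparison the abstract discusses.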
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Daniel Wijaya
Interpreting models for video action recognition
description Action recognition is the task of identifying human actions in videos. This has been a long-standing challenge in computer vision. Earlier methods relied on hand-crafted features and traditional machine learning algorithms to solve this task. In the past decade or so, deep learning has replaced these early methods. Traditional machine learning models such as decision trees are easier to interpret than complex deep neural networks. Deep learning gained popularity in the early 2010s thanks to its strong performance on complex tasks such as action recognition; however, the complex inner workings of deep neural networks make them far harder to interpret. In this project, a study is conducted to interpret deep neural networks for action recognition. To this end, we perform network dissection on a model trained on the UCF-101 [1] dataset. The focus is placed on systematically identifying the semantics of individual hidden units within the model, and then on understanding each unit's role based on the visual concepts it captures. Specifically, the change in the network's accuracy in classifying each action is analyzed when a unit is eliminated, to determine the importance of each unit for each action. The impact on the network's accuracy of removing important and irrelevant units for each class is then discussed. We find that the network relies on salient objects or cues to classify actions; for example, in our experiment, the network relies on surrounding objects such as a carpet to detect the BabyCrawling action.
author2 Chen Change Loy
author_facet Chen Change Loy
Daniel Wijaya
format Final Year Project
author Daniel Wijaya
author_sort Daniel Wijaya
title Interpreting models for video action recognition
title_short Interpreting models for video action recognition
title_full Interpreting models for video action recognition
title_fullStr Interpreting models for video action recognition
title_full_unstemmed Interpreting models for video action recognition
title_sort interpreting models for video action recognition
publisher Nanyang Technological University
publishDate 2021
url https://hdl.handle.net/10356/148367
_version_ 1698713677396443136