Interpreting models for video action recognition

Action recognition is the task of identifying human actions in videos. This has been a long-standing challenge in computer vision. Earlier methods relied on hand-crafted features and traditional machine learning algorithms to solve this task. In the past decade or so, deep learning has replaced these early methods.

Bibliographic Details
Main Author: Daniel Wijaya
Other Authors: Chen Change Loy
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access:https://hdl.handle.net/10356/148367
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-148367
record_format dspace
spelling sg-ntu-dr.10356-1483672021-05-01T12:43:59Z Interpreting models for video action recognition Daniel Wijaya Chen Change Loy School of Computer Science and Engineering Multimedia and Interacting Computing Lab (MICL) Dr. Davide Moltisanti ccloy@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Action recognition is the task of identifying human actions in videos. This has been a long-standing challenge in computer vision. Earlier methods relied on hand-crafted features and traditional machine learning algorithms to solve this task. In the past decade or so, deep learning has replaced these early methods. Traditional machine learning models such as decision trees are easier to interpret than complex deep neural networks. Deep learning gained popularity in the early 2010s thanks to its strong performance on complex tasks such as action recognition; however, the complex inner workings of deep neural networks make them far harder to interpret. In this project, a study is conducted to interpret deep neural networks for action recognition. To this end, we perform network dissection on a model trained on the UCF-101 [1] dataset. The focus is placed on systematically identifying the semantics of individual hidden units within the model, and then on understanding each unit's role based on the visual concepts it captures. Specifically, the change in the network's accuracy in classifying each action is analyzed when a unit is eliminated, to determine the importance of each unit for each action. The impact on the network's accuracy of removing important and irrelevant units for each class is then discussed.
We find that the network relies on salient objects or cues to classify actions; for example, in our experiment, the network relies on surrounding objects such as a carpet to detect the BabyCrawling action. Bachelor of Engineering (Computer Science) 2021-05-01T12:43:59Z 2021-05-01T12:43:59Z 2021 Final Year Project (FYP) Daniel Wijaya (2021). Interpreting models for video action recognition. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/148367 https://hdl.handle.net/10356/148367 en SCSE20-0402 application/pdf Nanyang Technological University
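The unit-ablation analysis described in the abstract — eliminating one hidden unit at a time and measuring the change in per-class accuracy — can be sketched in miniature. The snippet below is an illustrative toy, not the project's actual UCF-101 model: it uses a small randomly weighted two-layer network, zeroing a hidden unit's activations stands in for "removing" it, and the per-class accuracy drop serves as the unit's importance score.

```python
import numpy as np

def forward(X, W1, W2, ablate_unit=None):
    """Two-layer ReLU network; optionally zero out one hidden unit."""
    H = np.maximum(0.0, X @ W1)        # hidden activations
    if ablate_unit is not None:
        H = H.copy()
        H[:, ablate_unit] = 0.0        # "remove" the unit
    return (H @ W2).argmax(axis=1)     # predicted class per clip

def per_class_accuracy(preds, labels, n_classes):
    accs = []
    for c in range(n_classes):
        mask = labels == c
        accs.append((preds[mask] == c).mean() if mask.any() else np.nan)
    return np.array(accs)

rng = np.random.default_rng(0)
n, d, h, k = 200, 8, 16, 4             # samples, input dim, hidden units, classes
X = rng.normal(size=(n, d))
W1 = rng.normal(size=(d, h))
W2 = rng.normal(size=(h, k))

# Treat the intact network's own predictions as ground truth, so the
# baseline per-class accuracy is 1.0 and any ablation can only hurt it.
labels = forward(X, W1, W2)
base = per_class_accuracy(forward(X, W1, W2), labels, k)

# importance[u, c] = accuracy drop on class c when unit u is eliminated
importance = np.zeros((h, k))
for u in range(h):
    acc = per_class_accuracy(forward(X, W1, W2, ablate_unit=u), labels, k)
    importance[u] = base - acc
```

Large entries in a unit's row mark classes that depend on that unit; near-zero rows are irrelevant units whose removal barely affects accuracy, mirroring the important-versus-irrelevant comparison the abstract discusses.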
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Daniel Wijaya
Interpreting models for video action recognition
description Action recognition is the task of identifying human actions in videos. This has been a long-standing challenge in computer vision. Earlier methods relied on hand-crafted features and traditional machine learning algorithms to solve this task. In the past decade or so, deep learning has replaced these early methods. Traditional machine learning models such as decision trees are easier to interpret than complex deep neural networks. Deep learning gained popularity in the early 2010s thanks to its strong performance on complex tasks such as action recognition; however, the complex inner workings of deep neural networks make them far harder to interpret. In this project, a study is conducted to interpret deep neural networks for action recognition. To this end, we perform network dissection on a model trained on the UCF-101 [1] dataset. The focus is placed on systematically identifying the semantics of individual hidden units within the model, and then on understanding each unit's role based on the visual concepts it captures. Specifically, the change in the network's accuracy in classifying each action is analyzed when a unit is eliminated, to determine the importance of each unit for each action. The impact on the network's accuracy of removing important and irrelevant units for each class is then discussed. We find that the network relies on salient objects or cues to classify actions; for example, in our experiment, the network relies on surrounding objects such as a carpet to detect the BabyCrawling action.
author2 Chen Change Loy
author_facet Chen Change Loy
Daniel Wijaya
format Final Year Project
author Daniel Wijaya
author_sort Daniel Wijaya
title Interpreting models for video action recognition
title_short Interpreting models for video action recognition
title_full Interpreting models for video action recognition
title_fullStr Interpreting models for video action recognition
title_full_unstemmed Interpreting models for video action recognition
title_sort interpreting models for video action recognition
publisher Nanyang Technological University
publishDate 2021
url https://hdl.handle.net/10356/148367
_version_ 1698713677396443136