Interpreting models for video action recognition
Saved in:
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2021
Online Access: https://hdl.handle.net/10356/148367
Institution: Nanyang Technological University
Summary: Action recognition is the task of identifying human actions in videos. This has been a long-standing challenge in computer vision. Earlier methods relied on hand-crafted features and traditional machine learning algorithms to solve this task. In the past decade or so, deep learning has replaced these early methods.
Traditional machine learning models such as decision trees are easier to interpret than complex deep neural networks. Deep learning gained popularity in the early 2010s thanks to its strong performance on complex tasks such as action recognition. However, because of the complex inner workings of deep neural networks, interpreting these models has become more challenging than ever.
In this project, we study how to interpret deep neural networks for action recognition. To do so, we perform network dissection on a model trained on the UCF-101 [1] dataset for the action recognition task. The focus is on systematically identifying the semantics of individual hidden units within the model, and then understanding each unit's role based on the visual concepts it captures.
Specifically, we analyze the change in the network's accuracy in classifying each action when a unit is eliminated, in order to determine the importance of each unit for each action. We then discuss the impact on the network's accuracy of removing units that are important or irrelevant for each class. We find that the network relies on salient objects or cues to classify an action: for example, in our experiment the network relies on surrounding objects such as a carpet to detect the BabyCrawling action.
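The unit-elimination procedure described above can be sketched in miniature as follows. This is an illustrative toy, not the project's UCF-101 model: the tiny two-layer network, its weights, and the names `predict` and `ablate_unit` are all assumptions introduced for the example. The idea is the same, though: zero out one hidden unit and measure the resulting drop in accuracy as that unit's importance.

```python
import numpy as np

# Toy stand-in for one layer of an action-recognition network.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))           # 200 "clips", 16 features each
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary "action" labels

W1 = rng.normal(size=(16, 8))            # hidden layer with 8 units
W2 = rng.normal(size=(8, 2))             # output layer, 2 "actions"

def predict(X, ablate_unit=None):
    """Forward pass; optionally 'eliminate' one hidden unit by zeroing it."""
    h = np.maximum(X @ W1, 0.0)          # ReLU hidden activations
    if ablate_unit is not None:
        h[:, ablate_unit] = 0.0          # remove the unit's contribution
    return (h @ W2).argmax(axis=1)

base_acc = (predict(X) == y).mean()

# Importance of unit u = accuracy drop when u is ablated.
importance = {u: base_acc - (predict(X, ablate_unit=u) == y).mean()
              for u in range(8)}
```

Units with a large positive importance score are the ones the network depends on for the task; in the full model the same measurement is taken per action class, which is how class-specific cues such as the carpet for BabyCrawling surface.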