Determining human intention in videos I

Bibliographic Details
Main Author: Hoong, Jia Qi
Other Authors: Cham Tat Jen
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2022
Subjects:
Online Access: https://hdl.handle.net/10356/158063
Description
Summary: Human intention is a temporal sequence of human actions to achieve a goal. Determining human intentions is highly useful in many situations. It can enable better human-robot collaboration, where robots are required to help human users. It is also useful for analysing human behaviour in dynamic environments, such as monitoring mobile patients in hospitals or monitoring athletes in tournaments. In this work, we focus on predicting future actions from past observations in egocentric videos, a task known as egocentric action anticipation. Egocentric videos record human actions from a first-person perspective. This research analyses a deep learning framework proposed by Furnari and Farinella [1]. The framework is a multimodal network consisting of (1) Rolling-Unrolling LSTM models for anticipating actions from egocentric videos using multi-modal features and (2) a Modality ATTention (MATT) mechanism for fusing multi-modal predictions. Moreover, the multimodal network is extended to other modalities, specifically using monocular depth for egocentric action anticipation.
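
To make the fusion step concrete, below is a minimal PyTorch sketch of a MATT-style modality attention mechanism: a small network scores each modality branch from its summary vector (e.g. a rolling-LSTM hidden state), and a softmax over modalities weights the per-modality action predictions. The class name ModalityAttentionFusion, the MLP scorer, and all tensor shapes are illustrative assumptions, not the implementation of Furnari and Farinella [1].

    import torch
    import torch.nn as nn

    class ModalityAttentionFusion(nn.Module):
        # Assumed design: an MLP maps the concatenated per-modality summary
        # vectors to one score per modality; a softmax over modalities then
        # weights the per-modality action predictions before summing them.
        def __init__(self, feat_dim, num_modalities, hidden_dim=128):
            super().__init__()
            self.score_net = nn.Sequential(
                nn.Linear(feat_dim * num_modalities, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, num_modalities),
            )

        def forward(self, summaries, scores):
            # summaries: (batch, M, feat_dim)    one summary per modality branch
            # scores:    (batch, M, num_classes) per-modality action predictions
            batch = summaries.size(0)
            w = self.score_net(summaries.reshape(batch, -1))  # (batch, M)
            w = torch.softmax(w, dim=1).unsqueeze(-1)         # (batch, M, 1)
            return (w * scores).sum(dim=1)                    # (batch, num_classes)

    # Example with three hypothetical branches, e.g. RGB, optical flow, and the
    # monocular-depth modality proposed as an extension in this project.
    fusion = ModalityAttentionFusion(feat_dim=1024, num_modalities=3)
    summaries = torch.randn(2, 3, 1024)   # dummy branch summary vectors
    scores = torch.randn(2, 3, 10)        # dummy scores for 10 action classes
    fused = fusion(summaries, scores)     # fused prediction, shape (2, 10)

Because the attention weights are computed per sample, the fused prediction can lean on different modalities in different situations, which is the motivation for attention-based fusion over a fixed weighted average.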