Context models for pedestrian intention prediction by factored latent-dynamic conditional random fields

Bibliographic Details
Main Author: Satyajit Neogi
Other Authors: Justin Dauwels
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2020
Online Access: https://hdl.handle.net/10356/143222
Institution: Nanyang Technological University
Description
Summary: Smooth handling of pedestrian interactions is a key requirement for Autonomous Vehicles (AV) and Advanced Driver Assistance Systems (ADAS). Such systems call for early and accurate prediction of a pedestrian's crossing/not-crossing behaviour in front of the vehicle. Existing approaches to pedestrian behaviour prediction use the pedestrian's motion, the pedestrian's location in the scene, and static context variables such as traffic lights and zebra crossings. We stress the necessity of early prediction for smooth operation of such systems, and to that end we introduce the influence of vehicle interactions on pedestrian intention. In this thesis, we show a discernible advance in prediction time when such vehicle interaction context is included. We apply our methods to two public datasets, the Daimler dataset and the JAAD dataset. We also contribute two datasets to pedestrian behaviour prediction research, the NTU dataset and the Little India dataset, and apply our methods to them. While the best existing system predicts pedestrian stopping behaviour with 70% accuracy 0.38 seconds before the actual events, our system achieves this accuracy across multiple datasets at least 0.9 seconds before the actual events on average. We formulate pedestrian behaviour prediction as a sequence labeling task. Conditional Random Fields (CRF) are frequently applied to labeling and segmenting sequence data. In the existing literature, hidden variables have been introduced into a labeled CRF structure to model the latent dynamics within class labels, which improves labeling performance. Such a model is known as a Latent-Dynamic CRF (LDCRF). We propose a generalization of LDCRF, called Factored LDCRF (FLDCRF), a structure that allows multiple latent dynamics of the class labels to interact with each other. Including such latent-dynamic interactions leads to improved performance on single-label and multi-label sequence modeling tasks.
We validate our FLDCRF models on standard single-label and multi-label sequence tagging experiments on two datasets, the UCI gesture phase data and the UCI opportunity data, before applying them to pedestrian behaviour prediction. FLDCRF outperforms the state-of-the-art sequence models CRF, LDCRF, LSTM, LSTM-CRF, Factorial CRF, Coupled CRF and a multi-label LSTM model in all our experiments. In addition, FLDCRF offers easier model selection and is more consistent across validation and test data than LSTM models. FLDCRF is also much faster to train than LSTM, even without a GPU. FLDCRF outperforms the best LSTM model by 4% in F1-score on a single-label task on the UCI gesture phase data, and outperforms LSTM by ~2% (F1-score) on average in the multi-label sequence tagging experiment on the UCI opportunity data. FLDCRF models also outperform LSTM models on the pedestrian behaviour prediction task across multiple datasets. The interacting latent dynamics in an FLDCRF can be exploited to model multi-agent interactions in a social environment. FLDCRF accommodates interacting discrete latent state spaces in its structure; the same idea can be extended to interacting heterogeneous (discrete and continuous) state space models.
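
As a rough illustration of the latent-dynamic idea described in the summary, the sketch below scores label sequences in a plain LDCRF-style model: each class label owns a disjoint set of hidden states, and the probability of a label sequence is the sum over all hidden-state paths consistent with it. This is not the thesis's FLDCRF implementation. The scores, the choice of two hidden states per label, and every function and variable name are hypothetical, and real feature functions are replaced by fixed per-step scores.

```python
import math
from itertools import product

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-scores."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(unary, trans, allowed):
    """Forward algorithm over hidden-state paths.

    unary[t][h]  : score of hidden state h at time step t
    trans[g][h]  : score of moving from hidden state g to h
    allowed[t]   : set of hidden states permitted at time step t
    """
    n_states = len(trans)
    alpha = [unary[0][h] if h in allowed[0] else NEG_INF
             for h in range(n_states)]
    for t in range(1, len(unary)):
        alpha = [
            unary[t][h] + logsumexp([alpha[g] + trans[g][h]
                                     for g in range(n_states)])
            if h in allowed[t] else NEG_INF
            for h in range(n_states)
        ]
    return logsumexp(alpha)


def label_sequence_prob(unary, trans, states_of_label, labels):
    """P(labels | x): mass of hidden paths that stay inside each label's states."""
    all_states = set(range(len(trans)))
    log_z = log_partition(unary, trans, [all_states] * len(unary))
    log_num = log_partition(unary, trans,
                            [states_of_label[y] for y in labels])
    return math.exp(log_num - log_z)


# Hypothetical toy setup: label 0 = "crossing", label 1 = "not-crossing",
# each with two hidden states, over three time steps.
states_of_label = {0: {0, 1}, 1: {2, 3}}
unary = [
    [0.5, -0.2, 0.1, 0.0],
    [0.3, 0.4, -0.1, 0.2],
    [-0.5, 0.0, 0.6, 0.9],
]
trans = [
    [0.8, 0.2, -0.5, -1.0],
    [0.1, 0.6, 0.3, -0.4],
    [-0.7, -0.2, 0.9, 0.3],
    [-1.0, -0.6, 0.2, 1.0],
]

for labels in product([0, 1], repeat=len(unary)):
    p = label_sequence_prob(unary, trans, states_of_label, list(labels))
    print(labels, round(p, 4))
```

Because the hidden-state sets are disjoint and together cover all hidden states, the probabilities of all label sequences sum to one. A factored variant in the spirit of FLDCRF would add further hidden layers whose states interact across layers, which this single-layer sketch does not attempt.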