Context models for pedestrian intention prediction by factored latent-dynamic conditional random fields

Smooth handling of pedestrian interactions is a key requirement for Autonomous Vehicles (AV) and Advanced Driver Assistance Systems (ADAS). Such systems call for early and accurate prediction of a pedestrian's crossing/not-crossing behaviour in front of the vehicle. Existing approaches to pedes...

Bibliographic Details
Main Author: Satyajit Neogi
Other Authors: Justin Dauwels
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2020
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/143222
Institution: Nanyang Technological University
id sg-ntu-dr.10356-143222
record_format dspace
spelling sg-ntu-dr.10356-143222 2023-07-04T17:20:52Z
Context models for pedestrian intention prediction by factored latent-dynamic conditional random fields
Satyajit Neogi
Justin Dauwels
School of Electrical and Electronic Engineering
JDAUWELS@ntu.edu.sg
Engineering::Electrical and electronic engineering
Smooth handling of pedestrian interactions is a key requirement for Autonomous Vehicles (AV) and Advanced Driver Assistance Systems (ADAS). Such systems call for early and accurate prediction of a pedestrian's crossing/not-crossing behaviour in front of the vehicle. Existing approaches to pedestrian behaviour prediction use the pedestrian's motion, the pedestrian's location in the scene, and static context variables such as traffic lights and zebra crossings. We stress the necessity of early prediction for smooth operation of such systems, and we introduce the influence of vehicle interactions on pedestrian intention for this purpose. In this thesis, we show a discernible advance in prediction time aided by the inclusion of such vehicle interaction context. We apply our methods to two public datasets, the Daimler dataset and the JAAD dataset. We also contribute two datasets to pedestrian behaviour prediction research, the NTU dataset and the Little India dataset, and apply our methods to them. Whereas the best existing system predicts pedestrian stopping behaviour with 70% accuracy 0.38 seconds before the actual event, our system reaches the same accuracy across multiple datasets at least 0.9 seconds before the event on average.
We formulate the pedestrian behaviour prediction problem as a sequence labeling task. Conditional Random Fields (CRF) are frequently applied for labeling and segmenting sequence data. In the existing literature, hidden variables have been introduced in a labeled CRF structure in order to model the latent dynamics within class labels, thus improving the labeling performance. Such a model is known as Latent-Dynamic CRF (LDCRF). We propose a generalization of LDCRF, called Factored LDCRF (FLDCRF), a structure that allows multiple latent dynamics of the class labels to interact with each other. Including such latent-dynamic interactions leads to improved performance on single-label and multi-label sequence modeling tasks.
We validate our FLDCRF models on standard single-label and multi-label sequence tagging experiments across two datasets, the UCI gesture phase data and the UCI opportunity data, before applying them to pedestrian behaviour prediction. FLDCRF outperforms all the state-of-the-art sequence models we compare against (CRF, LDCRF, LSTM, LSTM-CRF, Factorial CRF, Coupled CRF and a multi-label LSTM model) in all our experiments. In addition, FLDCRF offers easier model selection and is more consistent across validation and test data than LSTM models. FLDCRF is also much faster to train than LSTM, even without a GPU. FLDCRF outperforms the best LSTM model by 4% in F1-score on a single-label task on the UCI gesture phase data, and by ~2% in F1-score on average on the multi-label sequence tagging experiment on the UCI opportunity data. FLDCRF models also outperform LSTM models on the pedestrian behaviour prediction task across multiple datasets.
Interacting latent dynamics in an FLDCRF can be exploited for modeling multi-agent interactions in a social environment. FLDCRF accommodates interacting discrete latent state spaces in its structure. The same idea can be extended to interacting heterogeneous (discrete and continuous) state space models.
Doctor of Philosophy
2020-08-14T01:54:24Z
2020-08-14T01:54:24Z
2020
Thesis-Doctor of Philosophy
Satyajit Neogi. (2020). Context models for pedestrian intention prediction by factored latent-dynamic conditional random fields. Doctoral thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/143222
10.32657/10356/143222
en
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
spellingShingle Engineering::Electrical and electronic engineering
Satyajit Neogi
Context models for pedestrian intention prediction by factored latent-dynamic conditional random fields
author2 Justin Dauwels
author_facet Justin Dauwels
Satyajit Neogi
format Thesis-Doctor of Philosophy
author Satyajit Neogi
author_sort Satyajit Neogi
title Context models for pedestrian intention prediction by factored latent-dynamic conditional random fields
title_sort context models for pedestrian intention prediction by factored latent-dynamic conditional random fields
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/143222
_version_ 1772827650930245632