Prediction of pedestrian trajectory with a moving camera using deep learning
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/141139 |
Institution: | Nanyang Technological University |
Summary: | Predicting pedestrian motion has a wide range of applications, such as social-behavior understanding, autonomous systems, and crowd motion modelling. For autonomous systems such as mobile robots, knowing a pedestrian's likely future position allows the robot to make fast and safe decisions, such as stopping before it hits the pedestrian's feet. Making such decisions requires accurate knowledge of the robot's position and of the positions of the pedestrians moving around it. Because both the robot and the pedestrians can be moving, predicting pedestrian positions from a moving robot becomes a more challenging task.
The pedestrian trajectory prediction problem is essentially a sequence prediction problem: future positions are predicted from the past input sequence. Given the recent success of RNNs (Recurrent Neural Networks) in sequence prediction, the RNN architecture is one of the best choices for this type of problem. However, because a plain RNN cannot handle long sequences well, an LSTM (a variant of the RNN) is used to build the proposed trajectory prediction model. An LSTM can typically produce higher accuracy on long sequences because of the gating structures inside the LSTM cell. The model is then optimized over different hyperparameters, loss functions, and data formats. In contrast to the existing approach, the optimized model uses velocity values (the difference between two trajectory points) of the pedestrian trajectory rather than coordinate values. Experiments on several datasets show that the proposed approach achieves high accuracy when velocity information is used as the model input.
After the model is established, the next issue is extracting motion trajectories from a moving camera in a constantly changing scene. Because trajectories in pixel coordinates are very noisy and look like a mass of points rather than lines, the pixel-coordinate trajectories obtained under the moving camera are converted to world coordinates. In this process, the height of every pedestrian is assumed to be a constant 1.7 m. This assumption may not be accurate, but its purpose is only to transform the pedestrian's location from image coordinates to a fixed reference coordinate system. Since the proposed motion model is independent of the pedestrian's position and requires only velocity as input, the prediction can be done in the reference coordinate system. The reference coordinate system is in fact the camera coordinate system, which gives the position of the pedestrian relative to the mobile robot. Moreover, camera coordinates can be projected back to image coordinates using the camera intrinsic parameters, and transformed to world coordinates using the robot's odometry information.
Visual experiments show that real-time prediction in world coordinates can be achieved with a small margin of error from visual images alone. The approach thus removes the need for range sensors such as LIDAR, which are computationally expensive. |
---|
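The summary describes two steps that a short sketch may make more concrete: placing a pedestrian in camera coordinates under the constant 1.7 m height assumption, and turning a position sequence into velocity inputs. The record does not say how the thesis implements either step, so everything below is an assumption for illustration: the pinhole relation used for depth, the bounding-box input format, the intrinsics `fx`, `fy`, `cx`, and the function names `box_to_camera_xz` and `positions_to_velocities` are all hypothetical.

```python
import numpy as np

# Constant pedestrian height stated in the summary (metres).
PEDESTRIAN_HEIGHT_M = 1.7


def box_to_camera_xz(box, fx, fy, cx):
    """Estimate (X, Z) in the camera frame from a pixel bounding box.

    Assumption (not from the thesis): an upright pedestrian of known height
    seen by an ideal pinhole camera, so depth Z = fy * H / pixel_height.
    `box` is (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    pixel_height = y_max - y_min
    if pixel_height <= 0:
        raise ValueError("bounding box must have positive height")
    z = fy * PEDESTRIAN_HEIGHT_M / pixel_height  # depth in metres
    u = 0.5 * (x_min + x_max)                    # horizontal box centre
    x = (u - cx) * z / fx                        # lateral offset in metres
    return np.array([x, z])


def positions_to_velocities(positions):
    """Difference between successive trajectory points, one row per step.

    Assumption: "velocity" means the per-frame displacement between
    consecutive points, as the summary's parenthetical suggests.
    """
    positions = np.asarray(positions, dtype=float)
    return np.diff(positions, axis=0)


# Hypothetical intrinsics and two detections of the same pedestrian.
boxes = [(300, 100, 340, 270), (350, 100, 390, 270)]
track = [box_to_camera_xz(b, fx=500.0, fy=500.0, cx=320.0) for b in boxes]
velocities = positions_to_velocities(track)
print(velocities)
```

Because the velocities are differences, they do not depend on where the reference frame's origin sits, which is consistent with the summary's point that the motion model needs only velocity as input.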