Multi-stream social-aware transformers for deterministic trajectory prediction

Bibliographic Details
Main Author: Chen, Xun
Other Authors: Wang Dan Wei
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/172974
Institution: Nanyang Technological University
Description
Summary: With the development of artificial intelligence technology, intelligent robots are increasingly used in daily life. For a delivery robot operating in a crowded environment, accurate and fast pedestrian trajectory prediction is the basis of autonomous operation and poses considerable challenges. (1) Most previous work on pedestrian trajectory prediction models the problem with probabilistic generative models (such as CVAEs or diffusion models) and measures accuracy with best-of-20 metrics, that is, the error of the best of 20 sampled predictions. This protocol differs considerably from actual deployment, where a robot must act on a single prediction. In this work, the task is instead modeled as a seq2seq translation problem that outputs a single accurate prediction, which is better suited to real-world deployment and also reduces model complexity. (2) The difficulty of the task lies in its inherent spatio-temporal and social dimensions: modeling the temporal dimension alone misses interactions between agents. Most existing solutions alternate information exchange between the two dimensions and achieve decent results, but this may lead to information loss. Approaches that exchange information across both dimensions simultaneously incur high computational complexity, quadratic in the total sequence length (the number of agents multiplied by the number of time steps). Drawing inspiration from multi-modal fusion network architectures, a novel multi-stream Transformer architecture is proposed that fuses information from multiple input streams into a single stream and then decodes it back into multiple output streams. This architecture significantly reduces computational complexity, enabling real-time deployment, while achieving results very close to the state of the art on well-established datasets.
Keywords: Trajectory prediction, seq2seq model, multi-stream Transformer, real-time.
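Two claims in the summary can be made concrete with a small illustrative sketch. The first is the evaluation gap: a deterministic model is scored by the average and final displacement error (ADE/FDE) of its single output, whereas best-of-K scoring keeps only the lowest error among K samples, which can never be worse than scoring any one sample. The second is the complexity argument: joint attention over every agent-timestep token grows as (N·T)², while alternating temporal and social attention grows as N·T² + T·N². The function names (`ade_fde`, `min_of_k`, `attention_scores`) and the plain-Python trajectory representation are hypothetical, not code from the thesis; `min_of_k` minimises ADE and FDE independently, which is one common convention among several, and the thesis's own multi-stream fusion is deliberately not modeled here.

```python
import math
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def ade_fde(pred: Trajectory, gt: Trajectory) -> Tuple[float, float]:
    """ADE/FDE (Euclidean) of one predicted trajectory against the ground truth."""
    if not gt or len(pred) != len(gt):
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]


def min_of_k(samples: List[Trajectory], gt: Trajectory) -> Tuple[float, float]:
    """Best-of-K scoring (assumed convention): each metric is minimised over samples independently."""
    scores = [ade_fde(s, gt) for s in samples]
    return min(a for a, _ in scores), min(f for _, f in scores)


def attention_scores(num_agents: int, num_steps: int) -> Dict[str, int]:
    """Pairwise attention scores per layer and head for two layouts (illustrative count only)."""
    n, t = num_agents, num_steps
    joint = (n * t) ** 2                   # every agent-timestep token attends to every other
    alternating = n * t * t + t * n * n    # temporal attention per agent + social attention per step
    return {"joint": joint, "alternating": alternating}


if __name__ == "__main__":
    rng = random.Random(0)
    gt = [(float(i), 0.0) for i in range(12)]  # hypothetical straight-line ground truth
    samples = [[(x + rng.gauss(0, 0.5), y + rng.gauss(0, 0.5)) for x, y in gt] for _ in range(20)]
    print("single sample ADE/FDE:", ade_fde(samples[0], gt))
    print("best-of-20 ADE/FDE:  ", min_of_k(samples, gt))
    print("attention scores (N=10, T=20):", attention_scores(10, 20))
```

With 10 agents and 20 time steps, joint attention computes 40,000 scores per layer and head against 6,000 for the alternating layout, which illustrates why the summary treats simultaneous exchange as too costly for real-time use and alternation as a cheaper compromise.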