Sequence-to-sequence learning for motion prediction and generation

Bibliographic Details
Main Author: Wu, Shuang
Other Authors: Lu Shijian
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Online Access: https://hdl.handle.net/10356/159102
Institution: Nanyang Technological University
Description
Summary: The computational understanding and modelling of human motion has grown in importance over the last decade, with applications in sports science, animation, robotics, surveillance and autonomous driving. In this thesis, we adopt the sequence-to-sequence learning paradigm to study motion prediction and motion generation. We first examine multiple articulated pose representation schemes for integrating biomechanical constraints within computational motion models. Our theoretical analysis and empirical studies suggest that the kinematic tree representation with Stiefel manifold parametrizations is most suitable.
In motion prediction, we seek to generate future motion given an observed sequence. To handle long-term dependency, we design a hierarchical recurrent network that simultaneously models local contexts and global characteristics. This attains better short-term accuracy along with natural motion predictions in the long term. On another front, we look to incorporate control into our prediction models. We employ multiple generative adversarial networks to model individual body parts, allowing for fine-grained control and tuning of the prediction spectrum. Finally, we reconsider motion prediction within the framework of stochastic differential equations, which allows the model weights to be interpreted as the stochastic diffusion matrix and drift parameters.
For motion generation, we specifically study generating dance motion conditioned on music input. We introduce an optimal transport objective for evaluating the authenticity of generated dance distributions and a Gromov-Wasserstein objective to match dance with music. These objectives allow our model to synthesize realistic dance motion in harmony with the input music. Furthermore, we consider a dual learning framework to concurrently learn both music-to-dance and dance-to-music generation. By effectively integrating information from both domains, dual learning boosts the performance of the individual tasks, delivering realistic, genre-consistent dance generations and viable music compositions.