PGMoGen: pose-guided human motion generation


Bibliographic Details
Main Author: Yin, Jiarui
Other Authors: Liu, Ziwei
Format: Final Year Project
Language:English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/166113
Institution: Nanyang Technological University
Description
Summary:3D human motion modeling is a fundamental component of computer animation and is essential for creating immersive and interactive virtual environments such as games, film production, and social media avatar generation. However, generating natural and diverse 3D human motion remains challenging. Popular existing approaches include predefined motion libraries, motion capture data, and physics-based models, but applying these methods can be skill-demanding, time-consuming, and expensive for individuals. To address this problem, we propose PGMoGen, a diffusion-based, pose-guided, text-driven human motion generation model. With the diffusion model, we aim to generate realistic human motion that is responsive and relevant to textual input. We demonstrate how the diffusion model generates reasonable and varied human motions from textual input and evaluate its effectiveness through experiments. Finally, we compare the model against the AvatarCLIP and MotionDiffuse approaches to show the improvements achieved by the diffusion model. Some recent works have successfully designed text-driven motion synthesis pipelines based on diffusion models and achieved significant progress in generation quality. However, their generalizability still relies heavily on the training dataset, and they perform poorly on unseen text input. Therefore, to broaden the applicability of text-driven motion generation, we incorporate zero-shot ability into a state-of-the-art diffusion-model-based motion generation pipeline. Specifically, we establish a pose database, the VPoser Database, to accomplish pose retrieval-based motion generation. The overall pipeline contains two novel components: a) pose retrieval techniques in both the training and inference stages; b) a pose-guided motion transformer.
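The pose retrieval step described above can be sketched as a nearest-neighbour lookup in a shared embedding space: a text query is embedded, compared against a database of pose embeddings, and the closest poses are returned to guide generation. The code below is a minimal illustrative sketch; the toy database, the 3-D embeddings, and the function name `retrieve_pose` are assumptions for demonstration and are not the project's actual VPoser Database or embedding model.

```python
import numpy as np

def retrieve_pose(query_embedding, pose_db, k=1):
    """Return indices of the k database poses whose embeddings are most
    similar (by cosine similarity) to the query embedding.
    Hypothetical stand-in for the pose-database lookup."""
    # Normalize rows of the database and the query to unit length,
    # so that dot products equal cosine similarities.
    db = pose_db / np.linalg.norm(pose_db, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    sims = db @ q
    # Sort by descending similarity and keep the top k indices.
    return np.argsort(-sims)[:k]

# Toy database of 4 pose embeddings (3-D purely for illustration).
pose_db = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.7, 0.0],
])
query = np.array([0.9, 0.1, 0.0])
print(retrieve_pose(query, pose_db, k=2))  # indices of the 2 closest poses
```

In the full pipeline, the retrieved poses would then condition the diffusion model's denoising process (via the pose-guided motion transformer) rather than being used directly as output.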
Furthermore, to increase the diversity and quantity of the training motions, we re-labelled the HuMMan dataset and used it as an additional training dataset for our PGMoGen model. Extensive quantitative and qualitative experiments demonstrate the superiority of our proposed PGMoGen.