Retrieval-augmented human motion generation with diffusion model

Bibliographic Details
Main Author: Guo, Xinying
Other Authors: Liu Ziwei
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/167733
Institution: Nanyang Technological University
Description
Summary: Human motion generation is a crucial area of research with the potential to bring lifelike characters and movements to various applications, enhancing user engagement and immersion. However, the intricacy and diversity of human movements, the scarcity of motion data, the difficulty of incorporating human-like traits, and humans' heightened sensitivity to body movements pose persistent challenges in generating plausible human motions. These challenges have driven a surge in the development of human motion generation models in recent years, with text-driven motion generation being particularly popular because of its user-friendly nature. However, current text-driven generative approaches suffer from either poor quality or limited generalizability and expressiveness. To overcome these limitations, this project draws inspiration from successful diffusion models and retrieval techniques in related fields and proposes ReMoDiffuse, an efficient diffusion-model-based text-driven motion generation framework complemented by a novel retrieval strategy. Specifically, ReMoDiffuse uses a diffusion model and integrates a multi-modality retrieval database to refine the denoising process. Extensive experiments demonstrate that the proposed method achieves superior performance in terms of quality, generalizability, and expressiveness.