Highly controllable human motion generation model

Bibliographic Details
Main Author: Huang, Jingfang
Other Authors: Liu Ziwei
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/175821
Institution: Nanyang Technological University
Description
Summary: Automatically generating 3D human motion from text is an emerging research area at the intersection of computer vision, artificial intelligence (AI), and natural language processing (NLP). The work presented in this paper aims to bridge the gap between textual descriptions and the realistic, diverse synthesis of 3D human motions, so that text-driven motion generation can be applied in areas such as gaming and film production. Despite advances in Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and diffusion models, open questions remain about fine-grained control and the fidelity of generated motion sequences when prompts are longer and more complex. Although powerful, current models often come at the expense of speed, or of the precision and subtlety with which text expresses human emotions and motion styles. With this in mind, we present two new approaches, FineQuant and FineGPT, which are inspired by recent advances in the field and designed for complex motion control. FineQuant refines the motion representation so that complex actions can be generated directly from textual descriptions, while FineGPT provides fast, fine-tuned generation. Through a comprehensive literature review and experiments, this paper helps fill the research gap, reports the advances achieved, and paves the way for more nuanced, controllable human motion generation from text.