Highly controllable human motion generation model

The combination of computer vision, artificial intelligence (AI), and natural language processing (NLP) triggers an intriguing area of automatically generating 3D human motions from text. The multidisciplinary research initiative presented in this paper sets out to realize that missing link betwe...

Bibliographic Details
Main Author: Huang, Jingfang
Other Authors: Liu Ziwei
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
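Subjects: Computer and Information Science; Computer vision; Natural language processing; Human motion generation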
Online Access: https://hdl.handle.net/10356/175821
Institution: Nanyang Technological University
id sg-ntu-dr.10356-175821
record_format dspace
spelling sg-ntu-dr.10356-175821 2024-05-10T15:40:46Z Highly controllable human motion generation model Huang, Jingfang Liu Ziwei School of Computer Science and Engineering ziwei.liu@ntu.edu.sg Computer and Information Science Computer vision Natural language processing Human motion generation The combination of computer vision, artificial intelligence (AI), and natural language processing (NLP) triggers an intriguing area of automatically generating 3D human motions from text. The multidisciplinary research initiative presented in this paper sets out to realize that missing link between textual description and realistic, diverse synthesis of 3D human motions, so that such knowledge-generating approaches can be applied in a transformative manner in, for example, gaming and film production. However, in spite of advancements in methods such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion models, a number of questions still arise in relation to the capability of fine-grained control and fidelity in generated motion sequences when more complex, far longer textual prompts are used. While being very powerful, these up-to-date models often work at the expense of speed or the precision and subtleties in how humans express their emotions and motion styles in the text. With this in mind, we present our new approaches: FineQuant and FineGPT, inspired by recent major advances in the field and designed to be suitable for complex motion control. FineQuant is a tool that refines the representation of motion to be able to generate complex action directly from textual descriptions, while FineGPT has model generation capabilities of a fast and fine-tuned nature. This paper helps fill the research gap through comprehensive review of literature and experimentation conducted, with indications of advances that have been achieved, and paves the way for more nuanced, controllable human motion generation from text.
Bachelor's degree 2024-05-08T02:30:12Z 2024-05-08T02:30:12Z 2024 Final Year Project (FYP) Huang, J. (2024). Highly controllable human motion generation model. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175821 https://hdl.handle.net/10356/175821 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Computer vision
Natural language processing
Human motion generation
spellingShingle Computer and Information Science
Computer vision
Natural language processing
Human motion generation
Huang, Jingfang
Highly controllable human motion generation model
description The combination of computer vision, artificial intelligence (AI), and natural language processing (NLP) has opened up an intriguing area: automatically generating 3D human motions from text. The multidisciplinary research presented in this paper sets out to supply the missing link between textual description and the realistic, diverse synthesis of 3D human motions, so that such approaches can be applied in a transformative manner in, for example, gaming and film production. However, despite advances in methods such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and diffusion models, questions remain about fine-grained control and fidelity in generated motion sequences when more complex and far longer textual prompts are used. While very powerful, these state-of-the-art models often sacrifice speed, or the precision and subtlety with which the emotions and motion styles expressed in the text are captured. With this in mind, we present two new approaches, FineQuant and FineGPT, inspired by recent major advances in the field and designed for complex motion control. FineQuant refines the representation of motion so that complex actions can be generated directly from textual descriptions, while FineGPT provides fast, fine-tuned generation. Through a comprehensive literature review and experiments, this paper helps fill the research gap, indicates the advances achieved, and paves the way for more nuanced, controllable human motion generation from text.
author2 Liu Ziwei
author_facet Liu Ziwei
Huang, Jingfang
format Final Year Project
author Huang, Jingfang
author_sort Huang, Jingfang
title Highly controllable human motion generation model
title_short Highly controllable human motion generation model
title_full Highly controllable human motion generation model
title_fullStr Highly controllable human motion generation model
title_full_unstemmed Highly controllable human motion generation model
title_sort highly controllable human motion generation model
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/175821
_version_ 1800916122289045504