Highly controllable human motion generation model
The combination of computer vision, artificial intelligence (AI), and natural language processing (NLP) triggers an intriguing area of automatically generating 3D human motions from text. The multidisciplinary research initiative presented in this paper sets out to realize that missing link betwe...
Main Author: | Huang, Jingfang |
---|---|
Other Authors: | Liu Ziwei |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/175821 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-175821 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-175821 2024-05-10T15:40:46Z Highly controllable human motion generation model Huang, Jingfang Liu Ziwei School of Computer Science and Engineering ziwei.liu@ntu.edu.sg Computer and Information Science Computer vision Natural language processing Human motion generation The combination of computer vision, artificial intelligence (AI), and natural language processing (NLP) triggers an intriguing area of automatically generating 3D human motions from text. The multidisciplinary research initiative presented in this paper sets out to realize that missing link between textual description and realistic, diverse synthesis of 3D human motions, so that such knowledge-generating approaches can be applied in a transformative manner in, for example, gaming and film production. However, in spite of advancements in methods such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion models, a number of questions still arise in relation to the capability of fine-grained control and fidelity in generated motion sequences when more complex, far longer textual prompts are used. While being very powerful, these up-to-date models often work at the expense of speed or the precision and subtleties in how humans express their emotions and motion styles in the text. With this in mind, we present our new approaches: FineQuant and FineGPT, inspired by recent major advances in the field and designed to be suitable for complex motion control. FineQuant is a tool that refines the representation of motion to be able to generate complex action directly from textual descriptions, while FineGPT has model generation capabilities of a fast and fine-tuned nature. This paper helps fill the research gap through comprehensive review of literature and experimentation conducted, with indications of advances that have been achieved, and paves the way for more nuanced, controllable human motion generation from text. 
Bachelor's degree 2024-05-08T02:30:12Z 2024-05-08T02:30:12Z 2024 Final Year Project (FYP) Huang, J. (2024). Highly controllable human motion generation model. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175821 https://hdl.handle.net/10356/175821 en application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science Computer vision Natural language processing Human motion generation |
spellingShingle |
Computer and Information Science Computer vision Natural language processing Human motion generation Huang, Jingfang Highly controllable human motion generation model |
description |
The combination of computer vision, artificial intelligence (AI), and natural language
processing (NLP) triggers an intriguing area of automatically generating 3D human
motions from text. The multidisciplinary research initiative presented in this paper sets
out to realize that missing link between textual description and realistic, diverse synthesis of 3D human motions, so that such knowledge-generating approaches can be applied
in a transformative manner in, for example, gaming and film production. However, in
spite of advancements in methods such as Generative Adversarial Networks (GANs),
Variational Autoencoders (VAEs), and Diffusion models, a number of questions still
arise in relation to the capability of fine-grained control and fidelity in generated motion
sequences when more complex, far longer textual prompts are used. While being very
powerful, these up-to-date models often work at the expense of speed or the precision
and subtleties in how humans express their emotions and motion styles in the text.
With this in mind, we present our new approaches: FineQuant and FineGPT, inspired
by recent major advances in the field and designed to be suitable for complex motion
control. FineQuant is a tool that refines the representation of motion to be able to
generate complex action directly from textual descriptions, while FineGPT has model
generation capabilities of a fast and fine-tuned nature. This paper helps fill the research
gap through comprehensive review of literature and experimentation conducted, with
indications of advances that have been achieved, and paves the way for more nuanced,
controllable human motion generation from text. |
author2 |
Liu Ziwei |
author_facet |
Liu Ziwei Huang, Jingfang |
format |
Final Year Project |
author |
Huang, Jingfang |
author_sort |
Huang, Jingfang |
title |
Highly controllable human motion generation model |
title_short |
Highly controllable human motion generation model |
title_full |
Highly controllable human motion generation model |
title_fullStr |
Highly controllable human motion generation model |
title_full_unstemmed |
Highly controllable human motion generation model |
title_sort |
highly controllable human motion generation model |
publisher |
Nanyang Technological University |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/175821 |
_version_ |
1800916122289045504 |