---
license: apache-2.0
tags:
- text-to-motion
- text-to-image
datasets:
- HumanML3D
- KIT-ML 
---
## Model Description

These are model weights originally provided by the authors of the paper [T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations](https://arxiv.org/abs/2301.06052).

<figure>
  <img src="https://huggingface.co/vumichien/T2M-GPT/resolve/main/T2M-GPT.png" alt="T2M-VQ">
  <figcaption>T2M-GPT
  </figcaption>
</figure>

T2M-GPT is a conditional generative framework based on a Vector Quantised-Variational AutoEncoder (VQ-VAE) and a Generative Pre-trained Transformer (GPT) for human motion generation
from textual descriptions.
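
As a rough illustration of how a two-stage framework of this kind can produce motion, the sketch below samples discrete code indices one at a time, conditioned on a text embedding, and then maps them back to codebook vectors. It is not the authors' implementation: `sample_motion_codes`, `decode_codes`, `toy_probs`, and the `END` token are hypothetical names, the toy probability function stands in for a trained transformer, and a real VQ-VAE decoder would further turn the looked-up latents into pose sequences.

```python
import random

END = -1  # hypothetical end-of-motion token (an assumption of this sketch)


def sample_motion_codes(next_token_probs, text_embedding, max_len=50, seed=0):
    """Sample code indices autoregressively until END or max_len is reached."""
    rng = random.Random(seed)
    codes = []
    for _ in range(max_len):
        # next_token_probs returns {token: probability} given the text and prefix
        probs = next_token_probs(text_embedding, codes)
        tokens = list(probs)
        weights = [probs[t] for t in tokens]
        token = rng.choices(tokens, weights=weights, k=1)[0]
        if token == END:
            break
        codes.append(token)
    return codes


def decode_codes(codes, codebook):
    """Look up each sampled index in the codebook (stand-in for a decoder)."""
    return [codebook[i] for i in codes]


def toy_probs(text_embedding, prefix):
    """Toy stand-in for a trained transformer: emit four codes, then END."""
    if len(prefix) >= 4:
        return {END: 1.0}
    return {0: 0.5, 1: 0.5}


codebook = [[0.0, 0.0], [1.0, 1.0]]
codes = sample_motion_codes(toy_probs, text_embedding=None)
latents = decode_codes(codes, codebook)
```

Keeping the sampler independent of the model (it only needs a callable that returns next-token probabilities) makes the generation loop easy to test without any trained weights.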

A simple CNN-based VQ-VAE with commonly used training recipes (EMA and Code Reset) allows us to obtain high-quality discrete representations.
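
The two training recipes named above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the code used to train these weights: all function names are hypothetical, the EMA update omits the smoothing used in some implementations, and a code is treated as "unused" if it received no vectors in the current batch, whereas real implementations may track usage over a longer window.

```python
import random


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def ema_update(codebook, counts, sums, batch, decay=0.99, eps=1e-5):
    """One EMA step: each code drifts toward the mean of its assigned vectors."""
    dim = len(codebook[0])
    # Assign every encoder output to its nearest code before changing anything.
    assignments = [nearest_code(v, codebook) for v in batch]
    batch_counts = [0] * len(codebook)
    batch_sums = [[0.0] * dim for _ in codebook]
    for v, k in zip(batch, assignments):
        batch_counts[k] += 1
        for d in range(dim):
            batch_sums[k][d] += v[d]
    # Update the running counts and sums, then recompute each code from them.
    for k in range(len(codebook)):
        counts[k] = decay * counts[k] + (1 - decay) * batch_counts[k]
        for d in range(dim):
            sums[k][d] = decay * sums[k][d] + (1 - decay) * batch_sums[k][d]
            codebook[k][d] = sums[k][d] / (counts[k] + eps)
    return assignments, batch_counts


def reset_dead_codes(codebook, counts, sums, batch, batch_counts, rng):
    """Code Reset: re-initialise unused codes from randomly chosen batch vectors."""
    for k, n in enumerate(batch_counts):
        if n == 0:
            codebook[k] = list(rng.choice(batch))
            counts[k] = 1.0
            sums[k] = list(codebook[k])


codebook = [[0.0, 0.0], [100.0, 100.0]]
counts = [1.0, 1.0]
sums = [list(c) for c in codebook]
batch = [[1.0, 1.0], [2.0, 2.0]]
assignments, batch_counts = ema_update(codebook, counts, sums, batch, decay=0.5)
reset_dead_codes(codebook, counts, sums, batch, batch_counts, random.Random(0))
```

In this toy run both vectors land on the first code, so the second code is never selected and is re-initialised from the batch, which is the situation Code Reset is meant to address.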

The official code for this paper is available [here](https://github.com/Mael-zys/T2M-GPT).

## Example
<figure>
  <img src="https://huggingface.co/vumichien/T2M-GPT/resolve/main/demo_slow1.gif" alt="Demo Slow" width="425" height="480"/>
  <figcaption> a man starts off in an upright position with both arms extended out by his sides, he then brings his arms down to his body and claps his hands together. after this he walks down and to the left where he proceeds to sit on a seat
  </figcaption>
</figure>

<figure>
  <img src="https://huggingface.co/vumichien/T2M-GPT/resolve/main/demo_slow2.gif" alt="Demo Slow 2" width="425" height="480"/>
  <figcaption> a person puts their hands together, leans forwards slightly then swings the arms from right to left
  </figcaption>
</figure>

## Datasets
HumanML3D and KIT-ML