
Model Description

TRME (Text Residual Motion Encoder) enhances the generation of 3D human motion from textual descriptions. It extends the Vector Quantized Variational Autoencoder (VQ-VAE) architecture used in T2M-GPT with additional residual blocks, which capture finer motion details and enable the synthesis of more diverse and realistic human motions. The model is intended to support animation, virtual reality, and related industries.

Key Features:

  • Utilizes an enhanced VQ-VAE architecture with additional residual blocks.
  • Capable of detailed and complex motion synthesis.
  • Optimized for high diversity and realism in generated motions.
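
The card does not include reference code for the encoder, so the snippet below is only a minimal sketch of the residual-block extension described above, written in PyTorch. The class names (ResBlock, TRMEEncoder), layer widths, dilation schedule, and the exact placement of the residual blocks are assumptions for illustration, not the released architecture.

```python
# Hypothetical sketch of residual blocks added on top of a T2M-GPT-style
# VQ-VAE encoder. All names and dimensions are illustrative only.
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """1D convolutional residual block over the temporal axis."""

    def __init__(self, channels: int, dilation: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3,
                      padding=dilation, dilation=dilation),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection lets the extra depth refine fine motion detail
        # without discarding the coarse motion already encoded in x.
        return x + self.net(x)


class TRMEEncoder(nn.Module):
    """Downsampling encoder with extra residual blocks (assumed layout)."""

    def __init__(self, pose_dim: int = 263, width: int = 512, n_res: int = 3):
        super().__init__()
        # Halve the temporal resolution, then refine with residual blocks.
        self.stem = nn.Conv1d(pose_dim, width, kernel_size=4, stride=2, padding=1)
        self.res_blocks = nn.Sequential(
            *[ResBlock(width, dilation=3 ** i) for i in range(n_res)]
        )

    def forward(self, motion: torch.Tensor) -> torch.Tensor:
        # motion: (batch, pose_dim, frames) -> latent: (batch, width, frames // 2)
        return self.res_blocks(self.stem(motion))


# Example: encode a batch of two 64-frame motions into downsampled latents.
# latents = TRMEEncoder()(torch.randn(2, 263, 64))  # -> (2, 512, 32)
```

In this sketch the latents would then be vector-quantized against a codebook, as in a standard VQ-VAE; the residual blocks only deepen the encoder before quantization.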

[Figure: Generated sequences for MOYO and HumanML3D from T2M-GPT and TRME. The visualized motions correspond to two different captions drawn from the HumanML3D and MOYO datasets; TRME captures dependencies across diverse motion classes better than state-of-the-art baselines.]

The model has been trained on a novel dataset, CHAD, which includes a comprehensive set of human motion data, enabling it to handle a wide variety of motion generation tasks.

Learn more about the training process and model architecture below:

[Figure: Data flow for the TRME model, from the AMASS database to the creation of the CHAD dataset and subsequent motion generation.]

Example Usage

Here is a demonstration of the model generating a complex human motion from a simple textual description:

[Figure: Generated motion for the prompt "A person is doing a tree pose."]
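
The card does not document a public inference API, so the following is only a hypothetical sketch of how text-to-motion generation might be invoked. The module name (trme), the load_trme and generate functions, the checkpoint path, and the output shape are placeholders, not a released interface.

```python
# Hypothetical usage sketch; `trme`, load_trme, generate, and the checkpoint
# path are placeholders, not a documented TRME API.
import torch

from trme import load_trme  # assumed module name

device = "cuda" if torch.cuda.is_available() else "cpu"
model = load_trme("checkpoints/trme.pt", device=device)  # assumed checkpoint path

# Prompt taken from the example above.
prompt = "A person is doing a tree pose."

with torch.no_grad():
    # Assumed to return joint positions of shape (frames, joints, 3).
    motion = model.generate(prompt, max_frames=196)

print(motion.shape)
```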

Datasets

This model has been trained on the following datasets:

  • CHAD (Comprehensive Human Activity Dataset): An aggregation of motion capture data tailored to diverse activities, enhancing training and model robustness.
  • HumanML3D: Provides diverse scenarios from daily activities to complex sports movements.
  • AMASS: A large-scale motion capture dataset that provides an extensive set of human movements and poses.

Citation

@misc{vumichien_2023,
    author    = {vumichien},
    title     = {T2M-GPT (Revision e311a99)},
    year      = {2023},
    url       = {https://huggingface.co/vumichien/T2M-GPT},
    doi       = {10.57967/hf/0341},
    publisher = {Hugging Face}
}

Acknowledgements

We would like to thank the contributors and researchers in the 3D human motion generation domain whose insights and datasets were invaluable in developing the TRME model. Special thanks to Dr. Youshan Zhang (Assistant Professor at Yeshiva University, NYC) for his guidance and expertise throughout the project.

For more information, contributions, or questions, please visit our project repository.
