---
license: apache-2.0
tags:
- motion-generation
- text-to-motion
- human-motion
- surveillance
- synthetic-data
- docker
- rest-api
- kimodo
- nvidia
pipeline_tag: text-to-video
---

# kimodo-api

A **REST API wrapper** around [NVIDIA Kimodo](https://github.com/nv-tlabs/kimodo), the state-of-the-art text-to-motion diffusion model trained on 700 hours of commercial mocap data.

This image turns Kimodo into a microservice you can call from any pipeline, with no Python environment needed.

## Quick Start

```bash
docker pull ghcr.io/eyalenav/kimodo-api:latest

docker run --rm --gpus '"device=0"' -p 9551:9551 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HUGGINGFACE_TOKEN=hf_... \
  ghcr.io/eyalenav/kimodo-api:latest
```

> ⚠️ First run downloads Llama-3-8B-Instruct (~16 GB) for the text encoder. Requires a Hugging Face token with access to `meta-llama/Meta-Llama-3-8B-Instruct`.
## API

### `POST /generate`

Generate a motion clip from a text prompt.

```bash
curl -X POST http://localhost:9551/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "person pushing through a crowd aggressively"}'
```

**Response:** an NPZ file (binary) in the SOMA 77-joint skeleton format, compatible with BVH export.
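To call the endpoint from Python instead of curl, the response bytes can be loaded straight into NumPy. A minimal client sketch — the helper name `generate_motion` is illustrative, and the array names inside the archive depend on the Kimodo release, so inspect `archive.files` rather than assuming them:

```python
# Minimal Python client sketch for the /generate endpoint.
# Assumption (not stated by the API): the response body is the raw NPZ bytes.
import io

import numpy as np
import requests


def generate_motion(prompt: str, host: str = "http://localhost:9551") -> dict:
    """POST a prompt and return the NPZ contents as {name: ndarray}."""
    resp = requests.post(
        f"{host}/generate",
        json={"prompt": prompt},
        timeout=600,  # diffusion sampling can take minutes, especially cold
    )
    resp.raise_for_status()
    # np.load reads an NPZ archive directly from an in-memory buffer.
    archive = np.load(io.BytesIO(resp.content))
    return {name: archive[name] for name in archive.files}
```

From here the arrays can be inspected (e.g. print each array's `.shape`) or re-saved with `np.savez` for a downstream rendering stage.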
### `GET /health`

```bash
curl http://localhost:9551/health
# {"status": "ok"}
```
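Because the first launch spends a long time downloading encoder weights, a caller should poll `/health` before sending work. A small readiness helper, sketched under the assumption that the endpoint returns the `{"status": "ok"}` body shown above (the function name and timeouts are illustrative):

```python
# Poll /health until the server reports ready, or give up after timeout_s.
import time

import requests


def wait_until_ready(
    url: str = "http://localhost:9551/health",
    timeout_s: float = 1800,  # first boot may spend most of this downloading
    interval_s: float = 10,
) -> bool:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=5).json().get("status") == "ok":
                return True
        except (requests.RequestException, ValueError):
            pass  # server not up yet (or non-JSON reply); keep polling
        time.sleep(interval_s)
    return False
```

Returning a bool instead of raising lets orchestration code decide whether a slow start is fatal.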
## Requirements

| Resource | Minimum |
|---|---|
| GPU | RTX 3090 / A100 / RTX 6000 Ada |
| VRAM | 24 GB |
| RAM | 32 GB |
| Disk | 50 GB (model weights) |

## What's inside

- **Kimodo** – NVIDIA's kinematic motion diffusion model (77-joint SOMA skeleton)
- **LLM2Vec** text encoder backed by **Llama-3-8B-Instruct**
- **FastAPI** server on port 9551
- Health check + graceful startup
## Part of VisionAI-Flywheel

This service is one component of a full synthetic surveillance data pipeline:

```
[kimodo-api]      → NPZ motion
        ↓
[render-api]      → SOMA mesh render (MP4)
        ↓
[cosmos-transfer] → Sim2Real photorealistic video
        ↓
[NVIDIA VSS]      → VLM annotation → fine-tuning dataset
```

Full pipeline: [github.com/EyalEnav/VisionAI-Flywheel](https://github.com/EyalEnav/VisionAI-Flywheel)
## License

Apache 2.0 – see [LICENSE](https://github.com/EyalEnav/VisionAI-Flywheel/blob/main/LICENSE)

> Kimodo model weights are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) and downloaded at runtime; they are not bundled in this image.