---
license: apache-2.0
tags:
- motion-generation
- text-to-motion
- human-motion
- surveillance
- synthetic-data
- docker
- rest-api
- kimodo
- nvidia
pipeline_tag: text-to-video
---

# kimodo-api 🏃

A **REST API wrapper** around [NVIDIA Kimodo](https://github.com/nv-tlabs/kimodo), the state-of-the-art text-to-motion diffusion model trained on 700 hours of commercial mocap data.

This image turns Kimodo into a microservice you can call from any pipeline; no local Python environment is needed.

## Quick Start

```bash
docker pull ghcr.io/eyalenav/kimodo-api:latest

docker run --rm --gpus '"device=0"' -p 9551:9551 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HUGGINGFACE_TOKEN=hf_... \
  ghcr.io/eyalenav/kimodo-api:latest
```

> ⚠️ First run downloads Llama-3-8B-Instruct (~16 GB) for the text encoder. Requires a Hugging Face token with access to `meta-llama/Meta-Llama-3-8B-Instruct`.

## API

### `POST /generate`

Generate a motion clip from a text prompt.

```bash
curl -X POST http://localhost:9551/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "person pushing through a crowd aggressively"}'
```

**Response:** a binary NPZ file in the SOMA 77-joint skeleton format, compatible with BVH export.

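For scripted use, the same call works from Python. The sketch below is illustrative only: it assumes the response body is the raw NPZ bytes (as in the `curl` example above) and makes no assumptions about the array keys inside the archive, which depend on the Kimodo export format; the filename `motion.npz` is arbitrary.

```python
# Minimal client sketch: POST a prompt, save the binary NPZ response, and peek inside.
# Assumes the server from the Quick Start is listening on localhost:9551.
import io

import numpy as np
import requests

resp = requests.post(
    "http://localhost:9551/generate",
    json={"prompt": "person pushing through a crowd aggressively"},
    timeout=600,  # diffusion sampling can take a while on a single GPU
)
resp.raise_for_status()

# Persist the clip for downstream tools (e.g. BVH export or render-api).
with open("motion.npz", "wb") as f:
    f.write(resp.content)

# Inspect the archive without assuming specific key names.
with np.load(io.BytesIO(resp.content)) as archive:
    for key in archive.files:
        print(key, archive[key].shape, archive[key].dtype)
```
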
### `GET /health`

```bash
curl http://localhost:9551/health
# {"status": "ok"}
```

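Because the first start has to download the Llama-3 text encoder, the service may take a while before it answers. A small helper like the one below (hypothetical, not shipped with the image) can block until `/health` reports `ok` before you start sending `/generate` requests.

```python
# Poll /health until the service is ready or a deadline passes.
import time

import requests

def wait_until_healthy(base_url: str = "http://localhost:9551", timeout_s: float = 1800) -> None:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            r = requests.get(f"{base_url}/health", timeout=5)
            if r.ok and r.json().get("status") == "ok":
                return
        except requests.RequestException:
            pass  # container still starting up or downloading weights
        time.sleep(10)
    raise TimeoutError("kimodo-api did not become healthy in time")

wait_until_healthy()
```
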
## Requirements

| Resource | Minimum |
|---|---|
| GPU | RTX 3090 / A100 / RTX 6000 Ada |
| VRAM | 24 GB |
| RAM | 32 GB |
| Disk | 50 GB (model weights) |

## What's inside

- **Kimodo**: NVIDIA's kinematic motion diffusion model (77-joint SOMA skeleton)
- **LLM2Vec** text encoder backed by **Llama-3-8B-Instruct**
- **FastAPI** server on port 9551 (a rough sketch of the endpoint shape follows this list)
- Health check + graceful startup
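
For orientation, here is a rough sketch of the shape of such a FastAPI wrapper. This is not the code shipped in the image: `run_kimodo` is a placeholder standing in for the actual Kimodo sampling code, and only the two endpoints documented above are mirrored.

```python
# Illustrative sketch only; the real server code ships inside the image.
from fastapi import FastAPI
from fastapi.responses import Response
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str

def run_kimodo(prompt: str) -> bytes:
    """Placeholder for the actual Kimodo diffusion call; would return NPZ bytes."""
    raise NotImplementedError

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/generate")
def generate(req: GenerateRequest):
    npz_bytes = run_kimodo(req.prompt)
    return Response(content=npz_bytes, media_type="application/octet-stream")
```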

## Part of VisionAI-Flywheel

This service is one component of a full synthetic surveillance data pipeline:

```
[kimodo-api] → NPZ motion
      ↓
[render-api] → SOMA mesh render (MP4)
      ↓
[cosmos-transfer] → Sim2Real photorealistic video
      ↓
[NVIDIA VSS] → VLM annotation → fine-tuning dataset
```

🔗 Full pipeline: [github.com/EyalEnav/VisionAI-Flywheel](https://github.com/EyalEnav/VisionAI-Flywheel)

## License

Apache 2.0; see [LICENSE](https://github.com/EyalEnav/VisionAI-Flywheel/blob/main/LICENSE).

> Kimodo model weights are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) and downloaded at runtime. They are not bundled in this image.