
cosmos-transfer

REST API microservice wrapper around NVIDIA Cosmos-Transfer2.5-2B, a video diffusion model that converts synthetic renders into photorealistic video (Sim2Real).


Installation

```shell
docker pull ghcr.io/eyalenav/cosmos-transfer:latest
```

Run

```shell
docker run --rm \
  --gpus '"device=0"' \
  -p 8080:8080 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HUGGINGFACE_TOKEN=hf_... \
  ghcr.io/eyalenav/cosmos-transfer:latest
```

The first run downloads the Cosmos-Transfer2.5-2B weights (~20 GB); subsequent starts reuse the cached weights and are fast.


API Reference

GET /health

Check server status.

Request

```
GET http://localhost:8080/health
```

Response

```json
{
  "status": "ok",
  "model": "Cosmos-Transfer2.5-2B",
  "device": "cuda:0"
}
```

POST /transfer

Convert a synthetic video into photorealistic footage using multicontrol (edge + visual).

Request

```
POST http://localhost:8080/transfer
Content-Type: multipart/form-data
```
| Field | Type | Default | Description |
|---|---|---|---|
| video | file | required | Input synthetic MP4 (max 10 s @ 24 fps recommended) |
| prompt | string | `""` | Text describing the scene (improves realism) |
| edge_strength | float | 0.85 | Canny edge control strength (geometry preservation) |
| vis_strength | float | 0.45 | Visual/blur control strength (scene structure) |
| sigma | int | 100 | Noise level: lower = more faithful, higher = more realistic |
| num_steps | int | 35 | Diffusion steps (more = slower but higher quality) |
| seed | int | -1 | Random seed (-1 = random) |

Response

Binary MP4 file (video/mp4).

Example

```shell
curl -X POST http://localhost:8080/transfer \
  -F "video=@synthetic_render.mp4" \
  -F "prompt=surveillance camera footage of a crowded urban street, overcast day" \
  -F "edge_strength=0.85" \
  -F "vis_strength=0.45" \
  -F "sigma=100" \
  --output photorealistic.mp4
```

POST /transfer_async

Submit a job and poll for completion (recommended for long clips).

Submit

```shell
curl -X POST http://localhost:8080/transfer_async \
  -F "video=@render.mp4" \
  -F "prompt=security incident, parking lot" \
  -F "edge_strength=0.85" \
  --output job.json
# {"job_id": "abc123", "status": "queued"}
```

Poll

```shell
curl http://localhost:8080/status/abc123
# {"job_id": "abc123", "status": "running", "progress": 0.42}
# ...
# {"job_id": "abc123", "status": "done"}
```

Download

```shell
curl http://localhost:8080/result/abc123 --output photorealistic.mp4
```
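The submit/poll/download flow above is easy to drive from Python. A sketch of just the polling step; `poll_until_done` is illustrative, and the `"failed"`/`"error"` terminal statuses are assumptions (only `"queued"`, `"running"`, and `"done"` are documented above):

```python
import time
from typing import Callable

def poll_until_done(get_status: Callable[[], dict],
                    interval: float = 5.0,
                    timeout: float = 1800.0) -> dict:
    """Drive the /status/<job_id> polling loop until the job completes.

    get_status is any zero-argument callable returning the parsed status
    JSON shown above, e.g.
        lambda: requests.get("http://localhost:8080/status/abc123").json()
    Injecting it keeps the loop free of HTTP details.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        payload = get_status()
        status = payload.get("status")
        if status == "done":
            return payload
        if status in ("failed", "error"):  # assumed failure statuses
            raise RuntimeError(f"transfer job failed: {payload}")
        time.sleep(interval)
    raise TimeoutError("transfer job did not finish within the timeout")
```

After `poll_until_done` returns, fetch the MP4 from `/result/<job_id>` as in the curl example above.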

Tuned Parameters

Tested across 80+ surveillance clips; the confirmed sweet spot is:

`edge_strength=0.85 + vis_strength=0.45 + sigma=100`

| Parameter | Value | Effect |
|---|---|---|
| edge_strength | 0.85 | Strong silhouette/geometry preservation from Canny edges |
| vis_strength | 0.45 | Moderate scene structure via visual blur control |
| sigma | 100 | Balanced noise: realistic textures without losing layout |

When to adjust

| Scenario | Adjustment |
|---|---|
| Subject drifts from synthetic pose | Increase edge_strength to 0.90–0.95 |
| Background too synthetic-looking | Increase vis_strength to 0.55–0.65 |
| Output too faithful to render colors | Increase sigma to 120 |
| Too much motion blur | Decrease sigma to 80 |
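In client code, these adjustments can be layered over the tuned defaults. A small sketch; the scenario names (`pose_drift`, etc.) are illustrative labels, not part of the API:

```python
# Baseline from the tuned-parameter table, plus scenario overrides from
# "When to adjust". Override values sit inside the recommended ranges.
DEFAULTS = {"edge_strength": 0.85, "vis_strength": 0.45, "sigma": 100}

ADJUSTMENTS = {
    "pose_drift":    {"edge_strength": 0.92},  # subject drifts from synthetic pose
    "synthetic_bg":  {"vis_strength": 0.60},   # background too synthetic-looking
    "render_colors": {"sigma": 120},           # output too faithful to render colors
    "motion_blur":   {"sigma": 80},            # too much motion blur
}

def transfer_params(*scenarios: str) -> dict:
    """Merge the tuned defaults with overrides for the named scenarios."""
    params = dict(DEFAULTS)
    for name in scenarios:
        params.update(ADJUSTMENTS[name])
    return params
```

The resulting dict can be passed straight into the `data=` field of the `/transfer` request.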

Hardware Requirements

| Resource | Minimum | Recommended |
|---|---|---|
| GPU | A100 40GB / RTX 6000 Ada | H100 / RTX PRO 6000 Blackwell |
| VRAM | 40 GB | 48+ GB |
| RAM | 64 GB | 128 GB |
| Disk | 30 GB | 50 GB |
| CUDA | 12.1+ | 12.8 |

Processing time (RTX PRO 6000 Blackwell, 96 GB VRAM):

- 4 s clip @ 24 fps: ~3 min
- 10 s clip @ 24 fps: ~7 min

Environment Variables

| Variable | Required | Description |
|---|---|---|
| HUGGINGFACE_TOKEN | Yes | HF token with access to nvidia/Cosmos-Transfer2.5-2B |
| CUDA_VISIBLE_DEVICES | No | Limit to a specific GPU (e.g. "1") |
| PORT | No | Override the default port 8080 |

Integration with VisionAI-Flywheel

```yaml
# docker-compose.yml excerpt
services:
  cosmos-transfer:
    image: ghcr.io/eyalenav/cosmos-transfer:latest
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]
              capabilities: [gpu]
    volumes:
      - hf_cache:/root/.cache/huggingface
    environment:
      - HUGGINGFACE_TOKEN=${HUGGINGFACE_TOKEN}
```

Full docker-compose.yml: github.com/EyalEnav/VisionAI-Flywheel


Example: Full Python client

```python
import requests
import time

def transfer_video(
    input_path: str,
    output_path: str,
    prompt: str = "",
    edge_strength: float = 0.85,
    vis_strength: float = 0.45,
    sigma: int = 100
):
    """Convert synthetic video to photorealistic."""
    with open(input_path, "rb") as f:
        response = requests.post(
            "http://localhost:8080/transfer",
            files={"video": ("input.mp4", f, "video/mp4")},
            data={
                "prompt": prompt,
                "edge_strength": edge_strength,
                "vis_strength": vis_strength,
                "sigma": sigma,
            },
            timeout=600
        )
    response.raise_for_status()

    with open(output_path, "wb") as f:
        f.write(response.content)
    print(f"Saved to {output_path}")

# Example usage
transfer_video(
    input_path="soma_render.mp4",
    output_path="photorealistic.mp4",
    prompt="surveillance camera, urban street, daytime, overcast sky"
)
```

License

Apache 2.0

Cosmos-Transfer2.5 model weights are released under the NVIDIA Open Model License. Weights are downloaded at runtime and are not bundled in this image.