RoboMind VLA: Robot Locomotion Reward Judge

A vision-language reward model for robot locomotion quality assessment, fine-tuned from MiniCPM-V-2.6 using LoRA. It combines a VLM's qualitative assessment (5% of the final score) with physics-based return normalization (95%) to score rollouts across 5 MuJoCo environments.

Credits

Results

| Metric | Score |
| --- | --- |
| Hybrid Spearman correlation | 0.951 |
| Rule-based Spearman | 0.976 |
| Tier separation (expert - simple) | 0.371 |
| Expert mean reward | 0.915 |
| Medium mean reward | 0.717 |
| Simple mean reward | 0.544 |
| Test battery: expert range | 0.948 - 0.975 |
| Test battery: simple range | 0.025 - 0.245 |
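
For readers unfamiliar with the metrics above, here is a minimal stdlib-only sketch of how a Spearman rank correlation and the tier separation can be computed. It assumes the correlation is taken between judge scores and ground-truth episode returns, which this README does not state explicitly. The sample numbers are illustrative and not drawn from the validation set.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Illustrative values only (hypothetical judge scores and episode returns).
predicted = [0.95, 0.72, 0.50, 0.91, 0.60]
returns = [9800.0, 7100.0, 3900.0, 9900.0, 5200.0]
rho = spearman(predicted, returns)

# Tier separation as reported in the table: expert mean minus simple mean.
tier_separation = 0.915 - 0.544
```

A rank-based correlation rewards getting the ordering of episodes right, regardless of the absolute scale of the scores.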

Features

  • Hybrid scoring: 95% physics-based rule normalization + 5% VLM qualitative analysis
  • 5 MuJoCo environments: humanoid, walker2d, ant, hopper, halfcheetah (expert/medium/simple)
  • Sound detection: Audio-based fall detection and gait analysis via librosa
  • Web UI: FastAPI + HTML/JS interface on Modal GPU
  • Fully serverless: All computation runs on Modal (GPU/CPU)

Quick Start

pip install robomind-vla

from robomind import RoboMindJudge

# VLM-only judgment
judge = RoboMindJudge()
judge.load()
result = judge.judge_from_paths(["frame1.jpg", "frame2.jpg", "frame3.jpg"])

# Hybrid scoring (with physics data)
from robomind.hybrid import hybrid_judge, hybrid_to_dict
score = hybrid_judge(
    vlm_parsed=result,
    ep_return=8000, min_return=4000, max_return=10000,
    fell=False, tier="medium", env="walker2d",
)
print(hybrid_to_dict(score))

Project Structure

robomind/
├── robomind/                  # Installable Python package
│   ├── __init__.py
│   ├── judge.py              # Core VLM judge class
│   ├── hybrid.py             # Hybrid VLM + rule-based scoring
│   └── sound.py              # Audio-based fall/gait detection
├── app.py                    # FastAPI web UI (Modal deployment)
├── hybrid_judge.py           # Standalone hybrid judge (used by app.py)
├── data_gen_all_modal.py     # Data generation (15 env combos x 20 episodes)
├── dataset_build_v2.py       # Dataset builder with visual analysis
├── finetune_modal.py         # LoRA fine-tune on Modal GPU
├── validation.py             # Validation suite on Modal GPU
├── sound_detection.py        # Sound detection on Modal
├── tests_comprehensive.py    # 18 unit/integration tests
├── pyproject.toml            # Package config
└── LICENSE                   # MIT License

HF Hub Repos

| Repo | Description |
| --- | --- |
| mitvho09/robomind-rollouts | 300 rollout videos + metadata |
| mitvho09/robomind-loco-judge-dataset | 300 training samples with keyframes + judgments |
| mitvho09/robomind-minicpm-loco-lora | LoRA adapter (rank=64, 7 modules) |

Environments

| Environment | Expert | Medium | Simple |
| --- | --- | --- | --- |
| humanoid | 20 eps | 20 eps | 20 eps |
| walker2d | 20 eps | 20 eps | 20 eps |
| ant | 20 eps | 20 eps | 20 eps |
| hopper | 20 eps | 20 eps | 20 eps |
| halfcheetah | 20 eps | 20 eps | 20 eps |

How It Works

1. Data Generation (Modal CPU)

  • Downloads Minari expert/medium/simple datasets
  • Reconstructs states via set_state() (no open-loop replay)
  • Renders .mp4 videos with MuJoCo physics
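
Restoring recorded states frame by frame avoids the drift that builds up when stored actions are replayed open-loop. The sketch below shows one way a flat observation could be split back into `qpos` and `qvel` before calling `set_state()`. It relies on the default Gymnasium observation layout for Walker2d, HalfCheetah, and Hopper (`qpos[1:]` followed by `qvel`). The helper `split_observation` and the `root_x` placeholder are hypothetical, and the actual `data_gen_all_modal.py` may recover states differently. Hopper and Walker2d observations also clip velocities to [-10, 10], so the result is approximate wherever clipping occurred.

```python
# (nq, nv) sizes for the Gymnasium MuJoCo models covered by this sketch.
MODEL_DIMS = {"walker2d": (9, 9), "halfcheetah": (9, 9), "hopper": (6, 6)}

def split_observation(env_name, obs, root_x=0.0):
    """Split a flat observation into (qpos, qvel) lists.

    Assumes obs = qpos[1:] + qvel, i.e. the root x position is excluded
    from the observation and must be supplied (here: a placeholder).
    """
    nq, nv = MODEL_DIMS[env_name]
    expected = (nq - 1) + nv
    if len(obs) != expected:
        raise ValueError(f"{env_name}: expected {expected} values, got {len(obs)}")
    qpos = [root_x] + list(obs[: nq - 1])
    qvel = list(obs[nq - 1:])
    return qpos, qvel

# Dummy 17-dimensional Walker2d observation for illustration.
obs = [float(i) for i in range(17)]
qpos, qvel = split_observation("walker2d", obs)

# With a live environment, each frame would then be rendered roughly as:
#   env.unwrapped.set_state(np.array(qpos), np.array(qvel))
#   frame = env.render()
```

Ant and humanoid use different observation layouts and are left out of this sketch.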

2. Training (Modal GPU)

  • Extracts 6 keyframes per episode
  • Derives judgment JSON (stability, gait_quality, predicted_reward, etc.)
  • Fine-tunes MiniCPM-V-2.6 with LoRA (rank=64, alpha=128, 7 target modules)
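
The README does not say how the 6 keyframes are chosen. A minimal sketch, assuming they are spaced evenly from the first to the last frame (`keyframe_indices` is a hypothetical helper, not part of the package):

```python
def keyframe_indices(num_frames, num_keyframes=6):
    """Return evenly spaced frame indices, always including first and last."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if num_keyframes < 2:
        raise ValueError("num_keyframes must be at least 2")
    if num_frames <= num_keyframes:
        # Short episode: use every available frame.
        return list(range(num_frames))
    last = num_frames - 1
    return [round(i * last / (num_keyframes - 1)) for i in range(num_keyframes)]

indices = keyframe_indices(101)  # [0, 20, 40, 60, 80, 100]
```

Even spacing covers the whole episode, but it can miss a brief fall. Sampling more densely around detected events would be an alternative.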

3. Hybrid Scoring

Final Score = 0.95 * Rule_score + 0.05 * VLM_score
  • Rule score: Physics-based return normalization with per-env calibration and tier adjustments
  • VLM score: Combines stability assessment, gait quality, anomaly detection
  • Tier adjustments: expert=0, medium=-0.15, simple=-0.35
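
The weighting above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `hybrid.py`. The function names are hypothetical, the rule score is a plain min-max normalization plus the tier adjustment, and the per-environment calibration and fall penalty are left out. The VLM score of 0.7 is an arbitrary example value.

```python
# Tier adjustments as listed in this README.
TIER_ADJUSTMENT = {"expert": 0.0, "medium": -0.15, "simple": -0.35}

def clamp01(x):
    """Clamp a value to the [0, 1] range."""
    return max(0.0, min(1.0, x))

def rule_score(ep_return, min_return, max_return, tier):
    """Min-max normalize the episode return, then apply the tier adjustment."""
    normalized = (ep_return - min_return) / (max_return - min_return)
    return clamp01(clamp01(normalized) + TIER_ADJUSTMENT[tier])

def hybrid_score(rule, vlm, rule_weight=0.95):
    """Final Score = 0.95 * Rule_score + 0.05 * VLM_score."""
    return rule_weight * rule + (1.0 - rule_weight) * vlm

# Same inputs as the Quick Start example, plus an assumed VLM score.
rule = rule_score(8000, 4000, 10000, "medium")  # about 0.517
final = hybrid_score(rule, vlm=0.7)             # about 0.526
```

With a 5% weight, the VLM score can move the final result by at most 0.05, so the ranking is driven almost entirely by the rule score.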

4. Sound Detection

  • Extracts audio from rollout videos
  • Detects impacts, motor strain, gait rhythm
  • Provides fall confidence score (penalizes reward when fall detected)
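
The project uses librosa for audio analysis. The stdlib-only sketch below only illustrates the general idea of energy-based impact detection: it flags frames whose RMS energy is far above the median. The frame length, the threshold ratio, and the function names are assumptions, and the mapping from impacts to a fall confidence score is not shown.

```python
import math
from statistics import median

def frame_rms(samples, frame_len):
    """RMS energy of consecutive non-overlapping frames."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_impacts(samples, frame_len=256, ratio=4.0):
    """Return indices of frames whose RMS exceeds ratio * median RMS."""
    rms = frame_rms(samples, frame_len)
    if not rms:
        return []
    threshold = ratio * max(median(rms), 1e-8)
    return [i for i, energy in enumerate(rms) if energy > threshold]

# Synthetic signal: a quiet hum with one loud burst in frame 5.
signal = [0.01 * math.sin(0.05 * n) for n in range(256 * 10)]
for n in range(256 * 5, 256 * 6):
    signal[n] = 0.8 * math.sin(0.9 * n)

impacts = detect_impacts(signal)
```

A real pipeline would more likely work on onset strength or spectral features rather than raw frame energy.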

Running on Modal

# Generate data
modal run --detach data_gen_all_modal.py

# Build dataset
modal run dataset_build_v2.py

# Train (50 epochs, LoRA r=64)
modal run --detach finetune_modal.py::full_train

# Validate
modal run validation.py

# Deploy web UI
modal deploy app.py

Tests

python -m pytest tests_comprehensive.py -v
# 18/18 pass

License

MIT
