RoboMind VLA — Robot Locomotion Reward Judge

A vision-language reward model for robot locomotion quality assessment, fine-tuned from MiniCPM-V-2.6 using LoRA. Combines VLM understanding with physics-based normalization for robust scoring across 5 MuJoCo environments.

Credits

Built with: OpenAI Codex — code generation, architecture design, debugging, and iteration throughout the entire project
Base model: MiniCPM-V-2.6 by OpenBMB
Training framework: Hugging Face Transformers + PEFT (LoRA)
Infrastructure: Modal (serverless GPU/CPU)
Physics engine: MuJoCo via Minari datasets
Audio analysis: librosa

Results

Metric	Score
Hybrid Spearman correlation	0.951
Rule-based Spearman	0.976
Tier separation (expert - simple)	0.371
Expert mean reward	0.915
Medium mean reward	0.717
Simple mean reward	0.544
Test battery: expert range	0.948 - 0.975
Test battery: simple range	0.025 - 0.245
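
The Spearman figures above are rank correlations, and the tier separation is the difference between the expert and simple mean rewards. The README does not say which reference values the scores are correlated against, so the following is only a minimal sketch of how such metrics can be computed; the function names are hypothetical and are not part of the robomind package.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Tier separation from the table above: expert mean minus simple mean.
tier_separation = round(0.915 - 0.544, 3)
```

Because only ranks matter, the metric rewards a judge that orders episodes correctly even when its absolute scores are offset.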

Features

Hybrid scoring: 95% physics-based rule normalization + 5% VLM qualitative analysis
5 MuJoCo environments: humanoid, walker2d, ant, hopper, halfcheetah, each at three tiers (expert/medium/simple)
Sound detection: Audio-based fall detection and gait analysis via librosa
Web UI: FastAPI + HTML/JS interface on Modal GPU
Fully serverless: All computation runs on Modal (GPU/CPU)

Quick Start

pip install robomind-vla

from robomind import RoboMindJudge

# VLM-only judgment
judge = RoboMindJudge()
judge.load()
result = judge.judge_from_paths(["frame1.jpg", "frame2.jpg", "frame3.jpg"])

# Hybrid scoring (with physics data)
from robomind.hybrid import hybrid_judge, hybrid_to_dict
score = hybrid_judge(
    vlm_parsed=result,
    ep_return=8000, min_return=4000, max_return=10000,
    fell=False, tier="medium", env="walker2d",
)
print(hybrid_to_dict(score))

Project Structure

robomind/
├── robomind/                  # Installable Python package
│   ├── __init__.py
│   ├── judge.py              # Core VLM judge class
│   ├── hybrid.py             # Hybrid VLM + rule-based scoring
│   └── sound.py              # Audio-based fall/gait detection
├── app.py                    # FastAPI web UI (Modal deployment)
├── hybrid_judge.py           # Standalone hybrid judge (used by app.py)
├── data_gen_all_modal.py     # Data generation (15 env combos x 20 episodes)
├── dataset_build_v2.py       # Dataset builder with visual analysis
├── finetune_modal.py         # LoRA fine-tune on Modal GPU
├── validation.py             # Validation suite on Modal GPU
├── sound_detection.py        # Sound detection on Modal
├── tests_comprehensive.py    # 18 unit/integration tests
├── pyproject.toml            # Package config
└── LICENSE                   # MIT License

HF Hub Repos

Repo	Description
mitvho09/robomind-rollouts	300 rollout videos + metadata
mitvho09/robomind-loco-judge-dataset	300 training samples with keyframes + judgments
mitvho09/robomind-minicpm-loco-lora	LoRA adapter (rank=64, 7 modules)

Environments

Environment	Expert	Medium	Simple
humanoid	20 eps	20 eps	20 eps
walker2d	20 eps	20 eps	20 eps
ant	20 eps	20 eps	20 eps
hopper	20 eps	20 eps	20 eps
halfcheetah	20 eps	20 eps	20 eps

How It Works

1. Data Generation (Modal CPU)

Downloads Minari expert/medium/simple datasets
Reconstructs simulator states via set_state() instead of replaying actions open-loop
Renders .mp4 videos with MuJoCo physics

2. Training (Modal GPU)

Extracts 6 keyframes per episode
Derives judgment JSON (stability, gait_quality, predicted_reward, etc.)
Fine-tunes MiniCPM-V-2.6 with LoRA (rank=64, alpha=128, 7 target modules)
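
The README states that six keyframes are extracted per episode but not how they are chosen. One simple possibility, shown here purely as an assumption, is to pick evenly spaced frame indices that include the first and last frame. The function name keyframe_indices is hypothetical.

```python
from typing import List


def keyframe_indices(num_frames: int, num_keyframes: int = 6) -> List[int]:
    """Pick evenly spaced frame indices, including the first and last frame.

    Assumption: uniform spacing. The actual pipeline may select frames
    differently (for example around detected events).
    """
    if num_frames <= 0 or num_keyframes <= 0:
        raise ValueError("num_frames and num_keyframes must be positive")
    if num_frames <= num_keyframes:
        # Short clip: every frame is a keyframe.
        return list(range(num_frames))
    if num_keyframes == 1:
        return [0]
    step = (num_frames - 1) / (num_keyframes - 1)
    return [round(i * step) for i in range(num_keyframes)]


print(keyframe_indices(101))  # [0, 20, 40, 60, 80, 100]
```

The returned indices can then be used to read the corresponding frames from the rendered .mp4 with whatever video reader the pipeline uses.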

3. Hybrid Scoring

Final Score = 0.95 * Rule_score + 0.05 * VLM_score

Rule score: Physics-based return normalization with per-env calibration and tier adjustments
VLM score: Combines stability assessment, gait quality, anomaly detection
Tier adjustments: expert=0, medium=-0.15, simple=-0.35
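
The weighting above can be sketched in a few lines. The weights and tier adjustments come from this README; the min-max normalization, the clipping, and the function names are assumptions for illustration, and the per-env calibration is omitted. This is not the actual robomind.hybrid implementation.

```python
# Values taken from the README.
RULE_WEIGHT, VLM_WEIGHT = 0.95, 0.05
TIER_ADJUSTMENT = {"expert": 0.0, "medium": -0.15, "simple": -0.35}


def clip01(x: float) -> float:
    """Clamp a value to the [0, 1] range."""
    return max(0.0, min(1.0, x))


def rule_score(ep_return: float, min_return: float, max_return: float, tier: str) -> float:
    """Assumed rule score: min-max normalized return plus the tier adjustment."""
    span = max_return - min_return
    normalized = (ep_return - min_return) / span if span > 0 else 0.0
    return clip01(clip01(normalized) + TIER_ADJUSTMENT[tier])


def hybrid_score(rule: float, vlm: float) -> float:
    """Final Score = 0.95 * Rule_score + 0.05 * VLM_score."""
    return RULE_WEIGHT * rule + VLM_WEIGHT * clip01(vlm)


# Same numbers as the Quick Start call; the VLM score of 0.7 is made up.
rule = rule_score(ep_return=8000, min_return=4000, max_return=10000, tier="medium")
final = hybrid_score(rule, vlm=0.7)
```

With a 5% weight, the VLM term can shift the final score by at most 0.05, so the ranking is dominated by the physics-based rule score.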

4. Sound Detection

Extracts audio from rollout videos
Detects impacts, motor strain, gait rhythm
Provides fall confidence score (penalizes reward when fall detected)
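
To illustrate the idea of turning an audio impact into a fall confidence that penalizes the reward, here is a dependency-free sketch. It is a stand-in for the librosa-based analysis, not a copy of it: the energy heuristic, the thresholds, the penalty rule, and all function names are assumptions.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int = 1024, hop: int = 512) -> List[float]:
    """Root-mean-square energy of overlapping frames."""
    out = []
    for start in range(0, max(len(samples) - frame_len + 1, 1), hop):
        frame = samples[start:start + frame_len]
        if not frame:
            break
        out.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return out


def fall_confidence(rms: Sequence[float], ratio_at_full_confidence: float = 10.0) -> float:
    """Map the peak-to-median energy ratio to a confidence in [0, 1].

    Hypothetical heuristic: a loud transient relative to the typical
    energy level is treated as evidence of an impact.
    """
    if not rms:
        return 0.0
    ordered = sorted(rms)
    median, peak = ordered[len(ordered) // 2], ordered[-1]
    if median <= 0:
        return 1.0 if peak > 0 else 0.0
    ratio = peak / median
    return max(0.0, min(1.0, (ratio - 1.0) / (ratio_at_full_confidence - 1.0)))


def penalize(reward: float, confidence: float, max_penalty: float = 0.5) -> float:
    """Scale the reward down in proportion to the fall confidence (assumed rule)."""
    return reward * (1.0 - max_penalty * confidence)


# Synthetic check: a quiet 220 Hz tone, then the same tone with a loud burst.
quiet = [0.01 * math.sin(2 * math.pi * 220 * n / 16000) for n in range(16000)]
noisy = quiet[:]
noisy[8000:8200] = [1.0] * 200
```

In practice the frame energies would come from the audio track of a rollout video, and the thresholds would need tuning per environment.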

Running on Modal

# Generate data
modal run --detach data_gen_all_modal.py

# Build dataset
modal run dataset_build_v2.py

# Train (50 epochs, LoRA r=64)
modal run --detach finetune_modal.py::full_train

# Validate
modal run validation.py

# Deploy web UI
modal deploy app.py

Tests

python -m pytest tests_comprehensive.py -v
# 18/18 pass

License

MIT
