
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Repository Overview

This is a Hugging Face Space that provides a GPU-accelerated JupyterLab environment for training and simulating robots using the MuJoCo physics engine. The Space covers a wide range of robotics applications, including locomotion, manipulation, motion tracking, and general physics simulation. It is designed to run in a Docker container with NVIDIA GPU support for hardware-accelerated physics and rendering.

What This Environment Supports

This is a general-purpose MuJoCo training environment with sample notebooks covering:

  1. General MuJoCo Physics (tutorial.ipynb) - Comprehensive introduction to MuJoCo fundamentals including basic rendering, simulation loops, contacts, friction, tendons, actuators, sensors, and advanced rendering techniques

  2. Locomotion (locomotion.ipynb) - Training quadrupedal and bipedal robots for walking, running, and acrobatic behaviors. Includes environments for Unitree Go1/G1, Boston Dynamics Spot, Google Barkour, Berkeley Humanoid, Unitree H1, and more

  3. Manipulation (manipulation.ipynb) - Robot arm and dexterous hand control. Includes Franka Emika Panda pick-and-place tasks and Leap Hand dexterous manipulation with asymmetric actor-critic training

  4. Motion Tracking (opentrack.ipynb) - Humanoid motion tracking and retargeting using the OpenTrack system with motion capture data

Architecture

Container Environment

  • Base Image: nvidia/cuda:12.8.1-devel-ubuntu22.04
  • Python: 3.13 (Miniconda)
  • GPU Rendering: Uses EGL for headless OpenGL rendering with NVIDIA drivers
  • Web Server: JupyterLab on port 7860

Key Components

  1. GPU Initialization (init_gpu.py): Validates GPU setup before starting JupyterLab (a minimal rendering smoke test is sketched after this list)

    • Checks NVIDIA driver accessibility via nvidia-smi
    • Verifies EGL library availability (libEGL.so.1, libGL.so.1, libEGL_nvidia.so.0)
    • Tests EGL device initialization with multiple fallback methods (platform device, default display, surfaceless)
    • Validates MuJoCo rendering at multiple resolutions (64x64, 240x320, 480x640)
    • Critical environment variables: MUJOCO_GL=egl, PYOPENGL_PLATFORM=egl, EGL_PLATFORM=surfaceless
  2. MuJoCo Playground Setup (init_mujoco.py): Downloads MuJoCo model assets

    • Imports mujoco_playground which automatically clones the mujoco_menagerie repository
    • This repository contains robot models (quadrupeds, bipeds, arms, hands, etc.)
  3. Server Startup (start_server.sh): Container entrypoint

    • Sets up NVIDIA EGL library symlinks at runtime (searches /usr/local/nvidia/lib64, /usr/local/cuda/lib64, /usr/lib/nvidia)
    • Runs GPU validation (python init_gpu.py)
    • Downloads MuJoCo assets (python init_mujoco.py)
    • Disables JupyterLab announcements
    • Launches JupyterLab with iframe embedding support for Hugging Face Spaces
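
For reference, here is a minimal sketch of the kind of rendering smoke test init_gpu.py performs (the actual script adds EGL fallback handling and far more detailed diagnostics):

import os

os.environ.setdefault("MUJOCO_GL", "egl")  # must be set before MuJoCo creates a rendering context

import mujoco

# Tiny throwaway model, just enough to exercise the EGL rendering path
xml = "<mujoco><worldbody><geom type='sphere' size='0.1'/></worldbody></mujoco>"
model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)

for height, width in [(64, 64), (240, 320), (480, 640)]:
    with mujoco.Renderer(model, height=height, width=width) as renderer:
        mujoco.mj_forward(model, data)
        renderer.update_scene(data)
        pixels = renderer.render()
        print(f"EGL rendering OK at {height}x{width}: {pixels.shape}")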

Sample Notebooks

Sample notebooks are organized in individual folders within samples/ and are automatically copied to /data/workspaces/ at container startup:

  • samples/tutorial/ - Complete MuJoCo introduction (2258 lines) covering physics fundamentals, rendering, contacts, actuators, sensors, tendons, and camera control
  • samples/locomotion/ - Quadrupedal and bipedal locomotion training (1762 lines) with PPO, domain randomization, curriculum learning, and policy fine-tuning
  • samples/manipulation/ - Robot manipulation (649 lines) including pick-and-place (Panda arm) and dexterous manipulation (Leap Hand) with asymmetric actor-critic
  • samples/opentrack/ - Humanoid motion tracking/retargeting (603 lines) including dataset download, training, checkpoint conversion, and video generation

Each sample is copied to its own workspace directory (/data/workspaces/<sample_name>/) at runtime. Notebooks are only copied if they don't already exist, preserving any user modifications.

Development Commands

Running Locally with Docker

# Build the container
docker build -t mujoco-training .

# Run with GPU support
docker run --gpus all -p 7860:7860 mujoco-training

Testing GPU Setup

# Validate GPU rendering capabilities (run inside container)
python init_gpu.py

# Check NVIDIA driver
nvidia-smi

# Test EGL libraries
ldconfig -p | grep EGL

JupyterLab Access

  • Default port: 7860
  • Default token: "huggingface" (set via JUPYTER_TOKEN environment variable)
  • Default landing page: /lab/tree/workspaces/locomotion/locomotion.ipynb
  • Notebook working directory: /data (when deployed as Hugging Face Space)

Persistent Storage and Workspaces

When deployed on Hugging Face Spaces, the /data directory is backed by persistent storage. At container startup, start_server.sh automatically:

  1. Creates /data/workspaces/ if it doesn't exist
  2. For each sample in samples/, creates /data/workspaces/<sample_name>/ if it doesn't exist
  3. Copies the .ipynb file only if it doesn't already exist in the workspace (preserving user modifications)
  4. Copies any additional files from the sample directory (datasets, scripts, etc.)

This ensures:

  • User modifications to notebooks are preserved across container restarts
  • Each sample has its own isolated workspace for generated data, models, and outputs
  • Sample notebooks can include supporting files that are copied to the workspace
  • Users can create additional workspaces in /data/workspaces/ for their own projects
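
The copy-if-missing behaviour described in the steps above is implemented as shell code in start_server.sh; the sketch below restates the same logic in Python for clarity (paths as documented above, function name hypothetical):

import shutil
from pathlib import Path

SAMPLES = Path("samples")               # baked into the image
WORKSPACES = Path("/data/workspaces")   # persistent storage

def sync_sample_workspaces():
    WORKSPACES.mkdir(parents=True, exist_ok=True)
    for sample_dir in SAMPLES.iterdir():
        if not sample_dir.is_dir():
            continue
        workspace = WORKSPACES / sample_dir.name
        workspace.mkdir(exist_ok=True)
        for src in sample_dir.iterdir():
            dst = workspace / src.name
            if dst.exists():
                continue  # never overwrite user modifications
            if src.is_dir():
                shutil.copytree(src, dst)
            else:
                shutil.copy2(src, dst)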

Critical EGL Configuration

The container requires specific EGL configuration for headless GPU rendering:

  1. NVIDIA EGL Vendor Config: Created at /usr/share/glvnd/egl_vendor.d/10_nvidia.json pointing to libEGL_nvidia.so.0
  2. Library Path: LD_LIBRARY_PATH includes /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64
  3. Runtime Symlinks: start_server.sh creates symlinks to libEGL_nvidia.so.0 from mounted NVIDIA directories
  4. Environment Variables: __EGL_VENDOR_LIBRARY_DIRS=/usr/share/glvnd/egl_vendor.d
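
If you need to reproduce this setup outside the provided scripts (for example in a custom launcher), the variables can also be set from Python before MuJoCo creates a rendering context. A minimal sketch; inside the container they are already exported by the Dockerfile and start_server.sh:

import os

os.environ["MUJOCO_GL"] = "egl"
os.environ["PYOPENGL_PLATFORM"] = "egl"
os.environ["EGL_PLATFORM"] = "surfaceless"
os.environ["__EGL_VENDOR_LIBRARY_DIRS"] = "/usr/share/glvnd/egl_vendor.d"

import mujoco  # the rendering backend is selected when the first Renderer is created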

Troubleshooting EGL Issues

If MuJoCo rendering fails:

  1. Verify NVIDIA drivers: nvidia-smi should show GPU info
  2. Check EGL vendor config: cat /usr/share/glvnd/egl_vendor.d/10_nvidia.json
  3. Verify library loading: ldconfig -p | grep EGL
  4. Run comprehensive diagnostic: python init_gpu.py
  5. Check that MUJOCO_GL=egl is set: echo $MUJOCO_GL

Training Workflows

General MuJoCo Simulation (tutorial.ipynb)

Basic simulation loop:

import mujoco

xml = "<mujoco><worldbody><geom type='sphere' size='0.1'/></worldbody></mujoco>"  # any MJCF string
model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)

# Simulation loop
duration = 2.0  # seconds of simulated time
mujoco.mj_resetData(model, data)
while data.time < duration:
    mujoco.mj_step(model, data)
    # Read sensors, apply controls, etc.

Rendering:

with mujoco.Renderer(model, height=240, width=320) as renderer:
    mujoco.mj_forward(model, data)
    renderer.update_scene(data, camera="camera_name")  # any camera defined in the model
    pixels = renderer.render()

Locomotion Training (locomotion.ipynb)

Typical workflow using Brax + MuJoCo Playground:

  1. Load environment: env = registry.load(env_name)
  2. Get config: env_cfg = registry.get_default_config(env_name)
  3. Configure PPO: ppo_params = locomotion_params.brax_ppo_config(env_name)
  4. Apply domain randomization: randomizer = registry.get_domain_randomizer(env_name)
  5. Train: Use brax.training.agents.ppo.train with the environment and randomization function
  6. Save checkpoints: Policies saved to checkpoints/{env_name}/{step}/
  7. Fine-tune: Restore from checkpoint and continue training with modified config
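
A rough sketch of steps 1-5, following the structure of the locomotion notebook (exact parameter handling may differ between library versions; checkpointing and progress callbacks are omitted):

import functools

from brax.training.agents.ppo import networks as ppo_networks
from brax.training.agents.ppo import train as ppo
from mujoco_playground import registry, wrapper
from mujoco_playground.config import locomotion_params

env_name = "Go1JoystickFlatTerrain"
env = registry.load(env_name)                              # 1. load environment
env_cfg = registry.get_default_config(env_name)            # 2. default config
ppo_params = locomotion_params.brax_ppo_config(env_name)   # 3. PPO hyperparameters
randomizer = registry.get_domain_randomizer(env_name)      # 4. domain randomization

# Pull the network sizes out of the config if they are provided there
training_params = dict(ppo_params)
network_factory = ppo_networks.make_ppo_networks
if "network_factory" in training_params:
    del training_params["network_factory"]
    network_factory = functools.partial(
        ppo_networks.make_ppo_networks, **ppo_params.network_factory)

train_fn = functools.partial(
    ppo.train, **training_params,
    network_factory=network_factory,
    randomization_fn=randomizer)

make_inference_fn, params, metrics = train_fn(             # 5. train
    environment=env,
    eval_env=registry.load(env_name, config=env_cfg),
    wrap_env_fn=wrapper.wrap_for_brax_training)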

Available environments:

  • Quadrupedal: Go1JoystickFlatTerrain, Go1JoystickRoughTerrain, Go1Getup, Go1Handstand, Go1Footstand, SpotFlatTerrainJoystick, SpotGetup, SpotJoystickGaitTracking, BarkourJoystick
  • Bipedal: BerkeleyHumanoidJoystickFlatTerrain, BerkeleyHumanoidJoystickRoughTerrain, G1JoystickFlatTerrain, G1JoystickRoughTerrain, H1InplaceGaitTracking, H1JoystickGaitTracking, Op3Joystick, T1JoystickFlatTerrain, T1JoystickRoughTerrain

Full list: registry.locomotion.ALL_ENVS

Key training techniques:

  • Domain Randomization: Randomizes friction, armature, center of mass, link masses for sim-to-real transfer
  • Energy Penalties: energy_termination_threshold, reward_config.energy, reward_config.dof_acc to control power consumption and smoothness
  • Curriculum Learning: Fine-tune from checkpoints with progressively modified reward configs
  • Asymmetric Actor-Critic: Actor receives proprioception, critic receives privileged simulation state
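
For example, the energy-related penalties are tuned by editing the environment config before loading it. The field names come from the list above; the environment choice and the numeric values below are purely illustrative:

from mujoco_playground import registry

env_name = "Go1Handstand"
env_cfg = registry.get_default_config(env_name)
env_cfg.energy_termination_threshold = 400    # illustrative: terminate rollouts that burn too much power
env_cfg.reward_config.energy = -0.003         # illustrative: penalize energy consumption
env_cfg.reward_config.dof_acc = -2.5e-7       # illustrative: penalize joint accelerations (smoothness)
env = registry.load(env_name, config=env_cfg)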

Manipulation Training (manipulation.ipynb)

Similar to locomotion but focuses on:

  • Pick-and-place tasks: PandaPickCubeOrientation (trains in ~3 minutes on RTX 4090)
  • Dexterous manipulation: LeapCubeReorient (trains in ~33 minutes on RTX 4090)
  • Asymmetric observations: Use policy_obs_key and value_obs_key in PPO params to train actor on sensor-like data while critic gets privileged state

Available environments: registry.manipulation.ALL_ENVS
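
A minimal sketch of wiring up the asymmetric observations via the PPO parameters (the observation key names are assumptions; check the notebook for the exact keys exposed by each environment):

from mujoco_playground import registry
from mujoco_playground.config import manipulation_params

env_name = "LeapCubeReorient"
env = registry.load(env_name)
ppo_params = manipulation_params.brax_ppo_config(env_name)

ppo_params.policy_obs_key = "state"             # actor sees noisy, sensor-like observations
ppo_params.value_obs_key = "privileged_state"   # critic sees privileged simulator state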

Motion Tracking (opentrack.ipynb)

OpenTrack workflow for humanoid motion tracking:

  1. Clone repository: git clone https://github.com/GalaxyGeneralRobotics/OpenTrack.git
  2. Download mocap data: From huggingface.co/datasets/robfiras/loco-mujoco-datasets (Lafan1/UnitreeG1)
  3. Train policy: python train_policy.py --exp_name debug --terrain_type flat_terrain
  4. Convert checkpoint: python brax2torch.py --exp_name <exp_name> (Brax → PyTorch)
  5. Generate videos: python play_policy.py --exp_name <exp_name> --use_renderer

Python Dependencies

Core stack (see requirements.txt):

  • JupyterLab: 4.4.3 (with tornado 6.2 for compatibility)
  • JAX: CUDA 12 support via jax[cuda12]
  • MuJoCo: 3.3+ with MuJoCo MJX (JAX-based physics)
  • Brax: JAX-based RL framework for massively parallel training
  • MuJoCo Playground: Collection of robot environments and training utilities
  • Supporting libraries: mediapy (video rendering), ipywidgets, nvidia-cusparse-cu12

File Structure

/
├── Dockerfile                      # Container with CUDA 12.8 + EGL setup
├── start_server.sh                 # Container entrypoint
├── init_gpu.py                     # GPU validation script (comprehensive EGL tests)
├── init_mujoco.py                  # MuJoCo Playground asset downloader
├── requirements.txt                # Python dependencies
├── packages.txt                    # System packages (currently empty)
├── on_startup.sh                   # Custom startup commands (placeholder)
├── login.html                      # Custom JupyterLab login page
└── samples/                        # Example notebooks (organized by topic)
    ├── tutorial/
    │   └── tutorial.ipynb          # MuJoCo fundamentals (2258 lines)
    ├── locomotion/
    │   └── locomotion.ipynb        # Robot locomotion (1762 lines)
    ├── manipulation/
    │   └── manipulation.ipynb      # Robot manipulation (649 lines)
    └── opentrack/
        └── opentrack.ipynb         # Motion tracking (603 lines)

When deployed as a Hugging Face Space with persistent storage:

/data/                              # Persistent storage volume (mounted at runtime)
└── workspaces/                     # Sample workspaces (created by start_server.sh)
    ├── tutorial/
    │   ├── tutorial.ipynb          # Copied from samples/, preserves user edits
    │   └── ...                     # User-generated data, models, outputs
    ├── locomotion/
    │   ├── locomotion.ipynb
    │   ├── checkpoints/            # Training checkpoints
    │   └── ...
    ├── manipulation/
    │   ├── manipulation.ipynb
    │   └── ...
    └── opentrack/
        ├── opentrack.ipynb
        ├── datasets/               # Downloaded mocap data
        ├── models/                 # Trained models
        └── videos/                 # Generated videos

Performance Notes

  • Physics simulation: Can exceed 50,000 Hz on a single GPU with JAX/MJX
  • Rendering: Typically 30-60 Hz, far slower than the physics simulation
  • Training times (on RTX 4090 / L40S):
    • Simple manipulation: 3 minutes
    • Quadrupedal joystick: 7 minutes
    • Bipedal locomotion: 17 minutes
    • Dexterous manipulation: 33 minutes
  • Brax parallelization: Uses thousands of parallel environments for fast training
  • Checkpointing: Critical for curriculum learning and fine-tuning

Common Patterns

Visualization Options

scene_option = mujoco.MjvOption()
scene_option.flags[mujoco.mjtVisFlag.mjVIS_JOINT] = True           # Show joints
scene_option.flags[mujoco.mjtVisFlag.mjVIS_CONTACTPOINT] = True   # Show contacts
scene_option.flags[mujoco.mjtVisFlag.mjVIS_CONTACTFORCE] = True   # Show forces
scene_option.flags[mujoco.mjtVisFlag.mjVIS_TRANSPARENT] = True    # Transparency
scene_option.flags[mujoco.mjtVisFlag.mjVIS_PERTFORCE] = True      # Show perturbations

Named Access Pattern

# Instead of using indices
model.geom_rgba[geom_id, :]

# Use named access
model.geom('green_sphere').rgba
data.geom('box').xpos
data.joint('swing').qpos
data.sensor('accelerometer').data

Rendering Modes

  • RGB rendering: renderer.render() - returns pixels
  • Depth rendering: renderer.enable_depth_rendering() then renderer.render()
  • Segmentation: renderer.enable_segmentation_rendering() - returns object IDs and types
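
For example (assuming model and data are defined as in the simulation loop above):

with mujoco.Renderer(model, height=240, width=320) as renderer:
    mujoco.mj_forward(model, data)

    renderer.update_scene(data)
    rgb = renderer.render()            # (H, W, 3) uint8 color image

    renderer.enable_depth_rendering()
    renderer.update_scene(data)
    depth = renderer.render()          # (H, W) float32 distances from the camera
    renderer.disable_depth_rendering()

    renderer.enable_segmentation_rendering()
    renderer.update_scene(data)
    seg = renderer.render()            # (H, W, 2) int32: object id and object type
    renderer.disable_segmentation_rendering()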

Important Notes

  • This environment is designed for Hugging Face Spaces GPU instances (NVIDIA L40S or similar)
  • All training uses JAX/Brax for massive parallelization across thousands of environments
  • Policies are typically saved using Orbax checkpointing for fine-tuning
  • Domain randomization is critical for sim-to-real transfer
  • The environment supports multiple RL algorithms (PPO, SAC) through Brax
  • Asymmetric actor-critic (different observations for policy and value function) is commonly used