Genie-Envisioner Inference: ManiSkill Conflict Experiments
This repository contains the inference code and environment for evaluating Genie-Envisioner (GE-Act) on OOD conflict experiments in the ManiSkill simulation framework.
In a conflict experiment the robot receives an instruction that names two different object attributes (e.g., "Lift the red cube"), but the scene contains two objects that each satisfy only one attribute (one is red but not a cube; the other is a cube but not red). By recording which object the robot lifts across many such trials we can measure the model's Factor Dominance Rate (FDR), a behavioural bias metric for language-conditioned robot manipulation.
Repository Structure
genie-inference-maniskill/
├── genie_envisioner/                    # GE-Act inference code
│   ├── models/                          # MVActorModel architecture
│   ├── runner/                          # Inference runner (rollout loop)
│   ├── utils/                           # Shared utilities
│   ├── configs/
│   │   └── ltx_model/conflict/          # Per-experiment configs + action stats
│   ├── conflict_main.py                 # Main rollout script (single pair or batch)
│   ├── run_ood_experiment_inference.sh  # Batch OOD evaluation script
│   ├── setup_maniskill_env.sh           # Conda environment setup
│   ├── requirements.txt                 # Python dependencies
│   └── eval_conflict.md                 # Detailed evaluation guide
│
└── maniskill_conflict/                  # ManiSkill conflict environment
    ├── mani_skill/                      # Modified ManiSkill package
    │   ├── envs/tasks/                  # VerbObjectColor-v1 conflict task
    │   └── assets/                      # Robot and scene assets
    ├── conflict_experiment/             # Experiment utilities (pair generation, etc.)
    ├── setup.py
    └── pyproject.toml
Quick Start
1. Clone this repository
git clone https://huggingface.co/yqi19/genie-inference-maniskill
cd genie-inference-maniskill
2. Set up the conda environment
bash genie_envisioner/setup_maniskill_env.sh
conda activate genie_envisioner
3. Download LTX-Video (required backbone)
GE-Act uses LTX-Video as its video generation backbone:
git clone https://huggingface.co/Lightricks/LTX-Video /path/to/LTX-Video
4. Obtain a GE-Act checkpoint
Checkpoints are structured as <experiment>/step_<N>/ directories containing
config.json and diffusion_pytorch_model.safetensors. For example:
checkpoints/
└── color_object/
    └── step_30000/
        ├── config.json
        └── diffusion_pytorch_model.safetensors
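If you have several training snapshots per experiment, a small helper can pick the latest one to pass as `WEIGHT`. This is a sketch of our own (the helper name is ours, not part of the repo; the scripts themselves just take the step directory path directly):

```python
from pathlib import Path

def latest_step_dir(experiment_dir: str) -> Path:
    """Return the step_<N> subdirectory with the highest step count.

    Illustrative helper only: the batch script expects the step directory
    itself via the WEIGHT variable, e.g. checkpoints/color_object/step_30000.
    """
    root = Path(experiment_dir)
    steps = [d for d in root.iterdir()
             if d.is_dir() and d.name.startswith("step_")]
    if not steps:
        raise FileNotFoundError(f"no step_<N> directories under {root}")
    # Sort numerically on the suffix, not lexicographically on the name.
    return max(steps, key=lambda d: int(d.name.split("_")[1]))
```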
5. Run an OOD conflict evaluation
WEIGHT=/path/to/checkpoints/color_object/step_30000 \
LTX_MODEL=/path/to/LTX-Video \
conda run -n genie_envisioner \
bash genie_envisioner/run_ood_experiment_inference.sh \
color_object \
42 \
200 \
results/color_object_ood.txt
Supported Experiments
| Experiment | Factor A | Factor B | Description |
|---|---|---|---|
| color_object | color | shape | Red object vs. cube: which does the model lift? |
| color_size | color | size | Coloured vs. sized object |
| color_spatial | color | spatial position | Coloured vs. positioned object |
| size_object | size | shape | Sized vs. shaped object |
| spatial_object | spatial position | shape | Positioned vs. shaped object |
| spatial_size | spatial position | size | Positioned vs. sized object |
| verb_color | verb | color | Verb-defined vs. coloured target |
| verb_object | verb | shape | Verb-defined vs. shaped target |
| verb_size | verb | size | Verb-defined vs. sized target |
| verb_spatial | verb | spatial position | Verb-defined vs. positioned target |
Factor Dominance Rate (FDR)
FDR measures how strongly a model is biased toward one factor over another:
FDR(f1, f2) = (S_f1 - S_f2) / (S_f1 + S_f2 + ε) ∈ [-1, +1]
where S_f1 and S_f2 are success rates on factor-1-instruction runs and factor-2-instruction runs respectively, and ε is a small constant for numerical stability.
A positive FDR indicates f1 dominance; negative indicates f2 dominance; 0 indicates no bias.
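The formula translates directly to code. A minimal sketch (the function name and the default ε value are our choices, not fixed by the repo):

```python
def factor_dominance_rate(s_f1: float, s_f2: float, eps: float = 1e-6) -> float:
    """FDR(f1, f2) = (S_f1 - S_f2) / (S_f1 + S_f2 + eps), in [-1, +1].

    s_f1 and s_f2 are success rates in [0, 1]; eps prevents division
    by zero when both success rates are 0.
    """
    return (s_f1 - s_f2) / (s_f1 + s_f2 + eps)
```

For example, a model that lifts the factor-1 target in 80% of factor-1 runs but the factor-2 target in only 20% of factor-2 runs yields an FDR of about +0.6, a strong factor-1 bias.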
Environment Details
VerbObjectColor-v1
The conflict environment (mani_skill/envs/tasks/) is a modified version of ManiSkill's
tabletop manipulation task. Key properties:
- Two objects placed at fixed or randomised positions; each satisfies one factor
- Language instruction generated from the experiment's factor pair
- Success criterion: robot lifts the target object above a threshold height
- Dual success tracking: separate success signals for factor-A-target and factor-B-target objects per episode
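The dual success signals are what feed the FDR computation: each episode ends with the robot having lifted the factor-A target, the factor-B target, or neither. A sketch of the tallying, assuming per-episode boolean flags (the field names here are hypothetical; the environment exposes its own per-factor success signals):

```python
from collections import Counter

def tally_outcomes(episodes) -> Counter:
    """Count which object was lifted across a list of episode records.

    Each record is a dict with hypothetical boolean fields
    'success_factor_a' and 'success_factor_b'.
    """
    counts = Counter()
    for ep in episodes:
        if ep["success_factor_a"]:
            counts["factor_a"] += 1
        elif ep["success_factor_b"]:
            counts["factor_b"] += 1
        else:
            counts["neither"] += 1  # lifted nothing, or failed the height check
    return counts
```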
Observation space
- agent/qpos, agent/qvel: proprioceptive joint state
- sensor_data/base_camera/rgb: 256×256 RGB camera image
- sensor_data/base_camera/depth: depth image
- Language instruction string
Action space
8-dimensional joint position control (7 DOF robot arm joints + gripper).
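As a concrete illustration of the action layout, a minimal sketch (the helper and the clipping bounds are ours; the environment's action space defines the real limits):

```python
import numpy as np

def make_action(joint_targets, gripper, low=-1.0, high=1.0) -> np.ndarray:
    """Assemble an 8-D action: 7 arm joint position targets + 1 gripper command.

    low/high are placeholder bounds; clip against the environment's
    actual action space limits in practice.
    """
    action = np.concatenate([
        np.asarray(joint_targets, dtype=np.float32),   # 7 arm joints
        np.array([gripper], dtype=np.float32),         # gripper open/close
    ])
    assert action.shape == (8,), "expected 7 joint targets + 1 gripper value"
    return np.clip(action, low, high)
```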
Running Multiple Experiments
WEIGHT_ROOT=/path/to/checkpoints
LTX_MODEL=/path/to/LTX-Video
for EXP in color_object color_size color_spatial size_object spatial_object \
spatial_size verb_color verb_object verb_size verb_spatial; do
WEIGHT="${WEIGHT_ROOT}/${EXP}/step_30000" \
LTX_MODEL="${LTX_MODEL}" \
conda run -n genie_envisioner \
bash genie_envisioner/run_ood_experiment_inference.sh \
"${EXP}" 42 200 "results/genie_${EXP}_seed42.txt"
done
Detailed Documentation
See genie_envisioner/eval_conflict.md for:
- Full environment variable reference
- Single-pair debugging commands
- Output file format description
- Manual checkpoint loading example
- Config file overview
- GPU memory requirements
Requirements
- Python 3.10
- CUDA 12.4 compatible GPU (≥16 GB VRAM recommended; RTX 4090 tested)
- Conda
- ~15 GB disk space for dependencies + LTX-Video backbone
Key Python packages:
- torch==2.6.0+cu124
- diffusers==0.32.0
- transformers==4.51.3
- safetensors==0.6.2
- mani_skill (from maniskill_conflict/ in this repo)
Citation
If you use this code or the conflict experiment framework, please cite:
@inproceedings{genie_envisioner,
title = {Genie-Envisioner: ...},
author = {...},
booktitle = {...},
year = {2025},
}