Genie Envisioner
A unified world foundation platform for robotic manipulation.
Repository Structure
checkpoints/
  color_object/
    step_30000/
      config.json
      diffusion_pytorch_model.safetensors   # action model checkpoint (step 30000)
configs/            # YAML configs and JSON stats for all tasks
data/               # dataset classes (LeRobot-format, LIBERO, AgiBotWorld)
experiments/        # eval scripts for Calvin and LIBERO
models/             # LTX, Cosmos, pipeline, action patch modules
runner/             # ge_trainer.py, ge_inferencer.py
scripts/            # train.sh, infer.sh, get_statistics.py
utils/              # misc utilities
web_infer_utils/    # web inference server and client
main.py             # training entry point
requirements.txt
Loading the color_object Checkpoint and Running Inference
1. Prerequisites
Clone this repo and install dependencies:
git clone https://huggingface.co/yqi19/genie_envisioner
cd genie_envisioner
pip install -r requirements.txt
You also need the LTX-Video base model (tokenizer, text encoder, VAE). Set its path as
pretrained_model_name_or_path in the config (see step 3).
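Before pointing the config at the base model, it can help to confirm that the local LTX-Video directory actually contains the three components listed above. The sketch below assumes the base model uses the usual Diffusers folder layout with tokenizer/, text_encoder/ and vae/ subfolders. That layout and the helper name missing_components are assumptions for illustration, not something this repo documents.

```python
from pathlib import Path

# Assumed subfolder names (typical Diffusers layout); adjust if your copy differs.
EXPECTED_COMPONENTS = ("tokenizer", "text_encoder", "vae")


def missing_components(base_model_dir, expected=EXPECTED_COMPONENTS):
    """Return the expected subfolders that are absent from base_model_dir."""
    root = Path(base_model_dir)
    return [name for name in expected if not (root / name).is_dir()]
```

Calling missing_components("/path/to/LTX-Video") returns an empty list when all three folders are present, and the names of the absent ones otherwise.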
2. Download the checkpoint
The action model checkpoint is already in this repo at:
checkpoints/color_object/step_30000/diffusion_pytorch_model.safetensors
checkpoints/color_object/step_30000/config.json
You can also download the whole repository, checkpoint included, programmatically:
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="yqi19/genie_envisioner",
    local_dir="./genie_envisioner",
)
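If only the checkpoint folder is needed, snapshot_download accepts an allow_patterns filter. The sketch below shows one possible way to restrict the download and then check that both checkpoint files arrived. The helper names are hypothetical, and the file paths are the ones listed in this step.

```python
from pathlib import Path

# Relative paths of the checkpoint files as listed in this step.
CHECKPOINT_FILES = (
    "checkpoints/color_object/step_30000/diffusion_pytorch_model.safetensors",
    "checkpoints/color_object/step_30000/config.json",
)


def missing_checkpoint_files(repo_root):
    """Return the checkpoint files that are not present under repo_root."""
    root = Path(repo_root)
    return [rel for rel in CHECKPOINT_FILES if not (root / rel).is_file()]


def download_checkpoint_only(local_dir="./genie_envisioner"):
    """Fetch just the step_30000 folder instead of the full repository."""
    # Imported lazily so the check above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="yqi19/genie_envisioner",
        local_dir=local_dir,
        allow_patterns=["checkpoints/color_object/step_30000/*"],
    )
```

Note that a filtered download skips the code and configs, so it only makes sense alongside an existing clone of the repo.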
3. Update the config
Edit configs/ltx_model/conflict/action_model_color_object.yaml and set:
# Path to LTX-Video base model (tokenizer, text encoder, VAE)
pretrained_model_name_or_path: /path/to/LTX-Video

# Point to the downloaded checkpoint
diffusion_model:
  model_path: checkpoints/color_object/step_30000
Also update the data.train.data_roots and data.val.data_roots fields to point to
your local color_object dataset (LeRobot format).
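The same edits can be scripted rather than made by hand. The following is a minimal sketch, assuming the config is plain YAML that PyYAML can load and that data_roots holds a list of dataset directories. Both points are assumptions, and apply_overrides and update_config_file are hypothetical helper names.

```python
import copy


def apply_overrides(cfg, base_model_path, checkpoint_dir, data_root):
    """Return a copy of cfg with the fields from this step filled in."""
    new_cfg = copy.deepcopy(cfg)
    new_cfg["pretrained_model_name_or_path"] = base_model_path
    new_cfg.setdefault("diffusion_model", {})["model_path"] = checkpoint_dir
    data = new_cfg.setdefault("data", {})
    for split in ("train", "val"):
        # Assumption: data_roots is a list of dataset directories.
        data.setdefault(split, {})["data_roots"] = [data_root]
    return new_cfg


def update_config_file(path, base_model_path, checkpoint_dir, data_root):
    """Rewrite the YAML file at path in place with the overrides applied."""
    import yaml  # PyYAML, imported lazily

    with open(path) as f:
        cfg = yaml.safe_load(f)
    new_cfg = apply_overrides(cfg, base_model_path, checkpoint_dir, data_root)
    with open(path, "w") as f:
        yaml.safe_dump(new_cfg, f, sort_keys=False)
```

Rewriting the file with safe_dump discards YAML comments, so editing by hand remains the safer choice if the comments in the config matter to you.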
4. Run inference
import torch
from runner.ge_inferencer import Inferencer

inferencer = Inferencer(
    config_file="configs/ltx_model/conflict/action_model_color_object.yaml",
    output_dir="./inference_output",
    weight_dtype=torch.bfloat16,
    device="cuda:0",
)
inferencer.prepare_models()
inferencer.prepare_val_dataset()
inferencer.infer(
    n_chunk_action=10,  # number of sequential action chunks to predict
    n_validation=1,     # number of validation episodes
)
Results are saved to ./inference_output/<timestamp>/Inference/:
- Validation_0_gt.mp4: ground truth video
- Validation_0.mp4: generated video (if return_video: true)
- openloop_evaluation_val0.png: open-loop action prediction plot
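Because each run writes into its own timestamped folder, a small helper can locate the most recent results. This sketch picks the newest run by modification time, so it does not depend on the exact timestamp format, which is not specified here. The function names are hypothetical.

```python
from pathlib import Path


def latest_inference_dir(output_dir="./inference_output"):
    """Return <output_dir>/<newest run>/Inference, or None if no run exists."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if (p / "Inference").is_dir()]
    if not runs:
        return None
    # Newest by modification time, so the timestamp format does not matter.
    newest = max(runs, key=lambda p: p.stat().st_mtime)
    return newest / "Inference"


def list_results(output_dir="./inference_output"):
    """Return the sorted names of video and plot files from the newest run."""
    folder = latest_inference_dir(output_dir)
    if folder is None:
        return []
    return sorted(p.name for p in folder.iterdir() if p.suffix in {".mp4", ".png"})
```

list_results() returns an empty list when no run has been written yet.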
5. Key config fields
| Field | Description |
|---|---|
| pretrained_model_name_or_path | Path to LTX-Video base model |
| diffusion_model.model_path | Path to the action model checkpoint directory |
| return_action | true to predict actions |
| return_video | true to generate future video frames |
| num_inference_step | Diffusion denoising steps (default: 5) |
| data.train.action_chunk | Number of actions predicted per inference step (default: 9) |
| data.train.n_previous | Number of conditioning frames (default: 4) |
| data.train.stat_file | Path to action normalization stats JSON |
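The dotted field names above describe nesting inside the YAML file. The sketch below shows one way to read such a path from a loaded config dictionary and to flag path-like fields that are still unset. The helpers and the choice of which fields to treat as required are assumptions for illustration.

```python
# Fields treated as required here; this selection is an assumption.
REQUIRED_FIELDS = (
    "pretrained_model_name_or_path",
    "diffusion_model.model_path",
    "data.train.stat_file",
)


def get_field(cfg, dotted, default=None):
    """Look up a dotted path such as 'data.train.action_chunk' in a nested dict."""
    node = cfg
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def missing_fields(cfg, required=REQUIRED_FIELDS):
    """Return the required dotted paths that are absent or set to None."""
    return [name for name in required if get_field(cfg, name) is None]
```

For example, get_field(cfg, "data.train.n_previous", 4) falls back to the documented default when the key is not set.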
Evaluation on Calvin and LIBERO
Calvin
# Edit checkpoint and config paths in experiments/eval_calvin.sh first
bash experiments/eval_calvin.sh
LIBERO
# Edit checkpoint and config paths in experiments/eval_libero.sh first
bash experiments/eval_libero.sh