ImageWAM-FLUX.2-4B-LIBERO

This repository contains the ImageWAM FLUX.2 4B checkpoint for LIBERO, released with the paper "ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?"

ImageWAM is a family of world action models built on image-editing foundation models. This checkpoint is intended for evaluation and research use with the accompanying ImageWAM codebase.

Model Details

Model family: ImageWAM
Image-editing backbone: FLUX.2 [klein] base
Variant: FLUX.2 klein-base-4B
Benchmark: LIBERO
Training code: yuyangalin/ImageWAM
Base model weights: Users must separately prepare the FLUX.2 klein-base-4B weights and FLUX.2 autoencoder as described in the ImageWAM README.

Files

Expected file layout:

.
├── model.pt
├── dataset_stats.json
└── config.yaml

model.pt: ImageWAM checkpoint used by the evaluation scripts.
dataset_stats.json: normalization statistics required for policy evaluation.
config.yaml: original training configuration for provenance and reproducibility.
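
Once the repository has been downloaded (see Usage), a quick check against the layout above can catch an incomplete download before evaluation starts. The following is a minimal sketch, not part of the ImageWAM codebase; the function name `check_release_layout` is hypothetical and introduced here only for illustration.

```shell
# Hypothetical helper (not provided by ImageWAM): verify that a directory
# contains the three files listed in the expected layout.
# Prints one "missing: <file>" line per absent file and returns 1 if any is absent.
check_release_layout() {
  layout_dir="$1"
  layout_missing=0
  for f in model.pt dataset_stats.json config.yaml; do
    if [ ! -f "$layout_dir/$f" ]; then
      echo "missing: $f"
      layout_missing=1
    fi
  done
  return "$layout_missing"
}
```

For example, `check_release_layout checkpoints/imagewam_release/libero/flux2_klein_4b && echo "layout OK"` prints `layout OK` only when all three files are present.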

Usage

Install and prepare the ImageWAM repository following the project README. Then download this model repository:

mkdir -p checkpoints/imagewam_release/libero/flux2_klein_4b

huggingface-cli download yuyangalin/ImageWAM-FLUX.2-4B-LIBERO \
  --repo-type model \
  --local-dir checkpoints/imagewam_release/libero/flux2_klein_4b

Prepare the FLUX.2 4B weights, then set the following environment variables:

export FLUX2_VARIANT=4b
export FLUX2_MODEL_PATH=/path/to/flux-2-klein-base-4b.safetensors
export FLUX2_AE_MODEL_PATH=/path/to/ae.safetensors
export FLUX2_QWEN3_MODEL_SPEC=Qwen/Qwen3-4B
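
Because the placeholder paths above must be replaced with real locations, it may help to confirm the variables before launching a multi-GPU job. The sketch below is a hypothetical pre-flight check, not part of the ImageWAM codebase; the function name `check_flux2_env` is an assumption, and it only tests that the four variables are non-empty and that the two weight paths point to existing files.

```shell
# Hypothetical pre-flight check (not provided by ImageWAM).
# Reports unset variables and weight paths that are not regular files.
# Returns 0 if everything looks plausible, 1 otherwise.
check_flux2_env() {
  env_status=0
  for name in FLUX2_VARIANT FLUX2_MODEL_PATH FLUX2_AE_MODEL_PATH FLUX2_QWEN3_MODEL_SPEC; do
    # Indirect lookup of the variable named by $name, empty if unset.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset: $name"
      env_status=1
    fi
  done
  # Only the two weight variables are expected to be local file paths.
  for name in FLUX2_MODEL_PATH FLUX2_AE_MODEL_PATH; do
    eval "value=\${$name:-}"
    if [ -n "$value" ] && [ ! -f "$value" ]; then
      echo "not a file: $name=$value"
      env_status=1
    fi
  done
  return "$env_status"
}
```

Running `check_flux2_env || echo "fix the variables above before evaluating"` after the exports gives a short list of anything that still needs attention.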

Evaluate on LIBERO:

export CKPT_PATH="$(pwd)/checkpoints/imagewam_release/libero/flux2_klein_4b/model.pt"
export DATASET_STATS_PATH="$(pwd)/checkpoints/imagewam_release/libero/flux2_klein_4b/dataset_stats.json"

NUM_GPUS=8 FLUX2_VARIANT=4b bash scripts/flux2/run_eval_flux2_libero.sh

Intended Use

This checkpoint is intended for:

Reproducing ImageWAM LIBERO evaluations.
Research on robot policy learning, world action models, and image-editing-based action generation.
Comparison against other LIBERO policy models under the same evaluation setup.

This checkpoint is not intended for safety-critical or real-world robot deployment without additional validation.

Limitations

Evaluation requires the ImageWAM codebase and the LIBERO benchmark environment.
The checkpoint assumes the same model variant and configuration used during training. See config.yaml.
Users must separately prepare the matching FLUX.2 4B base model and autoencoder weights.
Performance may differ if the simulator version, dataset preprocessing, action normalization statistics, or evaluation settings differ from the release setup.
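
When comparing results across machines, one way to rule out mismatched artifacts is to record checksums of the released files alongside the evaluation logs. The sketch below is a hypothetical helper, not part of the ImageWAM codebase; the name `fingerprint_release` is an assumption, and it assumes GNU coreutils `sha256sum` is available (on macOS, `shasum -a 256` is the usual substitute). No reference checksums are published here, so the output is only useful for comparing two local copies.

```shell
# Hypothetical helper (not provided by ImageWAM): print SHA-256 checksums
# of the three released files so two setups can be compared.
# Assumes GNU coreutils `sha256sum` is installed.
fingerprint_release() {
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$1" && sha256sum model.pt dataset_stats.json config.yaml)
}
```

For example, `fingerprint_release checkpoints/imagewam_release/libero/flux2_klein_4b > release.sha256` stores the checksums, and a later `diff` against another machine's file shows whether the checkpoint, normalization statistics, and configuration match.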

Citation

If you use this checkpoint, please cite the ImageWAM paper:

@misc{zhang2026imagewam,
      title={ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?}, 
      author={Yuyang Zhang and Wenyao Zhang and Zekun Qi and He Zhang and Haitao Lin and Jingbo Zhang and Yao Mu and Xiaokang Yang and Wenjun Zeng and Xin Jin},
      year={2026},
      eprint={2606.19531},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2606.19531}, 
}

Acknowledgements

ImageWAM builds on several open-source projects and model families, including FLUX.2, FastWAM, LIBERO, LIBERO-plus, and RoboTwin. Please also follow the licenses and citation requirements of the corresponding upstream projects.
