AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
Paper: 2605.12495
Models | Paper | Project Page | GitHub | Base Model
AlphaGRPO-RT2I is a PEFT LoRA adapter for BAGEL-7B-MoT, trained with AlphaGRPO for reasoning text-to-image generation.
This repository contains adapter weights only. Please load it together with the BAGEL base model. The adapter uses LoRA rank 32 and alpha 64.
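For context, in the standard LoRA formulation the low-rank update is scaled by alpha / rank before it is added to the base weight, so rank 32 with alpha 64 corresponds to a scaling factor of 2. The sketch below illustrates that arithmetic on a tiny toy matrix. The function names are hypothetical, and it assumes the standard (not rank-stabilized) scaling, which this card does not state explicitly.

```python
def lora_scaling(rank: int, alpha: float) -> float:
    """Standard LoRA scaling factor: alpha / rank (assumed, not rsLoRA)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return alpha / rank


def matmul(x, y):
    """Plain-Python matrix product of two nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora(w, b, a, scale):
    """Return W + scale * (B @ A) for toy nested-list matrices.

    w: d_out x d_in base weight, b: d_out x r, a: r x d_in.
    """
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Values stated on this card: rank 32, alpha 64.
SCALE = lora_scaling(rank=32, alpha=64)

# Toy example with a rank-1 update, for illustration only.
merged = merge_lora(
    w=[[1.0, 0.0], [0.0, 1.0]],
    b=[[1.0], [0.0]],
    a=[[0.5, 0.0]],
    scale=SCALE,
)
```

In practice the scaling is applied by the adapter-loading code, so this is only meant to show what the two hyperparameters imply together.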
Set the adapter path when running AlphaGRPO/BAGEL inference or evaluation:
export BAGEL_LORA_PATH=/path/to/AlphaGRPO-rt2i
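Before launching a run, it can help to confirm that the variable points at a real adapter directory. The following is a minimal pre-flight sketch, not part of the AlphaGRPO codebase: the helper names are hypothetical, and it assumes the directory holds a PEFT-style `adapter_config.json` with the usual `r` and `lora_alpha` fields. How the AlphaGRPO/BAGEL scripts themselves consume `BAGEL_LORA_PATH` is documented in the GitHub repository.

```python
import json
import os
from pathlib import Path


def resolve_adapter_dir(env=None) -> Path:
    """Read BAGEL_LORA_PATH and make sure it is an existing directory."""
    env = os.environ if env is None else env
    raw = env.get("BAGEL_LORA_PATH")
    if not raw:
        raise RuntimeError("BAGEL_LORA_PATH is not set")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"adapter directory not found: {path}")
    return path


def check_adapter_config(adapter_dir, expected_rank=32, expected_alpha=64):
    """Return a list of mismatches between adapter_config.json and this card.

    Assumes PEFT's usual field names: "r" for rank, "lora_alpha" for alpha.
    """
    config_path = Path(adapter_dir) / "adapter_config.json"
    with config_path.open(encoding="utf-8") as f:
        cfg = json.load(f)
    problems = []
    if cfg.get("r") != expected_rank:
        problems.append(f"rank is {cfg.get('r')!r}, expected {expected_rank}")
    if cfg.get("lora_alpha") != expected_alpha:
        problems.append(
            f"alpha is {cfg.get('lora_alpha')!r}, expected {expected_alpha}"
        )
    return problems
```

A typical use would be to call `check_adapter_config(resolve_adapter_dir())` at the start of a script and stop if the returned list is non-empty.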
For installation, evaluation scripts, and full usage examples, please see the GitHub repository.
@inproceedings{huang2026alphagrpo,
  title={AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in Unified Multimodal Models via Decompositional Verifiable Reward},
  author={Huang, Runhui and Wu, Jie and Yang, Rui and Liu, Zhe and Zhao, Hengshuang},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026}
}