AlphaGRPO-RT2I

Model Summary

AlphaGRPO-RT2I is a PEFT LoRA adapter for BAGEL-7B-MoT, trained with AlphaGRPO for reasoning text-to-image generation.

This repository contains only the adapter weights, so the adapter must be loaded together with the BAGEL-7B-MoT base model. The adapter uses LoRA rank 32 and alpha 64.
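
For readers unfamiliar with these two numbers, here is a sketch of how rank and alpha typically enter the adapted weights. It assumes the standard LoRA scaling that PEFT applies by default, not rank-stabilized LoRA, which this card does not mention. The symbols W_0, A, and B are illustrative and are not taken from the repository.

```latex
W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},
\qquad r = 32,\ \alpha = 64 \ \Rightarrow\ \frac{\alpha}{r} = 2
```

Under that assumption, the low-rank update BA is added to the frozen base weight W_0 with a scale factor of 2.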

Usage

Set the adapter path when running AlphaGRPO/BAGEL inference or evaluation:

export BAGEL_LORA_PATH=/path/to/AlphaGRPO-rt2i
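
Before launching inference, it can help to confirm that the variable points at a directory that actually holds a PEFT adapter. The following is a minimal sketch and not part of the AlphaGRPO repository: `check_adapter_dir` is a hypothetical helper, and it assumes the adapter follows the usual PEFT layout with an `adapter_config.json` file at the top level of the directory.

```shell
# check_adapter_dir is a hypothetical helper, not part of the AlphaGRPO/BAGEL code.
# Assumption: the adapter uses the usual PEFT layout, with adapter_config.json
# at the top level of the directory.
check_adapter_dir() {
  dir="$1"
  # Fail early if no path was given (for example, the variable was never exported).
  if [ -z "$dir" ]; then
    echo "BAGEL_LORA_PATH is not set" >&2
    return 1
  fi
  # Fail if the directory does not contain the adapter configuration file.
  if [ ! -f "$dir/adapter_config.json" ]; then
    echo "no adapter_config.json found in: $dir" >&2
    return 1
  fi
  echo "adapter directory looks usable: $dir"
}
```

After exporting the variable, run `check_adapter_dir "$BAGEL_LORA_PATH"`. A non-zero exit status means the path is empty or does not contain the expected configuration file.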

For installation, evaluation scripts, and full usage examples, please see the GitHub repository.

Citation

@inproceedings{huang2026alphagrpo,
  title={AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in Unified Multimodal Models via Decompositional Verifiable Reward},
  author={Huang, Runhui and Wu, Jie and Yang, Rui and Liu, Zhe and Zhao, Hengshuang},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026}
}

Model tree for huangrh9/AlphaGRPO-rt2i: Qwen/Qwen2.5-7B (base model) → Qwen/Qwen2.5-7B-Instruct (finetuned) → ByteDance-Seed/BAGEL-7B-MoT (finetuned) → this model (adapter)

Paper for huangrh9/AlphaGRPO-rt2i: AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward (2605.12495, published May 12)