
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference


We propose MaPO, a reference-free, sample-efficient, memory-friendly alignment technique for text-to-image diffusion models. For more details on the technique, please refer to our paper here.
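To make "reference-free" and "margin-aware" concrete, here is a simplified illustration. It is not the MaPO objective from the paper. It assumes that each sample's denoising MSE stands in for a negative log-likelihood, so a lower MSE means the model assigns more likelihood to that image. The function names `log_sigmoid` and `margin_preference_loss` and the `beta` weight are hypothetical. The loss combines a term that keeps fitting the preferred images with a term that rewards a larger gap between the losses on dispreferred and preferred images. Only the model being trained appears, and no frozen reference model is kept in memory.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def margin_preference_loss(
    mse_chosen: Sequence[float],
    mse_rejected: Sequence[float],
    beta: float = 0.1,
) -> float:
    """Hypothetical reference-free, margin-aware preference loss (illustrative only).

    Assumption: per-sample denoising MSE is used as a proxy for negative
    log-likelihood. Only the trained model's losses are needed; there is
    no frozen reference model.
    """
    if len(mse_chosen) != len(mse_rejected) or not mse_chosen:
        raise ValueError("need equal-length, non-empty batches of chosen/rejected losses")
    n = len(mse_chosen)
    # Likelihood term: keep fitting the preferred (chosen) images.
    fit_term = sum(mse_chosen) / n
    # Margin term: larger when rejected images have higher loss than chosen ones.
    margin_term = sum(log_sigmoid(l - w) for w, l in zip(mse_chosen, mse_rejected)) / n
    # Minimizing this lowers the chosen loss and widens the margin.
    return fit_term - beta * margin_term


# A wider margin (rejected loss well above chosen loss) gives a lower loss.
print(margin_preference_loss([0.1], [0.5]))
print(margin_preference_loss([0.1], [0.1]))
```

With the chosen losses held fixed, widening the gap to the rejected losses lowers the loss. This is the "margin" behaviour the sketch is meant to show.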

Developed by

  • Jiwoo Hong* (KAIST AI)
  • Sayak Paul* (Hugging Face)
  • Noah Lee (KAIST AI)
  • Kashif Rasul (Hugging Face)
  • James Thorne (KAIST AI)
  • Jongheon Jeong (Korea University)

Dataset

This model was fine-tuned from Stable Diffusion XL on the cartoon split of Pick-Style.

Training Code

Refer to our code repository here.

Inference

from diffusers import DiffusionPipeline, AutoencoderKL, UNet2DConditionModel
import torch

# Base SDXL checkpoint, an fp16-safe SDXL VAE, and the MaPO-aligned UNet.
sdxl_id = "stabilityai/stable-diffusion-xl-base-1.0"
vae_id = "madebyollin/sdxl-vae-fp16-fix"
unet_id = "mapo-t2i/mapo-pick-style-cartoon"

# Load the VAE and the fine-tuned UNet in half precision.
vae = AutoencoderKL.from_pretrained(vae_id, torch_dtype=torch.float16)
unet = UNet2DConditionModel.from_pretrained(unet_id, subfolder="unet", torch_dtype=torch.float16)
# Swap both into the SDXL base pipeline and move it to the GPU.
pipeline = DiffusionPipeline.from_pretrained(sdxl_id, vae=vae, unet=unet, torch_dtype=torch.float16).to("cuda")

prompt = "portrait of gorgeous cyborg with golden hair, high resolution"
image = pipeline(prompt=prompt, num_inference_steps=30).images[0]
# The result is a PIL image; the file name here is arbitrary.
image.save("mapo_cartoon.png")

For qualitative results, please visit our project website.

Citation

@misc{todo,
    title={Margin-aware Preference Optimization for Aligning Diffusion Models without Reference}, 
    author={Jiwoo Hong and Sayak Paul and Noah Lee and Kashif Rasul and James Thorne and Jongheon Jeong},
    year={2024},
    eprint={todo},
    archivePrefix={arXiv},
    primaryClass={cs.CV,cs.LG}
}