|
--- |
|
datasets: |
|
- yuvalkirstain/pickapic_v2 |
|
--- |
|
# Diffusion Model Alignment Using Direct Preference Optimization |
|
|
|
|
|
Direct Preference Optimization (DPO) for text-to-image diffusion models is a method to align diffusion models to text human preferences by directly optimizing on human comparison data. Please check paper at [Diffusion Model Alignment Using Direct Preference Optimization](https://arxiv.org/abs/2311.12908). |
|
|
|
|
|
SD1.5 model is fine-tuned from [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) on offline human preference data [pickapic_v2](https://huggingface.co/datasets/yuvalkirstain/pickapic_v2). |
|
|
|
SDXL model is fine-tuned from [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) on offline human preference data [pickapic_v2](https://huggingface.co/datasets/yuvalkirstain/pickapic_v2). |
|
|