🧩 TokenCompose SD14 Model Card
🎬CVPR 2024
TokenCompose_SD14_A is a latent text-to-image diffusion model finetuned from the Stable-Diffusion-v1-4 checkpoint at resolution 512x512 on the VSR split of COCO image-caption pairs for 24,000 steps with a learning rate of 5e-6. The training objective adds token-level grounding terms to the standard denoising loss to improve multi-category instance composition and photorealism. The "_A"/"_B" suffix distinguishes separate finetuning runs that use the same configuration described above.
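For intuition, the combined objective can be pictured as the usual denoising loss plus a grounding term that ties each grounded noun token's cross-attention map to that object's segmentation mask. The sketch below is illustrative only, not the released training code: the tensor shapes, the coverage-style grounding term, and the weight `lambda_token` are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def token_grounding_loss(attn_maps: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """Illustrative token-level grounding term (an assumption, not the exact paper loss).

    attn_maps: (num_tokens, H, W) cross-attention maps for grounded noun tokens
    masks:     (num_tokens, H, W) binary segmentation masks for the same objects
    """
    # Fraction of each token's attention mass that falls inside its object mask.
    inside = (attn_maps * masks).flatten(1).sum(dim=1)
    total = attn_maps.flatten(1).sum(dim=1).clamp_min(1e-6)
    # Penalize attention mass that leaks outside the mask.
    return (1.0 - inside / total).mean()

def total_loss(noise_pred, noise_target, attn_maps, masks, lambda_token=1e-3):
    # Standard latent-diffusion denoising objective plus the grounding term.
    loss_denoise = F.mse_loss(noise_pred, noise_target)
    return loss_denoise + lambda_token * token_grounding_loss(attn_maps, masks)
```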
📄 Paper
Please refer to our CVPR 2024 paper, "TokenCompose: Text-to-Image Diffusion with Token-level Supervision".
🧨 Example Usage
We strongly recommend using the 🤗 Diffusers library to run our model.
```python
import torch
from diffusers import StableDiffusionPipeline

model_id = "mlpc-lab/TokenCompose_SD14_A"
device = "cuda"

# Load the finetuned checkpoint and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
pipe = pipe.to(device)

prompt = "A cat and a wine glass"
image = pipe(prompt).images[0]
image.save("cat_and_wine_glass.png")
```
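If GPU memory is tight, you can also load the weights in half precision and adjust the standard sampler options. This is a minimal sketch using generic Diffusers pipeline arguments (`num_inference_steps`, `guidance_scale`); these are not settings tuned specifically for this checkpoint.

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumption: the checkpoint runs fine in fp16; fall back to float32 otherwise.
pipe = StableDiffusionPipeline.from_pretrained(
    "mlpc-lab/TokenCompose_SD14_A", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "A cat and a wine glass",
    num_inference_steps=50,  # sampler step count
    guidance_scale=7.5,      # classifier-free guidance strength
).images[0]
image.save("cat_and_wine_glass_fp16.png")
```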
⬆️ Improvements over SD14

Object Accuracy and the MG2-MG5 columns (reported on COCO and ADE20K) measure multi-category instance composition, FID (COCO, Flickr30K) measures photorealism, and latency measures efficiency.

| Method | Object Accuracy | MG2 (COCO) | MG3 (COCO) | MG4 (COCO) | MG5 (COCO) | MG2 (ADE20K) | MG3 (ADE20K) | MG4 (ADE20K) | MG5 (ADE20K) | FID (COCO) | FID (Flickr30K) | Latency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SD 1.4 | 29.86 | 90.72 ± 1.33 | 50.74 ± 0.89 | 11.68 ± 0.45 | 0.88 ± 0.21 | 89.81 ± 0.40 | 53.96 ± 1.14 | 16.52 ± 1.13 | 1.89 ± 0.34 | 20.88 | 71.46 | 7.54 ± 0.17 |
| TokenCompose (Ours) | 52.15 | 98.08 ± 0.40 | 76.16 ± 1.04 | 28.81 ± 0.95 | 3.28 ± 0.48 | 97.75 ± 0.34 | 76.93 ± 1.09 | 33.92 ± 1.47 | 6.21 ± 0.62 | 20.19 | 71.13 | 7.56 ± 0.14 |
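As a rough illustration of what the MG columns evaluate, you can prompt the model with several object categories at once and check how many generations contain all of them. The sketch below is an assumption-laden example: the prompt, the sample count, and the detection step (left as a comment) are placeholders, not the benchmark's exact protocol.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "mlpc-lab/TokenCompose_SD14_A", torch_dtype=torch.float32
).to("cuda")

# A multi-category prompt in the spirit of the MG4 setting (four categories).
prompt = "A photo of a cat, a wine glass, a backpack, and a potted plant"
images = pipe(prompt, num_images_per_prompt=4).images

# To estimate an MG-style success rate, run an object detector over `images`
# and count how many samples contain every listed category.
for i, img in enumerate(images):
    img.save(f"mg4_sample_{i}.png")
```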
📰 Citation
```bibtex
@InProceedings{Wang2024TokenCompose,
    author    = {Wang, Zirui and Sha, Zhizhou and Ding, Zheng and Wang, Yilin and Tu, Zhuowen},
    title     = {TokenCompose: Text-to-Image Diffusion with Token-level Supervision},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {8553-8564}
}
```