nielsr (HF Staff) committed
Commit 2f8b77f · verified · 1 Parent(s): f96644b

Improve model card: Add paper abstract, code link, project page link, usage example, and BibTeX citation


This PR improves the model card for DiffBlender by:

- Adding direct links to the GitHub repository and the project page for easier navigation.
- Adding the paper abstract to provide a comprehensive overview of the model.
- Adding a "Quick Start" section with installation instructions and an inference example from the GitHub repository.
- Adding a BibTeX citation.
- Changing the `library_name` to `diffusers`.

Please review and merge this PR if these changes accurately reflect the model's information.

Files changed (1)

README.md (+39 −9)
README.md CHANGED
@@ -1,17 +1,24 @@
---
- license: apache-2.0
language:
- en
- library_name: transformers
pipeline_tag: text-to-image
---

- <br>

- # DiffBlender Model Card

- This repo contains the models from our paper [**DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models**](https://arxiv.org/abs/2305.15194).

## Model details

@@ -28,11 +35,34 @@ Apache 2.0 License
**Where to send questions or comments about the model:**
https://github.com/sungnyun/diffblender/issues

- ## Training dataset
- [Microsoft COCO 2017 dataset](https://cocodataset.org/#home)

- <br>

- More detials are in our project page, https://sungnyun.github.io/diffblender/.

---
language:
- en
+ library_name: diffusers
+ license: apache-2.0
pipeline_tag: text-to-image
---

+ # DiffBlender: Composable and Versatile Multimodal Text-to-Image Diffusion Models

+ This repository contains the models from our paper [**DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models**](https://huggingface.co/papers/2305.15194).

+ [Code](https://github.com/sungnyun/diffblender)
+ [Project Page](https://sungnyun.github.io/diffblender/)

+ <p align="center">
+ <img width="1369" alt="teaser" src="https://github.com/sungnyun/diffblender/raw/main/assets/fig1.png">
+ </p>
+
+ ## Abstract
+ In this study, we aim to enhance the capabilities of diffusion-based text-to-image (T2I) generation models by integrating diverse modalities beyond textual descriptions within a unified framework. To this end, we categorize widely used conditional inputs into three modality types: structure, layout, and attribute. We propose a multimodal T2I diffusion model, which is capable of processing all three modalities within a single architecture without modifying the parameters of the pre-trained diffusion model, as only a small subset of components is updated. Our approach sets new benchmarks in multimodal generation through extensive quantitative and qualitative comparisons with existing conditional generation methods. We demonstrate that DiffBlender effectively integrates multiple sources of information and supports diverse applications in detailed image synthesis.

## Model details

**Where to send questions or comments about the model:**
https://github.com/sungnyun/diffblender/issues

+ ## Quick Start
+ Install the necessary packages with:
+ ```sh
+ $ pip install -r requirements.txt
+ ```
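Note that `requirements.txt` lives in the DiffBlender GitHub repository rather than in this model repo, so a minimal setup sketch (the clone step is an assumption, not spelled out above) is:

```sh
# Assumed setup: fetch the DiffBlender code repository, then install its dependencies
$ git clone https://github.com/sungnyun/diffblender.git
$ cd diffblender
$ pip install -r requirements.txt
```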
 
+ Download the DiffBlender model checkpoint from this [Hugging Face model](https://huggingface.co/sungnyun/diffblender) and place it under `./diffblender_checkpoints/`.
+ Also, prepare the SD model from this [link](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original) (we used CompVis/sd-v1-4.ckpt).
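One possible way to fetch the checkpoint from the command line is the Hugging Face Hub CLI; this is a sketch assuming you simply mirror the model repo's files into `./diffblender_checkpoints/` (the exact checkpoint filename is not listed here):

```sh
# Assumed workflow: download this model repo's files with the Hugging Face CLI
$ pip install "huggingface_hub[cli]"
$ huggingface-cli download sungnyun/diffblender --local-dir ./diffblender_checkpoints
```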
+ ### Try Multimodal T2I Generation with DiffBlender
+ ```sh
+ $ python inference.py --ckpt_path=./diffblender_checkpoints/{CKPT_NAME}.pth \
+     --official_ckpt_path=/path/to/sd-v1-4.ckpt \
+     --save_name={SAVE_NAME}
+ ```
 
+ Results will be saved under `./inference/{SAVE_NAME}/`, with each output formatted as {conditions + generated image}.
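As a concrete illustration, with a hypothetical checkpoint file `diffblender.pth` and run name `demo` (both placeholders for your actual values):

```sh
# Hypothetical invocation: substitute your real checkpoint name and SD v1.4 path
$ python inference.py --ckpt_path=./diffblender_checkpoints/diffblender.pth \
    --official_ckpt_path=./sd-v1-4.ckpt \
    --save_name=demo
# Outputs would then appear under ./inference/demo/
```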
 
+ ## Training dataset
+ [Microsoft COCO 2017 dataset](https://cocodataset.org/#home)

+ ## Citation
+ If you find our work useful or helpful for your R&D work, please feel free to cite our paper as below.
+ ```bibtex
+ @article{kim2023diffblender,
+   title={DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models},
+   author={Kim, Sungnyun and Lee, Junsoo and Hong, Kibeom and Kim, Daesik and Ahn, Namhyuk},
+   journal={arXiv preprint arXiv:2305.15194},
+   year={2023}
+ }
+ ```