Update README.md
README.md
```diff
@@ -4,7 +4,7 @@ library_name: diffusers
 
 # SPRIGHT-T2I Model Card
 
-The SPRIGHT-T2I model is a text-to-image diffusion model with high spatial coherency. It was first introduced in [Getting it Right: Improving Spatial Consistency in Text-to-Image Models](https://),
+The SPRIGHT-T2I model is a text-to-image diffusion model with high spatial coherency. It was first introduced in [Getting it Right: Improving Spatial Consistency in Text-to-Image Models](https://arxiv.org/abs/2404.01197),
 authored by Agneet Chatterjee<sup>\*</sup>, Gabriela Ben Melech Stan<sup>*</sup>, Estelle Aflalo, Sayak Paul, Dhruba Ghosh,
 Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, and Yezhou Yang. _(<sup>\*</sup>denotes equal contributions)_
 
@@ -112,23 +112,32 @@ The following table compares our SPRIGHT-T2I model with SD 2.1 across multiple s
 
 |Method |OA(%) ↑|VISOR-4(%) ↑|T2I-CompBench ↑|FID ↓|CMMD ↓|
 |------------------|-------|------------|---------------|-----|------|
-|SD v2.1 |47.83 |4.70 |0.1507 |21.646|
+|SD v2.1 |47.83 |4.70 |0.1507 |21.646|0.703 |
 |SPRIGHT-T2I (ours)|60.68 |16.15 |0.2133 |16.149|0.512 |
 
 Our key findings are:
 - We increase the VISOR Object Accuracy (OA) score by 26.86%, indicating that we are much better at generating objects mentioned in the input prompt.
 - VISOR-4 score of 16.15% denotes that for a given input prompt, we consistently generate a spatially accurate image.
-- Improve on all aspects of the VISOR score while improving the ZS-FID and CMMD score on COCO-30K images by 23.74% and
+- Improve on all aspects of the VISOR score while improving the ZS-FID and CMMD score on COCO-30K images by 23.74% and 27.16%, respectively.
 - Enhance the ability to generate 1 and 2 objects, along with generating the correct number of objects, as indicated by evaluation on the [GenEval](https://github.com/djghosh13/geneval) benchmark.
 
 ### Model Resources
 
 - **Dataset**: [SPRIGHT Dataset](https://huggingface.co/datasets/SPRIGHT-T2I/spright)
 - **Repository:** [SPRIGHT-T2I GitHub Repository](https://github.com/orgs/SPRIGHT-T2I)
-- **Paper:** [Getting it Right: Improving Spatial Consistency in Text-to-Image Models](https://)
+- **Paper:** [Getting it Right: Improving Spatial Consistency in Text-to-Image Models](https://arxiv.org/abs/2404.01197)
 - **Demo:** [SPRIGHT-T2I on Spaces](https://huggingface.co/spaces/SPRIGHT-T2I/SPRIGHT-T2I)
 - **Project Website**: [SPRIGHT Website](https://spright-t2i.github.io/)
 
 ## Citation
 
-
+```bibtex
+@misc{chatterjee2024getting,
+title={Getting it Right: Improving Spatial Consistency in Text-to-Image Models},
+author={Agneet Chatterjee and Gabriela Ben Melech Stan and Estelle Aflalo and Sayak Paul and Dhruba Ghosh and Tejas Gokhale and Ludwig Schmidt and Hannaneh Hajishirzi and Vasudev Lal and Chitta Baral and Yezhou Yang},
+year={2024},
+eprint={2404.01197},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+```
```
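The relative improvements in the key findings can be recomputed from the SD v2.1 and SPRIGHT-T2I rows of the comparison table. The following is a minimal sketch, not code from the SPRIGHT-T2I repository. `relative_change_pct` and `truncate2` are hypothetical helper names, and the sketch assumes that improvements are measured relative to the SD v2.1 baseline. It checks only OA and CMMD. Truncating to two decimals, rather than rounding, reproduces the quoted 26.86% and 27.16%.

```python
import math


def relative_change_pct(baseline: float, new: float, higher_is_better: bool) -> float:
    """Improvement of `new` over `baseline`, as a percentage of `baseline`.

    For lower-is-better metrics (e.g. FID, CMMD) a decrease counts as a
    positive improvement.
    """
    delta = (new - baseline) if higher_is_better else (baseline - new)
    return 100.0 * delta / baseline


def truncate2(x: float) -> float:
    """Truncate (not round) a non-negative value to two decimal places."""
    return math.floor(x * 100) / 100


if __name__ == "__main__":
    # Values copied from the table; SD v2.1 is assumed to be the baseline.
    oa = relative_change_pct(47.83, 60.68, higher_is_better=True)
    cmmd = relative_change_pct(0.703, 0.512, higher_is_better=False)
    print(f"OA improved by {truncate2(oa):.2f}%")      # OA improved by 26.86%
    print(f"CMMD improved by {truncate2(cmmd):.2f}%")  # CMMD improved by 27.16%
```

With rounding instead of truncation, the same inputs give 26.87% and 27.17%.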