aykamko committed
Commit
f1cfd65
1 Parent(s): d0b80c5

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -87,11 +87,11 @@ During the user study, we give users instructions to evaluate image pairs based
 | SDXL-1-0-refiner | 9.55 |
 | [playground-v2-1024px-aesthetic](https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic) | **7.07** |
 
-We introduce a new benchmark, [MJHQ-30K](https://huggingface.co/datasets/playgroundai/MJHQ30K), for automatic evaluation of a model’s aesthetic quality. The benchmark computes FID on a high-quality dataset to gauge aesthetic quality.
+We introduce a new benchmark, [MJHQ-30K](https://huggingface.co/datasets/playgroundai/MJHQ-30K), for automatic evaluation of a model’s aesthetic quality. The benchmark computes FID on a high-quality dataset to gauge aesthetic quality.
 
 We curate the high-quality dataset from Midjourney with 10 common categories, each category with 3K samples. Following common practice, we use aesthetic score and CLIP score to ensure high image quality and high image-text alignment. Furthermore, we take extra care to make the data diverse within each category.
 
-For Playground v2, we report both the overall FID and per-category FID. All FID metrics are computed at resolution 1024x1024. Our benchmark results show that our model outperforms SDXL-1-0-refiner in overall FID and all category FIDs, especially in people and fashion categories. This is in line with the results of the user study, which indicates a correlation between human preference and FID score on the MJHQ30K benchmark.
+For Playground v2, we report both the overall FID and per-category FID. All FID metrics are computed at resolution 1024x1024. Our benchmark results show that our model outperforms SDXL-1-0-refiner in overall FID and all category FIDs, especially in people and fashion categories. This is in line with the results of the user study, which indicates a correlation between human preference and FID score on the MJHQ-30K benchmark.
 
 We release this benchmark to the public and encourage the community to adopt it for benchmarking their models’ aesthetic quality.
 
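The edited README paragraphs describe computing overall and per-category FID at 1024x1024 against the MJHQ-30K reference images. For orientation only, below is a minimal sketch of such an evaluation using torchmetrics' FrechetInceptionDistance; the folder paths, the per-category layout, and the image preprocessing are assumptions for illustration and are not taken from this commit or from the benchmark's released evaluation code.

```python
# Minimal sketch (not the authors' exact protocol): FID between a folder of
# MJHQ-30K reference images and a folder of generated samples at 1024x1024.
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torchmetrics.image.fid import FrechetInceptionDistance


def image_batches(folder: str, size: int = 1024, batch: int = 32):
    """Yield uint8 tensors of shape (N, 3, size, size) from a folder of images."""
    paths = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in {".jpg", ".png"})
    buf = []
    for p in paths:
        img = Image.open(p).convert("RGB").resize((size, size), Image.LANCZOS)
        buf.append(torch.from_numpy(np.array(img)).permute(2, 0, 1))  # HWC uint8 -> CHW
        if len(buf) == batch:
            yield torch.stack(buf)
            buf = []
    if buf:
        yield torch.stack(buf)


# Inception-V3 pool features; default settings expect uint8 images in [0, 255].
fid = FrechetInceptionDistance(feature=2048)

for real in image_batches("mjhq30k_images/people"):  # hypothetical path to MJHQ-30K references
    fid.update(real, real=True)
for fake in image_batches("generated/people"):       # hypothetical path to a model's samples
    fid.update(fake, real=False)

print("per-category FID:", fid.compute().item())
```

Running the same loop over all 10 category folders and over the pooled 30K images would give the per-category and overall FID figures the README refers to.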