playgroundai
/

playground-v2-1024px-aesthetic

@@ -25,9 +25,9 @@ This repository contains a model that generates highly aesthetic images of resol
 **Playground v2** is a diffusion-based text-to-image generative model. The model was trained from scratch by the research team at [Playground](https://playground.com).
-Playground v2’s images are favored **2.5** times more than those produced by Stable Diffusion XL, according to Playground’s [user study](#user-study).
-We are thrilled to release [intermediate checkpoints](#intermediate-base-models) at different training stages, including evaluation metrics, to the community. We hope this will foster more foundation model research in pixels.
 Lastly, we introduce a new benchmark, [MJHQ-30K](#mjhq-30k-benchmark), for automatic evaluation of a model’s aesthetic quality.
@@ -36,7 +36,7 @@ Lastly, we introduce a new benchmark, [MJHQ-30K](#mjhq-30k-benchmark), for autom
 - **Developed by:** [Playground](https://playground.com)
 - **Model type:** Diffusion-based text-to-image generative model
 - **License:** [Playground v2 Community License](https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic/blob/main/LICENSE.md)
-- **Model Description:** This model generates images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pre-trained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip) and [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main)). It follows the same architecture as [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
 ### Using the model with 🧨 Diffusers
@@ -85,7 +85,7 @@ During the user study, we give users instructions to evaluate image pairs based
 We introduce a new benchmark, [MJHQ-30K](https://huggingface.co/datasets/playgroundai/MJHQ30K), for automatic evaluation of a model’s aesthetic quality. The benchmark computes FID on a high-quality dataset to gauge aesthetic quality.
-We curate the high-quality dataset from Midjourney with 10 common categories, each category with 3K samples. Following common practice, we use aesthetic score and CLIP score to ensure high image quality and high image-text alignment. Furthermore, we take extra care to make the data diverse within each category.
 For Playground v2, we report both the overall FID and per-category FID. All FID metrics are computed at resolution 1024x1024. Our benchmark results show that our model outperforms SDXL-1-0-refiner in overall FID and all category FIDs, especially in people and fashion categories. This is in line with the results of the user study, which indicates a correlation between human preference and FID score on the MJHQ30K benchmark.

 **Playground v2** is a diffusion-based text-to-image generative model. The model was trained from scratch by the research team at [Playground](https://playground.com).
+Images generated by Playground v2 are preferred **2.5** times more often than those produced by Stable Diffusion XL, according to Playground’s [user study](#user-study).
+We are thrilled to release [intermediate checkpoints](#intermediate-base-models) at different training stages, including evaluation metrics, to the community. We hope this will encourage further research into foundational models for image generation.
 Lastly, we introduce a new benchmark, [MJHQ-30K](#mjhq-30k-benchmark), for automatic evaluation of a model’s aesthetic quality.
 - **Developed by:** [Playground](https://playground.com)
 - **Model type:** Diffusion-based text-to-image generative model
 - **License:** [Playground v2 Community License](https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic/blob/main/LICENSE.md)
+- **Summary:** This model generates images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pre-trained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip) and [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main)). It follows the same architecture as [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
 ### Using the model with 🧨 Diffusers
 We introduce a new benchmark, [MJHQ-30K](https://huggingface.co/datasets/playgroundai/MJHQ30K), for automatic evaluation of a model’s aesthetic quality. The benchmark computes FID on a high-quality dataset to gauge aesthetic quality.
+We have curated a high-quality dataset from Midjourney, featuring 10 common categories, with each category containing 3,000 samples. Following common practice, we use aesthetic score and CLIP score to ensure high image quality and high image-text alignment. Furthermore, we take extra care to make the data diverse within each category.
 For Playground v2, we report both the overall FID and per-category FID. All FID metrics are computed at resolution 1024x1024. Our benchmark results show that our model outperforms SDXL-1-0-refiner in overall FID and all category FIDs, especially in people and fashion categories. This is in line with the results of the user study, which indicates a correlation between human preference and FID score on the MJHQ30K benchmark.