
Text-to-Image
Generates images from input text. These models can be used to generate and modify images based on text prompts.
Input
A city above clouds, pastel colors, Victorian style

About Text-to-Image
Use Cases
Data Generation
Businesses can generate image data for their use cases by providing text prompts and collecting the generated images.
Immersive Conversational Chatbots
Chatbots can be made more immersive if they respond with contextual images based on the user's input.
Creative Ideas for the Fashion Industry
Different patterns can be generated to obtain unique fashion pieces. Text-to-image models make it easier for designers to conceptualize a design before actually implementing it.
Architecture Industry
Architects can use the models to construct an environment based on the requirements of the floor plan. This can also include the furniture that has to be placed in that environment.
Task Variants
You can contribute variants of this task here.
Inference
You can add a small snippet here that shows how to infer with text-to-image models.
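As one possible snippet for this section, the sketch below generates an image from a text prompt with the 🧨 diffusers library. It is a minimal example under stated assumptions, not an official recipe: the checkpoint ID `stable-diffusion-v1-5/stable-diffusion-v1-5`, the helper name `generate_image`, and the default values for the step count, guidance scale, and seed are illustrative choices that do not come from this page. Any diffusers-compatible text-to-image checkpoint can be substituted.

```python
def generate_image(
    prompt,
    model_id="stable-diffusion-v1-5/stable-diffusion-v1-5",  # assumed checkpoint, swap as needed
    num_inference_steps=30,  # illustrative default
    guidance_scale=7.5,      # illustrative default
    seed=0,
):
    """Generate one image (a PIL image) from a text prompt."""
    # Heavy dependencies are imported lazily, inside the function.
    import torch
    from diffusers import DiffusionPipeline

    # Use half precision on a GPU when one is available, full precision on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # Download (or load from the local cache) the pipeline weights.
    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)

    # A seeded generator makes the sampled image reproducible.
    generator = torch.Generator(device=device).manual_seed(seed)

    result = pipe(
        prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=generator,
    )
    return result.images[0]
```

Calling `generate_image("A city above clouds, pastel colors, Victorian style").save("city.png")` would write the result to disk. The first call downloads the model weights, and running on a CPU is possible but slow.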
Useful Resources
- Hugging Face Diffusion Models Course
- Getting Started with Diffusers
- Text-to-Image Generation
- MinImagen - Build Your Own Imagen Text-to-Image Model
- Using LoRA for Efficient Stable Diffusion Fine-Tuning
- Using Stable Diffusion with Core ML on Apple Silicon
- A guide on Vector Quantized Diffusion
- 🧨 Stable Diffusion in JAX/Flax
- Running IF with 🧨 diffusers on a Free Tier Google Colab
This page was made possible thanks to the efforts of Ishan Dutta, Enrique Elias Ubaldo and Oğuz Akif.

Note A latent text-to-image diffusion model capable of generating photo-realistic images given any text input.
Note A model that can be used to generate images based on text prompts. The DALL·E Mega model is the largest version of DALL·E Mini.
Note A text-to-image model that can generate coherent text inside an image.
Note A powerful text-to-image model.
Note RedCaps is a large-scale dataset of 12M image-text pairs collected from Reddit.
Note Conceptual Captions is a dataset consisting of ~3.3M images annotated with captions.
Note A powerful text-to-image application.
Note A text-to-image application that can generate coherent text inside the image.
Note A powerful text-to-image application that can generate images.
Note A powerful text-to-image application that can generate 3D representations.
No example metric is defined for this task.
Note Contribute by proposing a metric for this task!