Stable Diffusion v2-1-base Model Card
This model was generated by Hugging Face using Apple's repository, which is licensed under the ASCL. This version contains 2-bit linearly quantized Core ML weights for iOS 17 or macOS 14. To use weights without quantization, please visit this model instead.
This model card describes the Stable Diffusion v2-1-base model.
This stable-diffusion-2-1-base model fine-tunes stable-diffusion-2-base (512-base-ema.ckpt) with 220k extra steps taken, with punsafe=0.98, on the same dataset.
The weights here have been converted to Core ML for use on Apple Silicon hardware.
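As a rough illustration of what "linearly quantized" means for the weights mentioned above, the following sketch maps floating-point values onto a uniform grid of 2^n integer codes and back. It is a minimal, pure-Python example and not the conversion code used to produce these Core ML weights; the function names `linear_quantize` and `linear_dequantize` and the sample values are hypothetical.

```python
def linear_quantize(weights, n_bits=2):
    """Map floats to integer codes in [0, 2**n_bits - 1] on a uniform grid."""
    levels = 2 ** n_bits - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant tensor, where the value range would be zero.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def linear_dequantize(codes, scale, w_min):
    """Recover approximate floats from integer codes."""
    return [c * scale + w_min for c in codes]


# Hypothetical sample weights, chosen only for illustration.
weights = [-1.0, -0.4, 0.1, 0.5, 1.0]
codes, scale, offset = linear_quantize(weights, n_bits=2)
restored = linear_dequantize(codes, scale, offset)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

With 2 bits there are only four representable values per group of weights, so the reconstruction error can be as large as half the grid step. That is the trade-off for the much smaller file size.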
There are 4 variants of the Core ML weights:
coreml-stable-diffusion-2-1-base
├── original
│   ├── compiled              # Swift inference, "original" attention
│   └── packages              # Python inference, "original" attention
└── split_einsum
    ├── compiled              # Swift inference, "split_einsum" attention
    └── packages              # Python inference, "split_einsum" attention
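The layout above can be turned into a small helper that picks the right subfolder and downloads only that variant. This is a sketch under assumptions: the helper names `variant_folder` and `download_variant` are hypothetical, the repository id is left as a parameter because it is not stated here, and the download step relies on `snapshot_download` from the `huggingface_hub` package, which must be installed separately.

```python
# Folder names taken from the directory tree above.
ATTENTION_VARIANTS = ("original", "split_einsum")
RUNTIME_TO_FOLDER = {"swift": "compiled", "python": "packages"}


def variant_folder(attention, runtime):
    """Return the repository subfolder for an attention variant and runtime."""
    if attention not in ATTENTION_VARIANTS:
        raise ValueError(f"unknown attention variant: {attention!r}")
    if runtime not in RUNTIME_TO_FOLDER:
        raise ValueError(f"unknown runtime: {runtime!r}")
    return f"{attention}/{RUNTIME_TO_FOLDER[runtime]}"


def download_variant(repo_id, attention, runtime, local_dir):
    """Download a single variant; repo_id must be supplied by the caller."""
    # Imported lazily so variant_folder works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    folder = variant_folder(attention, runtime)
    return snapshot_download(
        repo_id,
        allow_patterns=[f"{folder}/*"],  # fetch only the chosen variant
        local_dir=local_dir,
    )
```

For example, `variant_folder("split_einsum", "swift")` selects the compiled models intended for Swift inference, and passing the same arguments to `download_variant` fetches just that folder instead of the whole repository.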
There are also two zip archives suitable for use in the Hugging Face demo app and other third-party tools:
- coreml-stable-diffusion-2-1-base-palettized_original_compiled.zip contains the compiled, 6-bit model with the ORIGINAL attention implementation.
- coreml-stable-diffusion-2-1-base-palettized_split_einsum_v2_compiled.zip contains the compiled, 6-bit model with the SPLIT_EINSUM_V2 attention implementation.
Please refer to https://huggingface.co/blog/diffusers-coreml for details.
- Use it with 🧨 diffusers
- Use it with the stablediffusion repository: download the v2-1_512-ema-pruned.ckpt here.
Model Details
Developed by: Robin Rombach, Patrick Esser
Model type: Diffusion-based text-to-image generation model
Language(s): English
License: CreativeML Open RAIL++-M License
Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H).
Resources for more information: GitHub Repository.
Cite as:
@InProceedings{Rombach_2022_CVPR,
    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
    title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {10684-10695}
}
This model was quantized by Vishnou Vinayagame and adapted from the original by Pedro Cuenca, itself adapted from Robin Rombach, Patrick Esser and David Ha. This model card was adapted by Pedro Cuenca from the original written by Robin Rombach, Patrick Esser and David Ha, and is based on the Stable Diffusion v1 and DALL-E Mini model cards.