camenduru commited on Oct 21, 2023

Commit

77b0fc5

•

1 Parent(s): 6df6165

thanks to segmind ❤

Browse files

Files changed (19) hide show

README.md +143 -0
model_index.json +34 -0
scheduler/scheduler_config.json +18 -0
text_encoder/config.json +25 -0
text_encoder/model.fp16.safetensors +3 -0
text_encoder_2/config.json +25 -0
text_encoder_2/model.fp16.safetensors +3 -0
tokenizer/merges.txt +0 -0
tokenizer/special_tokens_map.json +24 -0
tokenizer/tokenizer_config.json +33 -0
tokenizer/vocab.json +0 -0
tokenizer_2/merges.txt +0 -0
tokenizer_2/special_tokens_map.json +24 -0
tokenizer_2/tokenizer_config.json +33 -0
tokenizer_2/vocab.json +0 -0
unet/config.json +93 -0
unet/diffusion_pytorch_model.fp16.safetensors +3 -0
vae/config.json +32 -0
vae/diffusion_pytorch_model.fp16.safetensors +3 -0

README.md ADDED Viewed

	@@ -0,0 +1,143 @@

+---
+license: apache-2.0
+tags:
+- text-to-image
+- ultra-realistic
+- text-to-image
+- stable-diffusion
+- distilled-model
+- knowledge-distillation
+pinned: true
+datasets:
+- zzliang/GRIT
+- wanng/midjourney-v5-202304-clean
+library_name: diffusers
+---
+# Segmind Stable Diffusion 1B (SSD-1B) Model Card
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/62039c2d91d53938a643317d/0Iu_0f0d1ihGy0YiOd9uS.png)
+## Model Description
+The Segmind Stable Diffusion Model (SSD-1B) is a **distilled 50% smaller** version of the Stable Diffusion XL (SDXL), offering a **60% speedup** while maintaining high-quality text-to-image generation capabilities. It has been trained on diverse datasets, including Grit and Midjourney scrape data, to enhance its ability to create a wide range of visual content based on textual prompts.
+This model employs a knowledge distillation strategy, where it leverages the teachings of several expert models in succession, including SDXL, ZavyChromaXL, and JuggernautXL, to combine their strengths and produce impressive visual outputs.
+Special thanks to the HF team 🤗 especially [Sayak](https://huggingface.co/sayakpaul), [Patrick](https://github.com/patrickvonplaten) and [Poli](https://huggingface.co/multimodalart) for their collaboration and guidance on this work.
+## Demo
+Try out the model at [Segmind SSD-1B](https://www.segmind.com/models/ssd-1b) for ⚡ fastest inference. You can also try it on [🤗 Spaces](https://huggingface.co/spaces/segmind/Segmind-Stable-Diffusion)
+## Image Comparision (SDXL-1.0 vs SSD-1B)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/62039c2d91d53938a643317d/mOM_OMxbivVBELad1QQYj.png)
+## Usage:
+This model can be used via the 🧨 Diffusers library.
+Make sure to install diffusers from source by running
+```
+pip install git+https://github.com/huggingface/diffusers
+```
+In addition, please install `transformers`, `safetensors` and `accelerate`:
+```
+pip install transformers accelerate safetensors
+```
+To use the model, you can run the following:
+```py
+from diffusers import StableDiffusionXLPipeline
+import torch
+pipe = StableDiffusionXLPipeline.from_pretrained("segmind/SSD-1B", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
+pipe.to("cuda")
+# if using torch < 2.0
+# pipe.enable_xformers_memory_efficient_attention()
+prompt = "An astronaut riding a green horse" # Your prompt here
+neg_prompt = "ugly, blurry, poor quality" # Negative prompt here
+image = pipe(prompt=prompt, negative_prompt=neg_prompt).images[0]
+```
+### Please do use negative prompting for the best quality!
+### Key Features
+- **Text-to-Image Generation:** The model excels at generating images from text prompts, enabling a wide range of creative applications.
+- **Distilled for Speed:** Designed for efficiency, this model offers a 60% speedup, making it a practical choice for real-time applications and scenarios where rapid image generation is essential.
+- **Diverse Training Data:** Trained on diverse datasets, the model can handle a variety of textual prompts and generate corresponding images effectively.
+- **Knowledge Distillation:** By distilling knowledge from multiple expert models, the Segmind Stable Diffusion Model combines their strengths and minimizes their limitations, resulting in improved performance.
+### Model Architecture
+The SSD-1B Model is a 1.3B Parameter Model which has several layers removed from the Base SDXL Model
+<img src="https://cdn-uploads.huggingface.co/production/uploads/62f8ca074588fe31f4361dae/SfEAW1rl2mxzhur02BCjk.png" width="400">
+### Training info
+These are the key hyperparameters used during training:
+* Steps: 251000
+* Learning rate: 1e-5
+* Batch size: 32
+* Gradient accumulation steps: 4
+* Image resolution: 1024
+* Mixed-precision: fp16
+### Speed Comparision
+We have observed that SSD-1B is upto 60% faster than the Base SDXL Model. Below is a comparision on an A100 40GB.
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/62039c2d91d53938a643317d/f7BcTrz5PjYGC5htLUVge.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/62039c2d91d53938a643317d/moMZrlDr-HTFkZlqWHUjQ.png)
+### Model Sources
+For research and development purposes, the SSD-1B Model can be accessed via the Segmind AI platform. For more information and access details, please visit [Segmind](https://www.segmind.com/models/ssd-1b).
+## Uses
+### Direct Use
+The Segmind Stable Diffusion Model is suitable for research and practical applications in various domains, including:
+- **Art and Design:** It can be used to generate artworks, designs, and other creative content, providing inspiration and enhancing the creative process.
+- **Education:** The model can be applied in educational tools to create visual content for teaching and learning purposes.
+- **Research:** Researchers can use the model to explore generative models, evaluate its performance, and push the boundaries of text-to-image generation.
+- **Safe Content Generation:** It offers a safe and controlled way to generate content, reducing the risk of harmful or inappropriate outputs.
+- **Bias and Limitation Analysis:** Researchers and developers can use the model to probe its limitations and biases, contributing to a better understanding of generative models' behavior.
+### Out-of-Scope Use
+The SSD-1B Model is not suitable for creating factual or accurate representations of people, events, or real-world information. It is not intended for tasks requiring high precision and accuracy.
+## Limitations and Bias
+### Limitations
+- **Photorealism:** The model does not achieve perfect photorealism and may produce images with artistic or stylized qualities.
+- **Legible Text:** Generating legible text within images is a challenge for the model, and text within images may appear distorted or unreadable.
+- **Compositionality:** Complex tasks involving composition, such as rendering images based on intricate descriptions, may pose challenges for the model.
+- **Faces and People:** While the model can generate a wide range of content, it may not consistently produce realistic or high-quality images of faces and people.
+- **Lossy Autoencoding:** The autoencoding aspect of the model is lossy, which means that some details in the input text may not be perfectly retained in the generated images.
+### Bias
+The SSD-1B Model is trained on a diverse dataset, but like all generative models, it may exhibit biases present in the training data. Users are encouraged to be mindful of potential biases in the model's outputs and take appropriate steps to mitigate them.

model_index.json ADDED Viewed

	@@ -0,0 +1,34 @@

+{
+  "_class_name": "StableDiffusionXLPipeline",
+  "_diffusers_version": "0.22.0.dev0",
+  "_name_or_path": "SDXL-Mini",
+  "force_zeros_for_empty_prompt": true,
+  "scheduler": [
+    "diffusers",
+    "EulerDiscreteScheduler"
+  ],
+  "text_encoder": [
+    "transformers",
+    "CLIPTextModel"
+  ],
+  "text_encoder_2": [
+    "transformers",
+    "CLIPTextModelWithProjection"
+  ],
+  "tokenizer": [
+    "transformers",
+    "CLIPTokenizer"
+  ],
+  "tokenizer_2": [
+    "transformers",
+    "CLIPTokenizer"
+  ],
+  "unet": [
+    "diffusers",
+    "UNet2DConditionModel"
+  ],
+  "vae": [
+    "diffusers",
+    "AutoencoderKL"
+  ]
+}

scheduler/scheduler_config.json ADDED Viewed

	@@ -0,0 +1,18 @@

+{
+  "_class_name": "EulerDiscreteScheduler",
+  "_diffusers_version": "0.22.0.dev0",
+  "beta_end": 0.012,
+  "beta_schedule": "scaled_linear",
+  "beta_start": 0.00085,
+  "clip_sample": false,
+  "interpolation_type": "linear",
+  "num_train_timesteps": 1000,
+  "prediction_type": "epsilon",
+  "sample_max_value": 1.0,
+  "set_alpha_to_one": false,
+  "skip_prk_steps": true,
+  "steps_offset": 1,
+  "timestep_spacing": "leading",
+  "trained_betas": null,
+  "use_karras_sigmas": false
+}

text_encoder/config.json ADDED Viewed

	@@ -0,0 +1,25 @@

+{
+  "_name_or_path": "SDXL-Mini/text_encoder",
+  "architectures": [
+    "CLIPTextModel"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dropout": 0.0,
+  "eos_token_id": 2,
+  "hidden_act": "quick_gelu",
+  "hidden_size": 768,
+  "initializer_factor": 1.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 77,
+  "model_type": "clip_text_model",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "projection_dim": 768,
+  "torch_dtype": "float16",
+  "transformers_version": "4.34.0",
+  "vocab_size": 49408
+}

text_encoder/model.fp16.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:660c6f5b1abae9dc498ac2d21e1347d2abdb0cf6c0c0c8576cd796491d9a6cdd
+size 246144152

text_encoder_2/config.json ADDED Viewed

	@@ -0,0 +1,25 @@

+{
+  "_name_or_path": "SDXL-Mini/text_encoder_2",
+  "architectures": [
+    "CLIPTextModelWithProjection"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dropout": 0.0,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_size": 1280,
+  "initializer_factor": 1.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 5120,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 77,
+  "model_type": "clip_text_model",
+  "num_attention_heads": 20,
+  "num_hidden_layers": 32,
+  "pad_token_id": 1,
+  "projection_dim": 1280,
+  "torch_dtype": "float16",
+  "transformers_version": "4.34.0",
+  "vocab_size": 49408
+}

text_encoder_2/model.fp16.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec310df2af79c318e24d20511b601a591ca8cd4f1fce1d8dff822a356bcdb1f4
+size 1389382176

tokenizer/merges.txt ADDED Viewed

The diff for this file is too large to render. See raw diff

tokenizer/special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,24 @@

+{
+  "bos_token": {
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<|endoftext|>",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer/tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,33 @@

+{
+  "add_prefix_space": false,
+  "bos_token": {
+    "__type": "AddedToken",
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "clean_up_tokenization_spaces": true,
+  "do_lower_case": true,
+  "eos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "errors": "replace",
+  "model_max_length": 77,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "CLIPTokenizer",
+  "unk_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer/vocab.json ADDED Viewed

The diff for this file is too large to render. See raw diff

tokenizer_2/merges.txt ADDED Viewed

The diff for this file is too large to render. See raw diff

tokenizer_2/special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,24 @@

+{
+  "bos_token": {
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "!",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer_2/tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,33 @@

+{
+  "add_prefix_space": false,
+  "bos_token": {
+    "__type": "AddedToken",
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "clean_up_tokenization_spaces": true,
+  "do_lower_case": true,
+  "eos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "errors": "replace",
+  "model_max_length": 77,
+  "pad_token": "!",
+  "tokenizer_class": "CLIPTokenizer",
+  "unk_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer_2/vocab.json ADDED Viewed

The diff for this file is too large to render. See raw diff

unet/config.json ADDED Viewed

	@@ -0,0 +1,93 @@

+{
+  "_class_name": "UNet2DConditionModel",
+  "_diffusers_version": "0.22.0.dev0",
+  "_name_or_path": "SDXL-Mini/unet",
+  "act_fn": "silu",
+  "addition_embed_type": "text_time",
+  "addition_embed_type_num_heads": 64,
+  "addition_time_embed_dim": 256,
+  "attention_head_dim": [
+    5,
+    10,
+    20
+  ],
+  "attention_type": "default",
+  "block_out_channels": [
+    320,
+    640,
+    1280
+  ],
+  "center_input_sample": false,
+  "class_embed_type": null,
+  "class_embeddings_concat": false,
+  "conv_in_kernel": 3,
+  "conv_out_kernel": 3,
+  "cross_attention_dim": 2048,
+  "cross_attention_norm": null,
+  "down_block_types": [
+    "DownBlock2D",
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D"
+  ],
+  "downsample_padding": 1,
+  "dropout": 0.0,
+  "dual_cross_attention": false,
+  "encoder_hid_dim": null,
+  "encoder_hid_dim_type": null,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_only_cross_attention": null,
+  "mid_block_scale_factor": 1,
+  "mid_block_type": "UNetMidBlock2D",
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "num_attention_heads": null,
+  "num_class_embeds": null,
+  "only_cross_attention": false,
+  "out_channels": 4,
+  "projection_class_embeddings_input_dim": 2816,
+  "resnet_out_scale_factor": 1.0,
+  "resnet_skip_time_act": false,
+  "resnet_time_scale_shift": "default",
+  "reverse_transformer_layers_per_block": [
+    [
+      4,
+      4,
+      10
+    ],
+    [
+      2,
+      1,
+      1
+    ],
+    1
+  ],
+  "sample_size": 128,
+  "time_cond_proj_dim": null,
+  "time_embedding_act_fn": null,
+  "time_embedding_dim": null,
+  "time_embedding_type": "positional",
+  "timestep_post_act": null,
+  "transformer_layers_per_block": [
+    [
+      1
+    ],
+    [
+      2,
+      2
+    ],
+    [
+      4,
+      4
+    ]
+  ],
+  "up_block_types": [
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D",
+    "UpBlock2D"
+  ],
+  "upcast_attention": null,
+  "use_linear_projection": true
+}

unet/diffusion_pytorch_model.fp16.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:40d8ea9159f3e875278dacc7879442d58c45850cf13c62f5e26681061c51829a
+size 2662790608

vae/config.json ADDED Viewed

	@@ -0,0 +1,32 @@

+{
+  "_class_name": "AutoencoderKL",
+  "_diffusers_version": "0.22.0.dev0",
+  "_name_or_path": "SDXL-Mini/vae",
+  "act_fn": "silu",
+  "block_out_channels": [
+    128,
+    256,
+    512,
+    512
+  ],
+  "down_block_types": [
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D"
+  ],
+  "force_upcast": true,
+  "in_channels": 3,
+  "latent_channels": 4,
+  "layers_per_block": 2,
+  "norm_num_groups": 32,
+  "out_channels": 3,
+  "sample_size": 1024,
+  "scaling_factor": 0.13025,
+  "up_block_types": [
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D"
+  ]
+}

vae/diffusion_pytorch_model.fp16.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7942ee243ddc9adb523003dadbba170e900e9e733a44342d05ccda2cc574b66b
+size 167335342