Text-to-Image · Diffusers · Safetensors · StableDiffusionXLPipeline · playground · Inference Endpoints
ehsanakh committed
Commit d81270f
0 Parent(s):

Duplicate from playgroundai/playground-v2.5-1024px-aesthetic

Co-authored-by: Ehsan Akhgari <ehsanakh@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
LICENSE.md ADDED
@@ -0,0 +1,75 @@
+ # Playground v2.5 Community License
+
+ **Release Date:** February 27, 2024
+
+ “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Playground Materials set forth herein.
+
+ “Documentation” means the specifications, manuals and documentation accompanying Playground v2.5 distributed by Playground at [https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic](https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic) or other authorized channel.
+
+ “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
+
+ “Playground v2.5” means the diffusion-based text-to-image generative models and software and algorithms, including checkpoints, trained model weights, and other elements of the foregoing distributed by Playground at [https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic](https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic) or other authorized channel.
+
+ “Playground Materials” means, collectively, Playground v2.5 and related Documentation (and any portion thereof) made available under this Agreement.
+
+ “Playground” or “we” means Mighty Computing, Inc. dba Playground AI<sup>TM</sup>.
+
+ By using or distributing any portion or element of the Playground Materials, you agree to be bound by this Agreement.
+
+ ## 1. License Rights and Redistribution.
+
+ ### a. Grant of Rights.
+ You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Playground’s intellectual property or other rights owned by Playground embodied in the Playground Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Playground Materials. Subject to the restrictions herein, this permissive license is available for free for research and commercial use (by an entity or individual).
+
+ ### b. Redistribution and Use.
+ i. If you distribute or make the Playground Materials, or any derivative works thereof, available to any third party, you shall provide a copy of this Agreement to such third party.
+
+ ii. If you receive Playground Materials, or any derivative works thereof, from an authorized Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
+
+ iii. You must retain in all copies of the Playground Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Playground v2.5 is licensed under the Playground v2.5 Community License.”
+
+ iv. Your use of the Playground Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Use Restrictions set forth in Attachment A. You shall require all of your users who use Playground v2.5 or any derivative works thereof, to comply with the terms of this section and the restrictions in Attachment A.
+
+ v. You will not use the Playground Materials or any output or results of the Playground Materials to improve any other text-to-image generative model (excluding Playground v2.5 or derivative works thereof). Fine-tuning Playground v2.5 is expressly permitted as a derivative work.
+
+ ## 2. Additional Commercial Terms.
+ If, at any time, (a) image generation or image editing is a core business or product of Licensee’s and (b) the total monthly unique users (MUU) of the products or services made available by or for Licensee, or Licensee’s affiliates, for such products or services is greater than 1 million MUUs in the preceding calendar month, then immediately thereafter you must request a license from Playground. Playground may grant this license to you in its sole discretion and you are not authorized to exercise any of the rights under this Agreement unless or until Playground otherwise expressly grants you such rights as a Licensee.
+
+ ## 3. Disclaimer of Warranty.
+ UNLESS REQUIRED BY APPLICABLE LAW, THE PLAYGROUND MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE PLAYGROUND MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE PLAYGROUND MATERIALS AND ANY OUTPUT AND RESULTS.
+
+ ## 4. Limitation of Liability.
+ IN NO EVENT WILL PLAYGROUND OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF PLAYGROUND OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+
+ ## 5. Intellectual Property.
+
+ a. No trademark licenses are granted under this Agreement, and in connection with the Playground Materials, neither Playground nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Playground Materials.
+
+ b. Subject to Playground’s ownership of Playground Materials and derivatives made by or for Playground, with respect to any derivative works and modifications of the Playground Materials that are made by you, as between you and Playground, you are and will be the owner of such derivative works and modifications.
+
+ c. If you institute litigation or other proceedings against Playground or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Playground Materials or Playground v2.5 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Playground from and against any claim by any third party arising out of or related to your use or distribution of the Playground Materials.
+
+ ## 6. Term and Termination.
+ The term of this Agreement will commence upon your acceptance of this Agreement or access to the Playground Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Playground may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Playground Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
+
+ ## 7. Governing Law and Jurisdiction.
+ This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
+
+
+ **Attachment A - Use Restrictions**
+
+ You agree not to use Playground v2.5 or any derivative works thereof:
+
+ <ol type="a">
+ <li>In any way that violates any applicable national, federal, state, local or international law or regulation;</li>
+ <li>For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;</li>
+ <li>To generate or disseminate verifiably false information and/or content with the purpose of harming others;</li>
+ <li>To generate or disseminate personal identifiable information that can be used to harm an individual;</li>
+ <li>To defame, disparage or otherwise harass others;</li>
+ <li>For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;</li>
+ <li>For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;</li>
+ <li>To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;</li>
+ <li>For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;</li>
+ <li>To provide medical advice and medical results interpretation;</li>
+ <li>To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).</li>
+ </ol>
README.md ADDED
@@ -0,0 +1,116 @@
+ ---
+ license: other
+ license_name: playground-v2dot5-community
+ license_link: https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/blob/main/LICENSE.md
+ tags:
+ - text-to-image
+ - playground
+ inference:
+   parameters:
+     guidance_scale: 3.0
+ ---
+ # Playground v2.5 – 1024px Aesthetic Model
+
+ This repository contains a model that generates highly aesthetic images of resolution 1024x1024, as well as portrait and landscape aspect ratios. You can use the model with Hugging Face 🧨 Diffusers.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/636c0c4eaae2da3c76b8a9a3/HYUUGfU6SOCHsvyeISQ5Y.png)
+
+ **Playground v2.5** is a diffusion-based text-to-image generative model, and a successor to [Playground v2](https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic).
+
+ Playground v2.5 is the state-of-the-art open-source model in aesthetic quality. Our user studies demonstrate that our model outperforms SDXL, Playground v2, PixArt-α, DALL-E 3, and Midjourney 5.2.
+
+ For details on the development and training of our model, please refer to our [blog post](https://blog.playgroundai.com/playground-v2-5/) and [technical report](https://marketing-cdn.playground.com/research/pgv2.5_compressed.pdf).
+
+ ### Model Description
+ - **Developed by:** [Playground](https://playground.com)
+ - **Model type:** Diffusion-based text-to-image generative model
+ - **License:** [Playground v2.5 Community License](https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/blob/main/LICENSE.md)
+ - **Summary:** This model generates images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pre-trained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). It follows the same architecture as [Stable Diffusion XL](https://huggingface.co/docs/diffusers/en/using-diffusers/sdxl).
+
+ ### Using the model with 🧨 Diffusers
+
+ Install diffusers >= 0.27.0 and the relevant dependencies.
+
+ ```
+ pip install diffusers>=0.27.0
+ pip install transformers accelerate safetensors
+ ```
+
+ **Notes:**
+ - The pipeline uses the `EDMDPMSolverMultistepScheduler` scheduler by default, for crisper fine details. It's an [EDM formulation](https://arxiv.org/abs/2206.00364) of the DPM++ 2M Karras scheduler. `guidance_scale=3.0` is a good default for this scheduler.
+ - The pipeline also supports the `EDMEulerScheduler` scheduler. It's an [EDM formulation](https://arxiv.org/abs/2206.00364) of the Euler scheduler. `guidance_scale=5.0` is a good default for this scheduler.
+
+ Then, run the following snippet:
+
+ ```python
+ from diffusers import DiffusionPipeline
+ import torch
+
+ pipe = DiffusionPipeline.from_pretrained(
+     "playgroundai/playground-v2.5-1024px-aesthetic",
+     torch_dtype=torch.float16,
+     variant="fp16",
+ ).to("cuda")
+
+ # Optional: Use DPM++ 2M Karras scheduler for crisper fine details
+ # from diffusers import EDMDPMSolverMultistepScheduler
+ # pipe.scheduler = EDMDPMSolverMultistepScheduler()
+
+ prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+ image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]
+ ```
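
The notes above also mention `EDMEulerScheduler` with `guidance_scale=5.0`. A minimal sketch of switching to it, assuming the `pipe` and `prompt` from the snippet above; the cross-class `from_config` swap is the usual Diffusers scheduler idiom rather than anything this card prescribes:

```python
# Sketch: use the EDM Euler scheduler instead of the default EDM DPM++ 2M Karras one
from diffusers import EDMEulerScheduler

# Build the Euler scheduler from the current scheduler's config; shared keys carry over,
# keys specific to the DPM-Solver scheduler are ignored
pipe.scheduler = EDMEulerScheduler.from_config(pipe.scheduler.config)

# guidance_scale=5.0 is the suggested default for this scheduler
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=5.0).images[0]
```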
+
+ ### Using the model with Automatic1111/ComfyUI
+
+ Support coming soon. We will update this model card with instructions when ready.
+
+ ### User Studies
+
+ This model card only provides a brief summary of our user study results. For extensive details on how we perform user studies, please check out our [technical report](https://marketing-cdn.playground.com/research/pgv2.5_compressed.pdf).
+
+ We conducted studies to measure overall aesthetic quality, as well as the specific areas we aimed to improve with Playground v2.5, namely multi aspect ratios and human preference alignment.
+
+ #### Comparison to State-of-the-Art
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63855d851769b7c4b10e1f76/V7LFNzgoQJnL__ndU0CnE.png)
+
+ In aesthetic quality, Playground v2.5 dramatically outperforms the current state-of-the-art open-source models SDXL and PixArt-α, as well as Playground v2. Because the performance differential between Playground v2.5 and SDXL was so large, we also tested our aesthetic quality against world-class closed-source models like DALL-E 3 and Midjourney 5.2, and found that Playground v2.5 outperforms them as well.
+
+ #### Multi Aspect Ratios
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/636c0c4eaae2da3c76b8a9a3/xMB0r-CmR3N6dABFlcV71.png)
+
+ Similarly, for multi aspect ratios, we outperform SDXL by a large margin.
+
+ #### Human Preference Alignment on People-related Images
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/636c0c4eaae2da3c76b8a9a3/7c-8Stw52OsNtUjse8Slv.png)
+
+ Next, we benchmark Playground v2.5 specifically on people-related images to test human preference alignment. We compared Playground v2.5 against two commonly used baseline models: SDXL and RealStock v2, a community fine-tune of SDXL that was trained on a realistic people dataset.
+
+ Playground v2.5 outperforms both baselines by a large margin.
+
+ ### MJHQ-30K Benchmark
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/636c0c4eaae2da3c76b8a9a3/7tyYDPGUtokh-k18XDSte.png)
+
+ | Model | Overall FID |
+ | --- | --- |
+ | SDXL-1-0-refiner | 9.55 |
+ | [playground-v2-1024px-aesthetic](https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic) | 7.07 |
+ | [playground-v2.5-1024px-aesthetic](https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic) | **4.48** |
+
+ Lastly, we report metrics using our MJHQ-30K benchmark, which we [open-sourced](https://huggingface.co/datasets/playgroundai/MJHQ-30K) with the v2 release. We report both the overall FID and per-category FID; all FID metrics are computed at resolution 1024x1024. Our results show that Playground v2.5 outperforms both Playground v2 and SDXL in overall FID and all category FIDs, especially in the people and fashion categories. This is in line with the results of the user study, which indicates a correlation between human preferences and the FID score on the MJHQ-30K benchmark.
+
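
As a rough illustration of the protocol described above (overall FID at 1024x1024 between the benchmark's reference images and images generated from its prompts), here is a minimal sketch using `torchmetrics`; the folder names are placeholders, not the actual layout of the MJHQ-30K dataset repo, and a real run would stream the 30K images in batches:

```python
import numpy as np
import torch
from pathlib import Path
from PIL import Image
from torchmetrics.image.fid import FrechetInceptionDistance  # requires torchmetrics[image]

def load_images(folder: str, size: int = 1024) -> torch.Tensor:
    """Stack a folder of images into a uint8 tensor of shape (N, 3, H, W)."""
    imgs = [
        torch.from_numpy(np.array(Image.open(p).convert("RGB").resize((size, size)))).permute(2, 0, 1)
        for p in sorted(Path(folder).glob("*.png"))
    ]
    return torch.stack(imgs)

# FrechetInceptionDistance expects uint8 images by default (normalize=False)
fid = FrechetInceptionDistance(feature=2048)
fid.update(load_images("mjhq30k_reference/"), real=True)   # benchmark reference images (placeholder path)
fid.update(load_images("generated_samples/"), real=False)  # images generated from the benchmark prompts
print(float(fid.compute()))
```
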
+ ### How to cite us
+
+ ```
+ @misc{li2024playground,
+   title={Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation},
+   author={Daiqing Li and Aleks Kamko and Ehsan Akhgari and Ali Sabet and Linmiao Xu and Suhail Doshi},
+   year={2024},
+   eprint={2402.17245},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV}
+ }
+ ```
model_index.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "_class_name": "StableDiffusionXLPipeline",
+   "_diffusers_version": "0.27.0.dev0",
+   "feature_extractor": [
+     null,
+     null
+   ],
+   "force_zeros_for_empty_prompt": true,
+   "image_encoder": [
+     null,
+     null
+   ],
+   "scheduler": [
+     "diffusers",
+     "EDMDPMSolverMultistepScheduler"
+   ],
+   "text_encoder": [
+     "transformers",
+     "CLIPTextModel"
+   ],
+   "text_encoder_2": [
+     "transformers",
+     "CLIPTextModelWithProjection"
+   ],
+   "tokenizer": [
+     "transformers",
+     "CLIPTokenizer"
+   ],
+   "tokenizer_2": [
+     "transformers",
+     "CLIPTokenizer"
+   ],
+   "unet": [
+     "diffusers",
+     "UNet2DConditionModel"
+   ],
+   "vae": [
+     "diffusers",
+     "AutoencoderKL"
+   ]
+ }
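
Each pair in model_index.json names the library and class Diffusers instantiates for that component. A small sketch of how those entries surface on a loaded pipeline (reusing the `pipe` from the README snippet above):

```python
# The component classes below come straight from model_index.json;
# printing them confirms the mapping on the loaded pipeline.
print(type(pipe).__name__)                  # StableDiffusionXLPipeline
print(type(pipe.scheduler).__name__)        # EDMDPMSolverMultistepScheduler
print(type(pipe.text_encoder).__name__)     # CLIPTextModel
print(type(pipe.text_encoder_2).__name__)   # CLIPTextModelWithProjection
print(type(pipe.unet).__name__)             # UNet2DConditionModel
print(type(pipe.vae).__name__)              # AutoencoderKL
```
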
playground-v2.5-1024px-aesthetic.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bcaa7dd6780974f000b17b5a6c63e6f867a75c51ffa85c67d6b196882c69b992
+ size 6938040576
playground-v2.5-1024px-aesthetic.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:956dca99114aaa5c3eb526381309d37ee96737e78ed64c8ae613409f47c3f65a
+ size 13875718056
scheduler/scheduler_config.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "_class_name": "EDMDPMSolverMultistepScheduler",
+   "_diffusers_version": "0.27.0.dev0",
+   "algorithm_type": "dpmsolver++",
+   "dynamic_thresholding_ratio": 0.995,
+   "euler_at_final": false,
+   "final_sigmas_type": "zero",
+   "lower_order_final": true,
+   "num_train_timesteps": 1000,
+   "prediction_type": "epsilon",
+   "rho": 7.0,
+   "sample_max_value": 1.0,
+   "sigma_data": 0.5,
+   "sigma_max": 80.0,
+   "sigma_min": 0.002,
+   "solver_order": 2,
+   "solver_type": "midpoint",
+   "thresholding": false
+ }
text_encoder/config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "architectures": [
+     "CLIPTextModel"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "dropout": 0.0,
+   "eos_token_id": 2,
+   "hidden_act": "quick_gelu",
+   "hidden_size": 768,
+   "initializer_factor": 1.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 77,
+   "model_type": "clip_text_model",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "projection_dim": 768,
+   "torch_dtype": "float32",
+   "transformers_version": "4.35.2",
+   "vocab_size": 49408
+ }
text_encoder/model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:660c6f5b1abae9dc498ac2d21e1347d2abdb0cf6c0c0c8576cd796491d9a6cdd
+ size 246144152
text_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:778d02eb9e707c3fbaae0b67b79ea0d1399b52e624fb634f2f19375ae7c047c3
+ size 492265168
text_encoder/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b2a888f98b610f4666b5323f4012475cc752183ce3bbec3ccf25cf32cec03d7
+ size 492306586
text_encoder/pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1aaa9196f44f8283e6549b748927d0d24b91710c1a216be590e08458bb5d615c
+ size 246185562
text_encoder_2/config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "architectures": [
+     "CLIPTextModelWithProjection"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "dropout": 0.0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_size": 1280,
+   "initializer_factor": 1.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 5120,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 77,
+   "model_type": "clip_text_model",
+   "num_attention_heads": 20,
+   "num_hidden_layers": 32,
+   "pad_token_id": 1,
+   "projection_dim": 1280,
+   "torch_dtype": "float32",
+   "transformers_version": "4.35.2",
+   "vocab_size": 49408
+ }
text_encoder_2/model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec310df2af79c318e24d20511b601a591ca8cd4f1fce1d8dff822a356bcdb1f4
+ size 1389382176
text_encoder_2/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa5b2e6f4c2efc2d82e4b8312faec1a5540eabfc6415126c9a05c8436a530ef4
+ size 2778702264
text_encoder_2/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01854181cb926f2b305bae76ba3bbacf9f8f6eff785aeafb8b22a3e8fbe4b9b0
+ size 2778810142
text_encoder_2/pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c61bac6a0e10e9c430b1faeb8338347758f7b5ca98dfbd7abff85c3e2f4305ea
+ size 1389490462
tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "49406": {
+       "content": "<|startoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "49407": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<|startoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "do_lower_case": true,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "model_max_length": 77,
+   "pad_token": "<|endoftext|>",
+   "tokenizer_class": "CLIPTokenizer",
+   "unk_token": "<|endoftext|>"
+ }
tokenizer/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_2/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_2/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "!",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer_2/tokenizer_config.json ADDED
@@ -0,0 +1,38 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "!",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "49406": {
+       "content": "<|startoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "49407": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<|startoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "do_lower_case": true,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "model_max_length": 77,
+   "pad_token": "!",
+   "tokenizer_class": "CLIPTokenizer",
+   "unk_token": "<|endoftext|>"
+ }
tokenizer_2/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
unet/config.json ADDED
@@ -0,0 +1,72 @@
+ {
+   "_class_name": "UNet2DConditionModel",
+   "_diffusers_version": "0.27.0.dev0",
+   "act_fn": "silu",
+   "addition_embed_type": "text_time",
+   "addition_embed_type_num_heads": 64,
+   "addition_time_embed_dim": 256,
+   "attention_head_dim": [
+     5,
+     10,
+     20
+   ],
+   "attention_type": "default",
+   "block_out_channels": [
+     320,
+     640,
+     1280
+   ],
+   "center_input_sample": false,
+   "class_embed_type": null,
+   "class_embeddings_concat": false,
+   "conv_in_kernel": 3,
+   "conv_out_kernel": 3,
+   "cross_attention_dim": 2048,
+   "cross_attention_norm": null,
+   "down_block_types": [
+     "DownBlock2D",
+     "CrossAttnDownBlock2D",
+     "CrossAttnDownBlock2D"
+   ],
+   "downsample_padding": 1,
+   "dropout": 0.0,
+   "dual_cross_attention": false,
+   "encoder_hid_dim": null,
+   "encoder_hid_dim_type": null,
+   "flip_sin_to_cos": true,
+   "freq_shift": 0,
+   "in_channels": 4,
+   "layers_per_block": 2,
+   "mid_block_only_cross_attention": null,
+   "mid_block_scale_factor": 1,
+   "mid_block_type": "UNetMidBlock2DCrossAttn",
+   "norm_eps": 1e-05,
+   "norm_num_groups": 32,
+   "num_attention_heads": null,
+   "num_class_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 4,
+   "projection_class_embeddings_input_dim": 2816,
+   "resnet_out_scale_factor": 1.0,
+   "resnet_skip_time_act": false,
+   "resnet_time_scale_shift": "default",
+   "reverse_transformer_layers_per_block": null,
+   "sample_size": 128,
+   "time_cond_proj_dim": null,
+   "time_embedding_act_fn": null,
+   "time_embedding_dim": null,
+   "time_embedding_type": "positional",
+   "timestep_post_act": null,
+   "transformer_layers_per_block": [
+     1,
+     2,
+     10
+   ],
+   "up_block_types": [
+     "CrossAttnUpBlock2D",
+     "CrossAttnUpBlock2D",
+     "UpBlock2D"
+   ],
+   "upcast_attention": false,
+   "use_linear_projection": true
+ }
unet/diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5544408cfc7601b9a121ed1e8e7f21142fe9d6badd55538b9ab828ad871e057c
+ size 10270604314
unet/diffusion_pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26dc7594ac2d3b72f656d846dadec4a08395273f4c1a52a7966b979ab29063ca
+ size 5135669022
unet/diffusion_pytorch_model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:933778ce76c1fc0ca918b37e1488411b8a99bbd3279c12f527a3ac995a340864
+ size 5135149760
unet/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:11b6d7bce65674659cc6b7ea960658436edfd80e566cb240ebd4bfbc3e2076c8
+ size 10270077736
vae/config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "_class_name": "AutoencoderKL",
+   "_diffusers_version": "0.27.0.dev0",
+   "act_fn": "silu",
+   "block_out_channels": [
+     128,
+     256,
+     512,
+     512
+   ],
+   "down_block_types": [
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D"
+   ],
+   "force_upcast": true,
+   "in_channels": 3,
+   "latent_channels": 4,
+   "layers_per_block": 2,
+   "norm_num_groups": 32,
+   "out_channels": 3,
+   "sample_size": 1024,
+   "up_block_types": [
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D"
+   ],
+   "latents_mean": [
+     -1.6574,
+     1.886,
+     -1.383,
+     2.5155
+   ],
+   "latents_std": [
+     8.4927,
+     5.9022,
+     6.5498,
+     5.2299
+   ],
+   "scaling_factor": 0.5
+ }
vae/diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9512825399e39027a15fd0c7360dd0fb762d6faf87558d94fcf94e041b53e9f9
+ size 334712578
vae/diffusion_pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:814c655ef8cd535c57ae1bef01ecb06526839fede56b8dc6501c02525917fd20
+ size 167404866
vae/diffusion_pytorch_model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bcb60880a46b63dea58e9bc591abe15f8350bde47b405f9c38f4be70c6161e68
+ size 167335342
vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8aef7b00195ec3fa8caaa3434e7516eff7d658e1d30eafc9ad6b0e66e9e827e
+ size 334643268