atatakun committed
Commit
99f296b
1 Parent(s): ac8ade1

Upload 7 files

.gitattributes CHANGED
@@ -32,3 +32,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ example-images/1boy.png filter=lfs diff=lfs merge=lfs -text
+ example-images/1girl.png filter=lfs diff=lfs merge=lfs -text
+ example-images/aesthetic.png filter=lfs diff=lfs merge=lfs -text
+ example-images/thumbnail.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,111 @@
  ---
  license: creativeml-openrail-m
+ thumbnail: "https://huggingface.co/coreml/coreml-anything-v3-1/resolve/main/example-images/thumbnail.png"
+ language:
+ - en
+ tags:
+ - coreml
+ - stable-diffusion
+ - stable-diffusion-diffusers
  ---
+
+ # Core ML Converted Model
+
+ This model was converted to Core ML for use on Apple Silicon devices by following Apple's instructions [here](https://github.com/apple/ml-stable-diffusion#-converting-models-to-core-ml).<br>
+ Provide the model to an app such as [Mochi Diffusion](https://github.com/godly-devotion/MochiDiffusion) to generate images.<br>
+
+ The `split_einsum` version is compatible with all compute-unit options, including the Neural Engine.<br>
+ The `original` version is only compatible with the CPU & GPU option.
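The variant/compute-unit compatibility above can be encoded as a small guard. This is a hypothetical helper of our own (not part of the model or of Apple's package); the compute-unit names mirror the `--compute-unit` choices in apple/ml-stable-diffusion, which is an assumption about how you drive the model:

```python
# Hypothetical helper (not part of the model): validate a Core ML
# compute-unit choice against the model variant, per the notes above.
# Assumed constraint: "original" supports only CPU & GPU;
# "split_einsum" supports all units, including the Neural Engine.
ALLOWED_UNITS = {
    "split_einsum": {"CPU_ONLY", "CPU_AND_GPU", "CPU_AND_NE", "ALL"},
    "original": {"CPU_ONLY", "CPU_AND_GPU"},
}

def check_compute_unit(variant: str, compute_unit: str) -> bool:
    """Return True if `variant` can run on the requested compute unit."""
    if variant not in ALLOWED_UNITS:
        raise ValueError(f"unknown model variant: {variant!r}")
    return compute_unit in ALLOWED_UNITS[variant]
```

A caller would check the pair before loading, e.g. `check_compute_unit("original", "ALL")` returns `False`, signalling that the `split_einsum` variant is needed for the Neural Engine.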
+
+ # Anything V3.1
+
+ ![Anime Girl](https://huggingface.co/coreml/coreml-anything-v3-1/resolve/main/example-images/thumbnail.png)
+
+ Anything V3.1 is a third-party continuation of the latent diffusion model Anything V3.0. It is claimed to improve on Anything V3.0 by fixing the VAE and the CLIP position id key, with the CLIP reference taken from Stable Diffusion V1.5. The VAE was swapped using Kohya's merge-vae script, and the CLIP was fixed using Arena's stable-diffusion-model-toolkit webui extension.
+
+ Anything V3.2 is intended to be a resumed training of Anything V3.1. The current model has been fine-tuned with a learning rate of 2.0e-6 for 50 epochs at a batch size of 4, on datasets collected from many sources, about a quarter of which are synthetic. The dataset was preprocessed with the Aspect Ratio Bucketing Tool so that it could be converted to latents and trained at non-square resolutions. This model is meant as a test of how the CLIP fix affects training. Like other anime-style Stable Diffusion models, it also supports Danbooru tags for generating images.
+
+ e.g. **_1girl, white hair, golden eyes, beautiful eyes, detail, flower meadow, cumulonimbus clouds, lighting, detailed sky, garden_**
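The aspect-ratio bucketing mentioned above can be sketched roughly as follows. This is a minimal illustration of the idea, not the actual tool: buckets are resolutions in multiples of 64 under a fixed pixel budget, and each image is assigned to the bucket whose aspect ratio is closest to its own (the budget of 512×768 and the dimension limits are our assumptions for illustration):

```python
# Minimal sketch of aspect-ratio bucketing (not the actual tool).
def make_buckets(max_area=512 * 768, step=64, min_dim=256, max_dim=1024):
    """Build (width, height) buckets, multiples of `step`, within `max_area`."""
    buckets = []
    w = min_dim
    while w <= max_dim:
        # Largest height (a multiple of `step`) keeping w*h within budget.
        h = min(max_dim, (max_area // w) // step * step)
        if h >= min_dim:
            buckets.append((w, h))
        w += step
    return buckets

def assign_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))
```

Training batches are then drawn per bucket, so every image in a batch shares one (non-square) latent resolution instead of being cropped to a square.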
+
+ - Use it with [AUTOMATIC1111's Stable Diffusion Webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui); see [How-to-Use](#how-to-use)
+ - Use it with 🧨 [`diffusers`](#🧨diffusers)
+
+ # Model Details
+
+ - **Currently maintained by:** Cagliostro Research Lab
+ - **Model type:** Diffusion-based text-to-image generation model
+ - **Model Description:** This is a model that can be used to generate and modify anime-themed images based on text prompts.
+ - **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL)
+ - **Finetuned from model:** Anything V3.1
+
+ ## How-to-Use
+ - Download `Anything V3.1` [here](https://huggingface.co/cag/anything-v3-1/resolve/main/anything-v3-1.safetensors) or `Anything V3.2` [here](https://huggingface.co/cag/anything-v3-1/resolve/main/anything-v3-2.safetensors); all models are in `.safetensors` format.
+ - Adjust your prompt with aesthetic tags to get better results. You can use any generic negative prompt, or use the following suggested negative prompt to guide the model toward high-aesthetic generations:
+ ```
+ lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
+ ```
+ - The following should also be prepended to prompts to get high-aesthetic results:
+ ```
+ masterpiece, best quality, illustration, beautiful detailed, finely detailed, dramatic light, intricate details
+ ```
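The prompt advice above can be folded into a small convenience helper. This is a hypothetical function of our own (the names are not part of the model or of any library) that prepends the suggested quality tags and pairs the result with the suggested negative prompt:

```python
# Hypothetical helper (not part of the model): build a (prompt,
# negative_prompt) pair following the suggestions above.
QUALITY_TAGS = ("masterpiece, best quality, illustration, beautiful detailed, "
                "finely detailed, dramatic light, intricate details")
NEGATIVE_PROMPT = ("lowres, bad anatomy, bad hands, text, error, missing fingers, "
                   "extra digit, fewer digits, cropped, worst quality, low quality, "
                   "normal quality, jpeg artifacts, signature, watermark, "
                   "username, blurry")

def build_prompts(subject: str) -> tuple[str, str]:
    """Prepend the quality tags to `subject` and return both prompts."""
    return f"{QUALITY_TAGS}, {subject}", NEGATIVE_PROMPT
```

For example, `build_prompts("1girl, white hair, golden eyes")` yields a prompt that opens with the quality tags and ends with your subject tags.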
+
+ ## 🧨Diffusers
+
+ This model can be used just like any other Stable Diffusion model. For more information, please have a look at the [Stable Diffusion documentation](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion). You can also export the model to [ONNX](https://huggingface.co/docs/diffusers/optimization/onnx), [MPS](https://huggingface.co/docs/diffusers/optimization/mps), and/or FLAX/JAX. The pretrained model is currently based on Anything V3.1.
+
+ You should install the dependencies below in order to run the pipeline:
+
+ ```bash
+ pip install diffusers transformers accelerate scipy safetensors
+ ```
+ Running the pipeline (if you don't swap the scheduler, it will run with the default DDIM; in this example we swap it to DPMSolverMultistepScheduler):
+
+ ```python
+ import torch
+ from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
+
+ model_id = "cag/anything-v3-1"
+
+ # Load in half precision and swap the default scheduler for
+ # DPMSolverMultistepScheduler (DPM-Solver++).
+ pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+ pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
+ pipe = pipe.to("cuda")
+
+ prompt = "masterpiece, best quality, high quality, 1girl, solo, sitting, confident expression, long blonde hair, blue eyes, formal dress"
+ negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"
+
+ # autocast is unnecessary (and discouraged) with an fp16 pipeline.
+ image = pipe(
+     prompt,
+     negative_prompt=negative_prompt,
+     width=512,
+     height=728,
+     guidance_scale=12,
+     num_inference_steps=50,
+ ).images[0]
+
+ image.save("anime_girl.png")
+ ```
+
+ ## Limitation
+ This model is overfitted and cannot follow prompts well, even after the text encoder fix. This encourages lazy prompting, since you can get good results just by typing "1girl". Additionally, the model is anime-based and biased toward anime female characters; it is difficult to generate masculine male characters without specific prompts. Furthermore, not much has changed compared to the Anything V3.0 base model: only the VAE and CLIP models were swapped, followed by fine-tuning for 50 epochs on small-scale datasets.
+
+ ## Example
+
+ Here are some cherry-picked samples and a comparison between the available models:
+
+ ![Anime Girl](https://huggingface.co/coreml/coreml-anything-v3-1/resolve/main/example-images/1girl.png)
+ ![Anime Boy](https://huggingface.co/coreml/coreml-anything-v3-1/resolve/main/example-images/1boy.png)
+ ![Aesthetic](https://huggingface.co/coreml/coreml-anything-v3-1/resolve/main/example-images/aesthetic.png)
+
+ ## License
+
+ This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
+ The CreativeML OpenRAIL License specifies:
+
+ 1. You can't use the model to deliberately produce or share illegal or harmful outputs or content.
+ 2. The authors claim no rights over the outputs you generate; you are free to use them but are accountable for their use, which must not go against the provisions set in the license.
+ 3. You may redistribute the weights and use the model commercially and/or as a service. If you do, please be aware that you have to include the same use restrictions as in the license and share a copy of the CreativeML OpenRAIL-M with all your users (please read the license entirely and carefully).
+
+ [Please read the full license here](https://huggingface.co/spaces/CompVis/stable-diffusion-license)
+
+ ## Credit
+ Public domain.
example-images/1boy.png ADDED

Git LFS Details

  • SHA256: 083f4d6cd56f25ca3d405873084f2273b127fb9c4b2eea3a393a27c5171435f3
  • Pointer size: 132 Bytes
  • Size of remote file: 1.75 MB
example-images/1girl.png ADDED

Git LFS Details

  • SHA256: c81fea8fb9796e8e1028234df51c9d49ab436779c28c1274f450dec2b2d61833
  • Pointer size: 132 Bytes
  • Size of remote file: 1.93 MB
example-images/aesthetic.png ADDED

Git LFS Details

  • SHA256: 9a2f2acef21c827d04b3cd133adbf939fd8c63480e31cc3f37cfbfc4062f3726
  • Pointer size: 132 Bytes
  • Size of remote file: 3.8 MB
example-images/thumbnail.png ADDED

Git LFS Details

  • SHA256: e5545c64dc69d0fd70bf71d91de6507cdb4aa7fd958fe2ff786053a941b0f8b5
  • Pointer size: 132 Bytes
  • Size of remote file: 1.21 MB
original/anything-v3-1_no-i2i_original.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1369538f15d98a47ddd803a09ff70343412226afed13bcbd206c5804e3d5d142
+ size 1908710471
original/anything-v3-2_no-i2i_original.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce50cbb46887bf913c37b91307132fa11c8d0e4a4a2964da97e0982cd64f9b19
+ size 1908709477