---
language:
  - en
license: creativeml-openrail-m
tags:
  - stable-diffusion
  - stable-diffusion-diffusers
  - text-to-image
  - safetensors
inference: true
thumbnail: >-
  https://huggingface.co/6DammK9/AstolfoMix/resolve/main/231267-1921923808-3584-1536-4.5-256-20231207033217.jpg
widget:
  - text: aesthetic, quality, 1girl, boy, astolfo
    example_title: example 1girl boy
library_name: diffusers
---

# AstolfoMix (Baseline / Extended / Reinforced / 21b)

"21b"

![231267-1921923808-3584-1536-4.5-256-20231207033217.jpg](231267-1921923808-3584-1536-4.5-256-20231207033217.jpg)

Generation parameters (PNG info):

```
(aesthetic:0), (quality:0), (solo:0.98), (boy:0), (wide_shot:0), [astolfo], [[[[astrophotography]]]]
Negative prompt: (worst:0), (low:0), (bad:0), (exceptional:0), (masterpiece:0), (comic:0), (extra:0), (lowres:0), (breasts:0.5)
Steps: 256, Sampler: Euler, CFG scale: 4.5, Seed: 1921923808, Size: 1792x768, Model hash: 28adb7ba78, Model: 21b-AstolfoMix-2020b, VAE hash: 551eac7037, VAE: vae-ft-mse-840000-ema-pruned.ckpt, Denoising strength: 0.7, Clip skip: 2, FreeU Stages: "[{\"backbone_factor\": 1.2, \"skip_factor\": 0.9}, {\"backbone_factor\": 1.4, \"skip_factor\": 0.2}]", FreeU Schedule: "0.0, 1.0, 0.0", FreeU Version: 2, Hires upscale: 2, Hires steps: 64, Hires upscaler: Latent, Dynamic thresholding enabled: True, Mimic scale: 1, Separate Feature Channels: False, Scaling Startpoint: MEAN, Variability Measure: AD, Interpolate Phi: 0.5, Threshold percentile: 100, Version: v1.6.1
```
  • Current version: 21b-AstolfoMix-2020b.safetensors (merge of 20 + 1 models)
  • Recommended version: "21b"
  • Recommended CFG: 4.0
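Below is a minimal diffusers sketch (not the exact WebUI setup above) that loads the checkpoint with the recommended settings. The local file path, the simplified prompts and the step count are assumptions; CFG 4.0, clip skip 2 and the ft-mse VAE follow the notes above.

```python
# Minimal sketch: load the single-file checkpoint with diffusers and apply
# the recommended CFG, clip skip and VAE. Assumes a recent diffusers release.
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_single_file(
    "21b-AstolfoMix-2020b.safetensors",  # local path to the checkpoint above
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "aesthetic, quality, 1girl, boy, astolfo",         # simplified prompt
    negative_prompt="worst, low, bad, lowres, comic",   # simplified negative prompt
    guidance_scale=4.0,     # recommended CFG
    clip_skip=2,            # "Clip skip: 2" in the PNG info
    num_inference_steps=50,
).images[0]
image.save("astolfo.png")
```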

## Reinforced

  • Uses AutoMBW (similar to bayesian merger, but less powerful) on the same set of 20 models.
  • BayesianOptimizer with ImageReward.
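A conceptual sketch of that loop is shown below: Bayesian optimization proposes merge weights, a fixed-seed image is rendered, and ImageReward scores it. This is not the actual AutoMBW / bayesian-merger code; `merge_and_generate` is a hypothetical helper, and the real search runs over per-block (MBW) weights rather than a single alpha.

```python
# Conceptual sketch of "BayesianOptimizer with ImageReward".
# merge_and_generate() is a hypothetical helper, not part of any library;
# the real AutoMBW search optimizes per-block (MBW) weights, not one alpha.
import ImageReward as RM
from bayes_opt import BayesianOptimization

reward = RM.load("ImageReward-v1.0")
PROMPT = "aesthetic, quality, 1girl, boy, astolfo"

def objective(alpha: float) -> float:
    # Hypothetical: merge two candidate UNets with weight `alpha`, render one
    # fixed-seed image, and return its ImageReward score (higher is better).
    image_path = merge_and_generate(alpha, prompt=PROMPT, seed=142097205)
    return reward.score(PROMPT, image_path)

optimizer = BayesianOptimization(f=objective, pbounds={"alpha": (0.0, 1.0)}, random_state=0)
optimizer.maximize(init_points=5, n_iter=20)
print(optimizer.max)  # best alpha found and its score
```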

![231342-142097205-2560-1440-4-256-20231127224612.jpg](231342-142097205-2560-1440-4-256-20231127224612.jpg)

Generation parameters (PNG info):

```
(aesthetic:0), (quality:0), (race queen:0.98), [[braid]], [astolfo], [[[[nascar, nurburgring]]]]
Negative prompt: (worst:0), (low:0), (bad:0), (exceptional:0), (masterpiece:0), (comic:0), (extra:0), (lowres:0), (breasts:0.5)
Steps: 256, Sampler: Euler, CFG scale: 4, Seed: 142097205, Size: 1024x576, Model hash: aab8357cdc, Model: 20b-AstolfoMix-18b19b, VAE hash: 551eac7037, VAE: vae-ft-mse-840000-ema-pruned.ckpt, Denoising strength: 0.7, Clip skip: 2, FreeU Stages: "[{\"backbone_factor\": 1.2, \"skip_factor\": 0.9}, {\"backbone_factor\": 1.4, \"skip_factor\": 0.2}]", FreeU Schedule: "0.0, 1.0, 0.0", FreeU Version: 2, Hires upscale: 2.5, Hires steps: 64, Hires upscaler: Latent, Dynamic thresholding enabled: True, Mimic scale: 1, Separate Feature Channels: False, Scaling Startpoint: MEAN, Variability Measure: AD, Interpolate Phi: 0.5, Threshold percentile: 100, Version: v1.6.0-2-g4afaaf8a
```
  • Current version: 20b-AstolfoMix-18b19b (merge of 20 models)
  • Recommended version: "20b"
  • Recommended CFG: 4.0

## Extended

  • Is a 10-model ensemble robust enough? How about 20, with 10 more radical models?
  • For embeddings / LoRAs, the best fit will be models trained from NAI. Just use them; most of them will work (see the sketch below).
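A hedged example of attaching such add-ons with diffusers; the LoRA and embedding file names are placeholders, only the checkpoint name comes from this card.

```python
# Sketch: attach a NAI-lineage LoRA and a textual-inversion embedding to the
# merged checkpoint. The LoRA / embedding file names are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "20-bpcga9-lracrc2oh-b11i75pvc-gf34ym34-sd.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

pipe.load_lora_weights(".", weight_name="some_character_lora.safetensors")  # placeholder file
pipe.load_textual_inversion("some_embedding.pt", token="my-embedding")      # placeholder file

image = pipe("aesthetic, quality, my-embedding, astolfo", guidance_scale=4.0).images[0]
image.save("lora_test.png")
```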

![231111-341693176-2688-1536-4-256-20231021050214.jpg](231111-341693176-2688-1536-4-256-20231021050214.jpg)

Generation parameters (PNG info):

```
(aesthetic:0), (quality:0), (1girl:0), (boy:0), [[shirt]], [[midriff]], [[braid]], [astolfo], [[[[sydney opera house]]]]
Negative prompt: (worst:0), (low:0), (bad:0), (exceptional:0), (masterpiece:0), (comic:0), (extra:0), (lowres:0), (breasts:0.5)
Steps: 256, Sampler: Euler, CFG scale: 4, Seed: 341693176, Size: 1344x768, Model hash: 41429fdee1, Model: 20-bpcga9-lracrc2oh-b11i75pvc-gf34ym34-sd, VAE hash: 551eac7037, VAE: vae-ft-mse-840000-ema-pruned.ckpt, Denoising strength: 0.7, Clip skip: 2, FreeU Stages: "[{\"backbone_factor\": 1.2, \"skip_factor\": 0.9}, {\"backbone_factor\": 1.4, \"skip_factor\": 0.2}]", FreeU Schedule: "0.0, 1.0, 0.0", Hires upscale: 2, Hires steps: 64, Hires upscaler: Latent, Dynamic thresholding enabled: True, Mimic scale: 1, Separate Feature Channels: False, Scaling Startpoint: MEAN, Variability Measure: AD, Interpolate Phi: 0.7, Threshold percentile: 100, Version: v1.6.0
```
  • Current version: 20-bpcga9-lracrc2oh-b11i75pvc-gf34ym34-sd.safetensors (merge of 20 models)
  • Recommended version: "20"
  • Recommended CFG: 4.0 (previously 4.5)

## Baseline

  • A (baseline) merged model focusing on "absurdres", while I wait for a big anime SDXL finetune.
  • Besides the "absurdres", the model should be very robust and capable with most LoRAs / embeddings / add-ons you can imagine.
  • The image below is 2688x1536 without an upscaler. With an upscaler, it already reaches 8K.
  • A 10752x6143 image (a 3.25 MB JPEG, "upscaler 4x"; see the PNG info below) used to be shown here, but it was removed because it failed to preview in some browsers.

![230958-132385090-2688-1536-4.5-256-20230930203540.jpg](230958-132385090-2688-1536-4.5-256-20230930203540.jpg)

Generation parameters (PNG info):

```
(aesthetic:0), (quality:0), (solo:0), (boy:0), (ushanka:0.98), [[braid]], [astolfo], [[moscow, russia]]
Negative prompt: (worst:0), (low:0), (bad:0), (exceptional:0), (masterpiece:0), (comic:0), (extra:0), (lowres:0), (breasts:0.5)
Steps: 256, Sampler: Euler, CFG scale: 4.5, Seed: 132385090, Size: 1344x768, Model hash: 6ffdb39acd, Model: 10-vcbpmtd8_cwlbdaw_eb5ms29-sd, VAE hash: 551eac7037, VAE: vae-ft-mse-840000-ema-pruned.ckpt, Denoising strength: 0.7, Clip skip: 2, FreeU Stages: "[{\"backbone_factor\": 1.2, \"skip_factor\": 0.9}, {\"backbone_factor\": 1.4, \"skip_factor\": 0.2}]", FreeU Schedule: "0.0, 1.0, 0.0", Hires upscale: 2, Hires steps: 64, Hires upscaler: Latent, Dynamic thresholding enabled: True, Mimic scale: 1, Separate Feature Channels: False, Scaling Startpoint: MEAN, Variability Measure: AD, Interpolate Phi: 0.7, Threshold percentile: 100, Version: v1.6.0
```
  • Current version: 10-vcbpmtd8_cwlbdaw_eb5ms29-sd.safetensors (merge of 10 models)
  • Recommended version: "06a" or "10"
  • Recipe models: merging UNets into SD v1.4.
  • "Roadmap" / "Theory" can be found in my GitHub repository.
  • Recommended prompt style: as for SD 1.4's Text Encoder.
  • Recommended resolution: 1024x1024 (native T2I), HiRes 1.75x (RTX 2080Ti 11GB)
  • It can generate images up to 1280x1280 with HiRes 2.0x (Tesla M40 24GB), but the yield will be very low and producing a nice image is time-consuming.
  • Recommended CFG: 4.5 (also tested on all base models), 6.0 (1280 mode)
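The FreeU values embedded in the PNG info above (backbone_factor 1.2 / 1.4, skip_factor 0.9 / 0.2) appear to correspond to diffusers' `enable_freeu` arguments as sketched below; the mapping and the file path are assumptions, not part of the original recipe.

```python
# Assumed mapping of the FreeU stages from the PNG info onto diffusers:
# stage 1 -> (b1, s1), stage 2 -> (b2, s2).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "10-vcbpmtd8_cwlbdaw_eb5ms29-sd.safetensors", torch_dtype=torch.float16
).to("cuda")
pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)  # values taken from the PNG info above
# pipe.disable_freeu()  # revert to the plain UNet
```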

## Recipe

  • Full recipe.

  • Uniform merge: each model gets weight M = 1 / (total number of models), i.e. a plain average of the checkpoints (see the sketch below).
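A minimal sketch of such a uniform merge, under the assumption that the listed models are averaged with weight 1/N each and plugged into an SD 1.4 style base for its Text Encoder / VAE; all file names are placeholders.

```python
# Sketch of a uniform merge: every checkpoint contributes weight M = 1/N.
# Only UNet keys ("model.diffusion_model.") are averaged; the text encoder
# and VAE are kept from the base (SD 1.4 style) checkpoint.
import torch
from safetensors.torch import load_file, save_file

base_path = "sd-v1-4.safetensors"  # placeholder
unet_paths = ["model_01.safetensors", "model_02.safetensors", "model_03.safetensors"]  # placeholders

merged = load_file(base_path)
unet_keys = [k for k in merged if k.startswith("model.diffusion_model.")]
acc = {k: torch.zeros_like(merged[k], dtype=torch.float32) for k in unet_keys}

for path in unet_paths:
    sd = load_file(path)
    for k in unet_keys:
        acc[k] += sd[k].to(torch.float32)

for k in unet_keys:
    merged[k] = (acc[k] / len(unet_paths)).to(merged[k].dtype)  # M = 1/N per model

save_file(merged, "uniform-merge.safetensors")
```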

## Extra: Comparing with merges using the original Text Encoders

  • Uniform merge again, with M = 1 / (total number of models).
  • Surprisingly, they look similar, with only minor differences in the background and in unnamed details (semantic relationships).
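A sketch of how such a side-by-side can be reproduced in diffusers: generate with the merged checkpoint, then swap only the text encoder (here SD 1.4's, as a stand-in for whichever Text Encoder is being compared) and re-generate with the same seed. The prompt is a placeholder; the seed comes from the grids below.

```python
# Sketch: compare the same seed with two different text encoders on one UNet.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "20-bpcga9-lracrc2oh-b11i75pvc-gf34ym34-sd.safetensors", torch_dtype=torch.float16
).to("cuda")
donor = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)

def render(seed: int):
    gen = torch.Generator("cuda").manual_seed(seed)
    return pipe("aesthetic, quality, 1girl, astolfo", generator=gen, guidance_scale=4.5).images[0]

img_a = render(3972813705)                          # merged checkpoint's own text encoder
pipe.text_encoder = donor.text_encoder.to("cuda")   # swap in the other text encoder
img_b = render(3972813705)                          # same seed, different conditioning
```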

![xyz_grid-0181-3972813705-25600-2067-4.5-48-20230929010338.jpg](xyz_grid-0181-3972813705-25600-2067-4.5-48-20230929010338.jpg)

![xyz_grid-0182-3972813705-25600-2069-4.5-48-20230929185331.jpg](xyz_grid-0182-3972813705-25600-2069-4.5-48-20230929185331.jpg)

![xyz_grid-0183-3972813705-25600-2067-4.5-48-20230929231817.jpg](xyz_grid-0183-3972813705-25600-2067-4.5-48-20230929231817.jpg)

![xyz_grid-0184-3972813705-25600-2067-4.5-48-20230929235846.jpg](xyz_grid-0184-3972813705-25600-2067-4.5-48-20230929235846.jpg)

![xyz_grid-0328-3972813705-25600-2069-4-48-20231021190402.jpg](xyz_grid-0328-3972813705-25600-2069-4-48-20231021190402.jpg)

![xyz_grid-0329-3972813705-25600-2069-4-48-20231021192917.jpg](xyz_grid-0329-3972813705-25600-2069-4-48-20231021192917.jpg)

![xyz_grid-0330-3972813705-25600-2069-4-48-20231021201454.jpg](xyz_grid-0330-3972813705-25600-2069-4-48-20231021201454.jpg)

![xyz_grid-0331-3972813705-25600-2069-4-48-20231021233059.jpg](xyz_grid-0331-3972813705-25600-2069-4-48-20231021233059.jpg)

## License

This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage. The CreativeML OpenRAIL License specifies:

  1. You can't use the model to deliberately produce or share illegal or harmful outputs or content.
  2. The authors claim no rights on the outputs you generate; you are free to use them and are accountable for their use, which must not go against the provisions set in the license.
  3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware that you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M with all your users (please read the license entirely and carefully). Please read the full license here.