Ketengan-Diffusion committed on
Commit
f8f5ed0
1 Parent(s): ee44255

Create README.md

Files changed (1)
  1. README.md +107 -0
README.md ADDED
@@ -0,0 +1,107 @@
1
+ ---
2
+ language:
3
+ - en
4
+ tags:
5
+ - stable-cascade
6
+ - SDXL
7
+ - art
8
+ - artstyle
9
+ - fantasy
10
+ - anime
11
+ - aiart
12
+ - ketengan
13
+ - SomniumSC
14
+ pipeline_tag: text-to-image
15
+ library_name: diffusers
16
+ ---
17
+
18
+ # SomniumSC-v1 Model Showcase
19
+ <p align="center">
20
+ <img src="01.png" width=70% height=70%>
21
+ </p>
22
+
23
+ `Ketengan-Diffusion/SomniumSC-v1` is a fine-tuned Stage C model based on Stable Cascade, [stabilityai/stable-cascade](https://huggingface.co/stabilityai/stable-cascade).
24
+
25
+ This is a fine-tune of Stability AI's all-new model, Stable Cascade (also known as Würstchen v3), with a 2D (cartoonish) style, trained on the Stage C 3.6B model. The text encoder was also trained to produce the 2D style, so the model can be prompted not only with booru tags but also with natural language.
26
+
27
+ The model uses the same dataset size and method as AnySomniumXL v2: 33,000+ images curated from hundreds of thousands of images from various sources. The dataset is built by keeping images that have an aesthetic score of at least 19 and at most 50 (to keep the model cartoonish and not too realistic; the scale is based on our proprietary aesthetic scoring mechanism) and that contain no text or watermarks, such as signatures or comic/manga pages. Thus, images with an aesthetic score of less than 17 or more than 50 are discarded, as are images that contain watermarks or text.
28
+
29
+ # Training Process
30
+
31
+ SomniumSC v1 Technical Specifications:
32
+
33
+ Trained for 30 epochs (the SomniumSC results shown use epoch 30)
34
+
35
+ Captioned by a proprietary multimodal LLM, which we found to perform better than LLaVA
36
+
37
+ Trained with a bucket size of 1024x1024
38
+
39
+ Shuffle Caption: Yes
40
+
41
+ Clip Skip: 0
42
+
43
+ Trained with 1x NVIDIA A100 80GB
44
+
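The "Shuffle Caption: Yes" setting above typically means the tag order in each caption is randomized per training step, so the model does not over-rely on tag position. The snippet below is only a minimal sketch of that idea, assuming comma-separated booru-style tags; the function name `shuffle_caption` and the `keep_first` parameter are hypothetical and are not taken from the training script actually used for SomniumSC.

```python
import random
from typing import Optional


def shuffle_caption(caption: str, keep_first: int = 0,
                    rng: Optional[random.Random] = None) -> str:
    """Shuffle comma-separated tags, leaving the first `keep_first` tags in place.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    # Split on commas and drop empty fragments and surrounding whitespace.
    tags = [tag.strip() for tag in caption.split(",") if tag.strip()]
    head, tail = tags[:keep_first], tags[keep_first:]
    rng.shuffle(tail)  # in-place shuffle of the movable tags
    return ", ".join(head + tail)


if __name__ == "__main__":
    example = "1girl, silver hair, forest, night, looking at viewer"
    print(shuffle_caption(example, keep_first=1, rng=random.Random(0)))
```

Keeping the first tag fixed is a common convention for a subject or trigger tag; set `keep_first=0` to shuffle everything.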
45
+
46
+ # Our Dataset Curation Process
47
+ <p align="center">
48
+ <img src="Curation.png" width=70% height=70%>
49
+ </p>
50
+
51
+ Image source: [Source1](https://danbooru.donmai.us/posts/3143351) [Source2](https://danbooru.donmai.us/posts/3272710) [Source3](https://danbooru.donmai.us/posts/3320417)
52
+
53
+ Our dataset is scored using the pretrained CLIP+MLP aesthetic scoring model from https://github.com/christophschuhmann/improved-aesthetic-predictor, and we adjusted our script to detect any text or watermark using OCR via pytesseract.
54
+
55
+ This scoring method uses a scale from -1 to 100. We take a score threshold of around 17 or 20 as the minimum and 50-75 as the maximum to retain the 2D style of the dataset. Any image with text returns a score of -1. So any image with a score below 17 or above 65 is deleted.
56
+
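The filtering rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' curation script: the bounds 17 and 65 are taken from the "below 17 or above 65" sentence and exposed as adjustable parameters (the text mentions other thresholds too), and the names `final_score` and `keep_image` are hypothetical. The `ocr_text` argument stands in for whatever the OCR step returns, for example the string produced by `pytesseract.image_to_string`.

```python
TEXT_SCORE = -1.0  # score assigned to any image in which OCR finds text


def final_score(aesthetic_score: float, ocr_text: str) -> float:
    """Return -1 when OCR detected text, otherwise the aesthetic score."""
    if ocr_text.strip():
        return TEXT_SCORE
    return aesthetic_score


def keep_image(aesthetic_score: float, ocr_text: str = "",
               min_score: float = 17.0, max_score: float = 65.0) -> bool:
    """Keep an image only if its final score lies within [min_score, max_score]."""
    score = final_score(aesthetic_score, ocr_text)
    return min_score <= score <= max_score


if __name__ == "__main__":
    print(keep_image(30.0))               # clean image inside the range
    print(keep_image(30.0, "artist sig")) # text detected, so it is dropped
    print(keep_image(80.0))               # too "realistic" under this rule
```

Because images with text are mapped to -1, a single range check handles both the aesthetic bounds and the text/watermark rule.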
57
+ The dataset curation process runs on an NVIDIA T4 16GB machine and takes about 7 days to curate 1,000,000 images.
58
+
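As a rough sanity check on the figure above, the implied curation throughput can be worked out directly. This is plain arithmetic on the stated numbers (1,000,000 images in about 7 days), assuming the machine runs continuously; the function name is hypothetical.

```python
def curation_throughput(num_images: int, days: float) -> float:
    """Average images processed per second over the given number of days."""
    seconds = days * 24 * 60 * 60
    return num_images / seconds


if __name__ == "__main__":
    rate = curation_throughput(1_000_000, 7)
    print(f"{rate:.2f} images/s, {1 / rate:.2f} s/image")
```

That works out to roughly 1.65 images per second, or about 0.6 seconds per image for scoring plus OCR.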
59
+ # Captioning process
60
+ We use a combination of a proprietary multimodal LLM and open-source multimodal LLMs such as LLaVA 1.5 for captioning, which produces more detailed captions than plain BLIP-2. Details such as clothing, atmosphere, situation, scene, place, gender, skin, and others are generated by the LLM.
61
+
62
+ # Tagging Process
63
+ We simply use booru tags retrieved from booru boards. These tags are applied manually by humans, which makes them more accurate.
64
+
65
+ # Limitations:
66
+
67
+ ✓ Still requires training on a broader dataset for more variety in poses and styles
68
+
69
+ ✓ Text cannot be generated correctly and comes out garbled
70
+
71
+ ✓ The model is optimized for human or mutated-human subjects. Non-human subjects such as SCPs, ponies, and others may not turn out as you expect
72
+
73
+ ✓ Faces may look compressed. Generating the image at 1536px may give better results
74
+
75
+ A smaller half-size version and a Stable Cascade lite version will be released soon.
76
+
77
+ # How to use SomniumSC:
78
+
79
+ Currently, Stable Cascade is only supported by ComfyUI.
80
+
81
+
82
+
83
+ You can follow the tutorial [here](https://gist.github.com/comfyanonymous/0f09119a342d0dd825bb2d99d19b781c#file-stable_cascade_workflow_test-json) or [here](https://medium.com/@codeandbird/run-new-stable-cascade-model-in-comfyui-now-officially-supported-f66a37e9a8ad).
84
+
85
+ To make it clear which models you need, here is where to download each one directly:
86
+
87
+ For stage A you can download from [Official stabilityai/stable-cascade repo](https://huggingface.co/stabilityai/stable-cascade).
88
+
89
+ For stage B you can download from [Official stabilityai/stable-cascade repo](https://huggingface.co/stabilityai/stable-cascade).
90
+
91
+ For stage C you can download the safetensors file from the Files tab of our Hugging Face repo.
92
+
93
+ The text encoder can be downloaded from the text_encoder folder of our Hugging Face repo.
94
+
95
+
96
+ # SomniumSC Pro tips:
97
+
98
+ A negative prompt is a must for better-quality output. The recommended negative prompt is: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name
99
+
100
+ If the model produces unwanted pointy ears on the character, just add elf or pointy ears to the negative prompt.
101
+
102
+ If the model produces a "compressed face", use 1536px resolution so the model can render the face clearly.
103
+
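The tips above can be combined into one small helper that starts from the recommended negative prompt and appends extra tags when needed. This is only a sketch: `build_negative_prompt` is a hypothetical name, and it assumes the pointy-ears tip means adding `elf` or `pointy ears` to the negative prompt. The resulting string would be pasted into the negative prompt field of your ComfyUI workflow.

```python
from typing import Iterable

# Recommended negative prompt, copied from the tip above.
BASE_NEGATIVE_PROMPT = (
    "lowres, bad anatomy, bad hands, text, error, missing fingers, "
    "extra digit, fewer digits, cropped, worst quality, low quality, "
    "normal quality, jpeg artifacts, signature, watermark, username, "
    "blurry, artist name"
)


def build_negative_prompt(extra_tags: Iterable[str] = ()) -> str:
    """Append extra tags to the base negative prompt, skipping duplicates."""
    tags = [tag.strip() for tag in BASE_NEGATIVE_PROMPT.split(",")]
    for tag in extra_tags:
        tag = tag.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return ", ".join(tags)


if __name__ == "__main__":
    # Example: suppress unwanted pointy ears.
    print(build_negative_prompt(["elf", "pointy ears"]))
```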
104
+
105
+ # Disclaimer:
106
+
107
+ This model is released under the STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE, which means the model cannot be sold and derivative works cannot be commercialized. The exception, as far as I know, is that you can buy a Stability AI membership to commercialize derivative works based on this model. Please support Stability AI so they can keep providing open-source models for us. You are still free to merge our model.