Ketengan-Diffusion committed on
Commit a653282 · verified · 1 Parent(s): 4cb2098

Update README.md

  1. README.md +75 -0
README.md CHANGED
@@ -1,3 +1,78 @@
1
  ---
2
  license: creativeml-openrail-m
3
  ---
1
  ---
2
  license: creativeml-openrail-m
3
+ language:
4
+ - en
5
+ tags:
6
+ - stable-diffusion
7
+ - SDXL
8
+ - art
9
+ - stable-diffusion-XL
10
+ - fantasy
11
+ - anime
12
+ - aiart
13
+ - ketengan
14
+ - AnySomniumXL
15
+ pipeline_tag: text-to-image
16
+ library_name: diffusers
17
  ---
18
+
19
+ # AnySomniumXL v3.5 Model Showcase
20
+ <p align="center">
21
+ <img src="01.png" width=70% height=70%>
22
+ </p>
23
+
24
+ `Ketengan-Diffusion/AnySomniumXL v3.5` is an SDXL model fine-tuned from [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
25
+
26
+ This is an enhanced version of AnySomniumXL v3.
27
+
28
+ # Changelog over AnySomniumXL v3
29
+ * Better captioning process
30
+ * Better model generalization
31
+ * Increased concept and character accuracy
32
+ * Better stylization on untrained tokens
33
+
34
+ # Our Dataset Curation Process
35
+ Our dataset is scored with the pretrained CLIP+MLP aesthetic scoring model from https://github.com/christophschuhmann/improved-aesthetic-predictor. We adjusted our script to also detect text or watermarks using OCR via pytesseract.
36
+
37
+ The score ranges from -1 to 100. We use a minimum threshold of around 17 to 20 and a maximum of 65 to 75 to retain the 2D style of the dataset. Any image containing text receives a score of -1. Images scoring below 17 or above 65 are therefore deleted.
38
+
39
+ Dataset curation ran on a machine with an NVIDIA T4 16GB GPU and took about 2 days for 300,000 images.
40
+
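The filtering rule described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual script: the function names `final_score` and `keep_image` are hypothetical, the aesthetic score and OCR text are assumed to be computed elsewhere (for example by the aesthetic predictor and pytesseract), and the 17 and 65 cutoffs are the ones quoted in the text.

```python
# Hypothetical sketch of the score-based filter; names are illustrative only.
TEXT_SCORE = -1.0   # score assigned when OCR finds any text or watermark
MIN_SCORE = 17.0    # lower cutoff quoted in the README
MAX_SCORE = 65.0    # upper cutoff quoted in the README


def final_score(aesthetic_score: float, ocr_text: str) -> float:
    """Return -1 if OCR detected any non-whitespace text, else the aesthetic score."""
    if ocr_text.strip():
        return TEXT_SCORE
    return aesthetic_score


def keep_image(score: float, lo: float = MIN_SCORE, hi: float = MAX_SCORE) -> bool:
    """Keep an image only if its score lies inside [lo, hi]."""
    return lo <= score <= hi


if __name__ == "__main__":
    # An image with detected text is always dropped, whatever its aesthetic score.
    print(keep_image(final_score(50.0, "sample watermark")))
    # A text-free image with a mid-range score is kept.
    print(keep_image(final_score(50.0, "")))
```

Because images with text are mapped to -1, a single range check is enough to drop both low-scoring images and images containing text.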
41
+ # Captioning process
42
+ We use a combination of a proprietary multimodal LLM and open-source multimodal LLMs such as LLaVA 1.5 for captioning, which produces more detailed captions than plain BLIP-2. Details such as clothing, atmosphere, situation, scene, place, gender, and skin, among others, are generated by the LLM.
43
+
44
+ Captioning 133k images took about 6 days on an NVIDIA Tesla A100 80GB PCIe. We are still improving our script to generate captions faster. The captioning process requires at least 24GB of VRAM, so an NVIDIA Tesla T4 16GB is not sufficient.
45
+
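To put the timing figures in perspective, the short calculation below converts them into average seconds per image. It only uses the numbers stated in this README (133k images in about 6 days for captioning, 300,000 images in about 2 days for curation); since those durations are approximate, the results are rough estimates. The helper name `seconds_per_image` is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_image(images: int, days: float) -> float:
    """Average wall-clock seconds spent per image."""
    return days * SECONDS_PER_DAY / images


# Figures quoted in the README (approximate).
captioning = seconds_per_image(133_000, 6)   # roughly 3.9 s per image
curation = seconds_per_image(300_000, 2)     # roughly 0.58 s per image

if __name__ == "__main__":
    print(f"captioning: {captioning:.2f} s/image")
    print(f"curation:   {curation:.2f} s/image")
```

By this estimate, captioning is several times slower per image than aesthetic scoring, even on the larger GPU.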
46
+ # Tagging Process
47
+ We simply use booru tags retrieved from booru boards. Because these tags are applied manually by humans, they are more accurate.
48
+
49
+ # Official Demo
50
+ You can try AnySomniumXL v3 for free at demo.ketengan.com
51
+
52
+ # Training Process
53
+
54
+ AnySomniumXL v3.5 Technical Specifications:
55
+
56
+ Batch Size: 25
57
+
58
+ Learning rate: 2e-6
59
+
60
+ Trained with a bucket size of 1280x1280
61
+
62
+ Shuffle Caption: Yes
63
+
64
+ Clip Skip: 2
65
+
66
+ Trained with 2x NVIDIA A100 80GB
67
+
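The "Shuffle Caption: Yes" setting is not described further in this README. As a rough sketch of what caption shuffling commonly means in tag-based fine-tuning, the snippet below randomly reorders the comma-separated tags of a caption each time it is used. The function name `shuffle_caption` and the `keep_tokens` parameter are assumptions for illustration, not the authors' trainer configuration.

```python
import random
from typing import Optional


def shuffle_caption(
    caption: str,
    keep_tokens: int = 0,  # hypothetical: number of leading tags left in place
    rng: Optional[random.Random] = None,
) -> str:
    """Randomly reorder comma-separated tags, keeping the first `keep_tokens` fixed."""
    rng = rng or random.Random()
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)  # in-place shuffle of the movable tags
    return ", ".join(fixed + rest)


if __name__ == "__main__":
    print(shuffle_caption("1girl, solo, long hair, smile", keep_tokens=1))
```

Shuffling is generally intended to stop the model from tying a concept to a fixed position in the caption.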
68
+ # Recommended Resolution
69
+ Because the model was trained at 1280x1280, these resolutions make the most of AnySomniumXL v3:
70
+ * 1280x1280
71
+ * 1472x1088
72
+ * 1152x1408
73
+ * 1536x1024
74
+ * 1856x832
75
+ * 1024x1600
76
+
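As a small convenience sketch, the helper below picks the recommended resolution whose aspect ratio is closest to a requested one. The list is copied from the section above and assumed to be in width x height order; the function name `closest_resolution` is hypothetical and not part of any library.

```python
import math
from typing import List, Tuple

# Recommended resolutions from the README, read as (width, height).
RECOMMENDED: List[Tuple[int, int]] = [
    (1280, 1280),
    (1472, 1088),
    (1152, 1408),
    (1536, 1024),
    (1856, 832),
    (1024, 1600),
]


def closest_resolution(aspect: float) -> Tuple[int, int]:
    """Return the recommended (width, height) nearest to `aspect` = width / height.

    Distance is measured on a log scale so that wide and tall ratios are
    treated symmetrically.
    """
    if aspect <= 0:
        raise ValueError("aspect must be positive")
    return min(RECOMMENDED, key=lambda wh: abs(math.log(wh[0] / wh[1]) - math.log(aspect)))


if __name__ == "__main__":
    width, height = closest_resolution(16 / 9)
    print(width, height)
```

The resulting values can then be passed as the `width` and `height` arguments when calling an SDXL pipeline in diffusers.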
77
+ You can support me:
78
+ - on [Ko-FI](https://ko-fi.com/ncaix)