Koolchh committed on
Commit
08a4ba9
1 Parent(s): 5806669

Update README.md

Files changed (1): README.md (+124, -0)
README.md CHANGED

---
license: openrail++
tags:
- text-to-image
- stable-diffusion
- diffusers
---

# AnimeBoysXL v2.0

**It takes substantial time and effort to bake models. If you appreciate my models, I would be grateful if you could support me on [Ko-fi](https://ko-fi.com/koolchh) ☕.**

## Features

- ✔️ **Good for inference**: AnimeBoysXL v2.0 is a flexible model that is good at generating images of anime boys and male-only content in a wide range of styles.
- ✔️ **Good for training**: AnimeBoysXL v2.0 is suitable for further training, thanks to its neutral style and its ability to recognize a large number of concepts. Feel free to train your own anime boy model/LoRA from AnimeBoysXL.
- ❌ AnimeBoysXL v2.0 is not optimized for creating anime girls. Please consider using other models for that purpose.

## Inference Guide

- **Prompt**: Use tag-based prompts to describe your subject (a prompt-assembly sketch follows this list).
  - Tag ordering matters. It is highly recommended to structure your prompt with the following templates:
    ```
    1boy, male focus, character name, series name, anything else you'd like to describe
    ```
    ```
    2boys, male focus, multiple boys, character name(s), series name, anything else you'd like to describe
    ```
  - Append
    ```
    , best quality, amazing quality, best aesthetic, absurdres
    ```
    to the prompt to improve image quality.
  - (*Optional*) Append
    ```
    , year YYYY
    ```
    to the prompt to shift the output toward the prevalent style of that year. `YYYY` is a 4-digit year, e.g. `, year 2023`.
- **Negative prompt**: Choose one of the following two presets.
  1. Heavy (*recommended*):
     ```
     lowres, (bad:1.05), text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts, 1girl, breasts
     ```
  2. Light:
     ```
     lowres, jpeg artifacts, worst quality, watermark, blurry, bad aesthetic, 1girl, breasts
     ```
  - (*Optional*) Add
    ```
    , realistic, lips, nose
    ```
    to the negative prompt if you need a flat, anime-style face.
- **VAE**: Make sure you are using the [SDXL VAE](https://huggingface.co/stabilityai/sdxl-vae/tree/main).
- **Sampling method, sampling steps and CFG scale**: I find **(Euler a, 28, 5)** works well. You are encouraged to experiment with other settings.
- **Width and height**: **832×1216** for portrait, **1024×1024** for square, and **1216×832** for landscape.

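As a concrete illustration of the prompt structure above, here is a minimal sketch that assembles a prompt from the recommended tag order plus the quality tags, the optional year tag, and the heavy negative-prompt preset. The character, series, and description strings are placeholders, not tags the model is confirmed to know.

```python
# Hypothetical prompt assembly following the recommended tag order:
# subject -> character -> series -> free-form description -> quality tags -> optional year tag.
subject = "1boy, male focus"
character = "character name"   # placeholder: replace with an actual character tag
series = "series name"         # placeholder: replace with an actual series tag
description = "black hair, smile, upper body"
quality_tags = "best quality, amazing quality, best aesthetic, absurdres"
year_tag = "year 2023"         # optional: shifts output toward that year's prevalent style

prompt = ", ".join([subject, character, series, description, quality_tags, year_tag])

# Heavy negative-prompt preset from the guide above.
negative_prompt = (
    "lowres, (bad:1.05), text, error, missing, extra, fewer, cropped, jpeg artifacts, "
    "worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, "
    "scan, scan artifacts, 1girl, breasts"
)
```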

## 🧨Diffusers Example Usage

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Koolchh/AnimeBoysXL-v2.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda")

prompt = "1boy, male focus, best quality, amazing quality, best aesthetic, absurdres"
negative_prompt = "lowres, (bad:1.05), text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts, 1girl, breasts"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=5,
    num_inference_steps=28
).images[0]
```
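
The inference guide above also recommends the Euler a sampler and a portrait resolution of 832×1216, which the example does not set explicitly. Below is a minimal sketch of one way to apply those settings with standard diffusers components; the VAE bundled with the checkpoint is used as-is, and this setup is an assumption, not the author's exact configuration.

```python
import torch
from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = DiffusionPipeline.from_pretrained(
    "Koolchh/AnimeBoysXL-v2.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)

# "Euler a" in most UIs corresponds to diffusers' Euler ancestral scheduler.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Portrait resolution from the guide (832x1216), with the suggested steps and CFG scale.
image = pipe(
    prompt="1boy, male focus, best quality, amazing quality, best aesthetic, absurdres",
    negative_prompt="lowres, jpeg artifacts, worst quality, watermark, blurry, bad aesthetic, 1girl, breasts",
    width=832,
    height=1216,
    guidance_scale=5,
    num_inference_steps=28,
).images[0]
```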

## Training Details

AnimeBoysXL v2.0 is trained from [Stable Diffusion XL Base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) on ~516k images.

The following tags are attached to the training data to make it easier to steer generation toward either more aesthetic or more flexible results.

### Quality tags

| tag               | score      |
|-------------------|------------|
| `best quality`    | >= 150     |
| `amazing quality` | [100, 150) |
| `great quality`   | [75, 100)  |
| `normal quality`  | [0, 75)    |
| `bad quality`     | (-5, 0)    |
| `worst quality`   | <= -5      |
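
To make the bucket boundaries concrete, here is a hypothetical helper that maps a numeric score to the quality tags above. The function name is illustrative, and the card does not specify how the underlying scores were computed.

```python
def quality_tag(score: float) -> str:
    """Map a numeric quality score to the quality-tag buckets listed above."""
    if score >= 150:
        return "best quality"
    if score >= 100:
        return "amazing quality"
    if score >= 75:
        return "great quality"
    if score >= 0:
        return "normal quality"
    if score > -5:
        return "bad quality"
    return "worst quality"
```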

### Aesthetic tags

| tag                | score        |
|--------------------|--------------|
| `best aesthetic`   | >= 6.675     |
| `great aesthetic`  | [6.0, 6.675) |
| `normal aesthetic` | [5.0, 6.0)   |
| `bad aesthetic`    | < 5.0        |

### Rating tags

| tag             | rating       |
|-----------------|--------------|
| `sfw`           | general      |
| `slightly nsfw` | sensitive    |
| `fairly nsfw`   | questionable |
| `very nsfw`     | explicit     |

### Year tags

`year YYYY`, where `YYYY` is in the range [2005, 2023].
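
A hedged sketch of how the aesthetic, rating, and year tags above could be derived programmatically; the helper names, the rating-mapping keys, and the final caption format are illustrative assumptions, as the card does not describe the captioning pipeline.

```python
def aesthetic_tag(score: float) -> str:
    """Map an aesthetic score to the buckets listed above."""
    if score >= 6.675:
        return "best aesthetic"
    if score >= 6.0:
        return "great aesthetic"
    if score >= 5.0:
        return "normal aesthetic"
    return "bad aesthetic"

# Rating tags keyed by the source rating categories listed above.
RATING_TAGS = {
    "general": "sfw",
    "sensitive": "slightly nsfw",
    "questionable": "fairly nsfw",
    "explicit": "very nsfw",
}

def year_tag(year: int) -> str:
    """Year tags cover 2005-2023 inclusive."""
    return f"year {year}" if 2005 <= year <= 2023 else ""

# Hypothetical usage: append the derived tags to an image's tag string.
extra_tags = [aesthetic_tag(6.8), RATING_TAGS["general"], year_tag(2023)]
caption_suffix = ", ".join(t for t in extra_tags if t)  # "best aesthetic, sfw, year 2023"
```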

### Training configurations

- Hardware: 4 × Nvidia A100 80GB GPUs
- Optimizer: AdaFactor
- Gradient accumulation steps: 8
- Batch size: 4 GPUs × 8 gradient accumulation steps × 4 per GPU = 128
- Learning rates:
  - 8e-6 for U-Net
  - 5.2e-6 for text encoder 1 (CLIP ViT-L)
  - 4.8e-6 for text encoder 2 (OpenCLIP ViT-bigG)