Linaqruf committed
Commit dcd9ed1
1 Parent(s): 9336a6c

Update README.md

Files changed (1):
  1. README.md (+73 -79)
README.md CHANGED
@@ -11,13 +11,13 @@ tags:
 - stable-diffusion-xl
 base_model: cagliostrolab/animagine-xl-3.0
 widget:
- - text: 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck, masterpiece, best quality
   parameter:
-   negative_prompt: nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name
   example_title: 1girl
- - text: 1boy, male focus, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck, masterpiece, best quality
   parameter:
-   negative_prompt: nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name
   example_title: 1boy
 ---
 <style>

@@ -197,7 +197,6 @@ In addition to special tags, we would like to introduce aesthetic tags based on
 ## Anime-focused Dataset Additions
 On Animagine XL 3.0, we mostly added characters from popular gacha games. Based on user feedback, we are adding many popular anime franchises to our dataset for this model. We will release the full list of characters that can be generated by this iteration on our HuggingFace soon; be sure to check it out when it's up!

-
 ## Model Details
 - **Developed by**: [Cagliostro Research Lab](https://huggingface.co/cagliostrolab)
 - **Model type**: Diffusion-based text-to-image generative model

@@ -217,36 +216,24 @@ Animagine XL 3.1 is accessible through user-friendly platforms such as Gradio an
 To use Animagine XL 3.1, install the required libraries as follows:

 ```bash
- pip install diffusers --upgrade
- pip install transformers accelerate safetensors
 ```

 Example script for generating images with Animagine XL 3.1:

 ```python
 import torch
- from diffusers import (
-     StableDiffusionXLPipeline,
-     EulerAncestralDiscreteScheduler,
-     AutoencoderKL
- )
- # Load VAE component
- vae = AutoencoderKL.from_pretrained(
-     "madebyollin/sdxl-vae-fp16-fix",
-     torch_dtype=torch.float16
- )
- # Configure the pipeline
- pipe = StableDiffusionXLPipeline.from_pretrained(
     "cagliostrolab/animagine-xl-3.1",
-     vae=vae,
     torch_dtype=torch.float16,
     use_safetensors=True,
 )
- pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
 pipe.to('cuda')
- # Define prompts and generate image
- prompt = "1girl, arima kana, oshi no ko, solo, upper body, v, smile, looking at viewer, outdoors, night"
- negative_prompt = "nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name"
 image = pipe(
     prompt,
     negative_prompt=negative_prompt,
@@ -255,13 +242,15 @@ image = pipe(
     guidance_scale=7,
     num_inference_steps=28
 ).images[0]
 ```

 ## Usage Guidelines

 ### Tag Ordering

- Prompting is a bit different in this iteration, for optimal results, it's recommended to follow the structured prompt template because we train the model like this:

 ```
 1girl/1boy, character name, from what series, everything else in any order.
 ```
@@ -273,63 +262,67 @@ Like the previous iteration, this model was trained with some special tags to st

 ### Quality Modifiers

- | Quality Modifier | Score Criterion |
- | ---------------- | --------------- |
- | `masterpiece`    | >150            |
- | `best quality`   | 100-150         |
- | `high quality`   | 75-100          |
- | `medium quality` | 25-75           |
- | `normal quality` | 0-25            |
- | `low quality`    | -5-0            |
- | `worst quality`  | <-5             |

 ### Rating Modifiers

- | Rating Modifier                | Rating Criterion |
- | ------------------------------ | ---------------- |
- | `rating: general`              | General          |
- | `rating: sensitive`            | Sensitive        |
- | `rating: questionable`, `nsfw` | Questionable     |
- | `rating: explicit`, `nsfw`     | Explicit         |

 ### Year Modifier

- These tags help to steer the result toward modern or vintage anime art styles, ranging from `newest` to `oldest`.

 | Year Tag | Year Range   |
- | -------- | ------------ |
- | `newest` | 2022 to 2023 |
- | `late`   | 2019 to 2021 |
- | `mid`    | 2015 to 2018 |
 | `early`  | 2011 to 2014 |
 | `oldest` | 2005 to 2010 |

 ### Aesthetic Tags

- This tag, combined with quality tag, can be used to guide the model to generate better results.

- | Aesthetic Tags     |
- |--------------------|
- | `very aesthetic`   |
- | `aesthetic`        |
- | `displeasing`      |
- | `very displeasing` |

 ## Recommended settings

 To guide the model towards generating high-aesthetic images, use negative prompts like:

 ```
- nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name
 ```

 For higher quality outcomes, prepend prompts with:

 ```
- masterpiece, best quality
 ```

- However, be careful to use `masterpiece`, `best quality` because many high-scored datasets are NSFW. It’s better to add `nsfw`, `rating: sensitive` to the negative prompt and `rating: general` to the positive prompt. it’s recommended to use a lower classifier-free guidance (CFG Scale) of around 5-7, sampling steps below 30, and to use Euler Ancestral (Euler a) as a sampler.

 ### Multi Aspect Resolution

@@ -349,37 +342,38 @@ This model supports generating images at the following dimensions:

 ## Training and Hyperparameters

- - **Animagine XL 3.1** was trained on a 2x A100 GPU with 80GB memory for 31 days or over 500 gpu hours. The training process encompassed three stages:
-   - Base:
-     - **Feature Alignment Stage**: Utilized 1.2m images to acquaint the model with basic anime concepts.
-     - **Refining UNet Stage**: Employed 2.5k curated datasets to only fine-tune the UNet.
-   - Curated:
-     - **Aesthetic Tuning Stage**: Employed 3.5k high-quality curated datasets to refine the model's art style.

 ### Hyperparameters

- | Stage                       | Epochs | UNet Learning Rate | Train Text Encoder | Text Encoder Learning Rate | Batch Size | Mixed Precision | Noise Offset |
- |-----------------------------|--------|--------------------|--------------------|----------------------------|------------|-----------------|--------------|
- | **Feature Alignment Stage** | 10     | 7.5e-6             | True               | 3.75e-6                    | 48 x 2     | fp16            | N/A          |
- | **Refining UNet Stage**     | 10     | 2e-6               | False              | N/A                        | 48         | fp16            | 0.0357       |
- | **Aesthetic Tuning Stage**  | 10     | 1e-6               | False              | N/A                        | 48         | fp16            | 0.0357       |

- ## Model Comparison

 ### Training Config

- | Configuration Item     | Animagine XL 2.0  | Animagine 3.0         | Animagine 3.1         |
- |------------------------|-------------------|-----------------------|-----------------------|
- | **GPU**                | A100 80G          | 2 x A100 80G          | 2 x A100 80G          |
- | **Dataset**            | 170k + 83k images | 1271990 + 3500 Images | 1271990 + 3500 Images |
- | **Shuffle Separator**  | N/A               | True                  | True                  |
- | **Global Epochs**      | 20                | 20                    | 20                    |
- | **Learning Rate**      | 1e-6              | 7.5e-6                | 7.5e-6                |
- | **Batch Size**         | 32                | 48 x 2                | 48 x 2                |
- | **Train Text Encoder** | True              | True                  | True                  |
- | **Train Special Tags** | True              | True                  | True                  |
- | **Image Resolution**   | 1024              | 1024                  | 1024                  |
- | **Bucket Resolution**  | 2048 x 512        | 2048 x 512            | 2048 x 512            |

 Source code and training config are available here: https://github.com/cagliostrolab/sd-scripts/tree/main/notebook
 
 
 - stable-diffusion-xl
 base_model: cagliostrolab/animagine-xl-3.0
 widget:
+ - text: 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck, masterpiece, best quality, very aesthetic, absurdres
   parameter:
+   negative_prompt: nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract]
   example_title: 1girl
+ - text: 1boy, male focus, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck, masterpiece, best quality, very aesthetic, absurdres
   parameter:
+   negative_prompt: nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract]
   example_title: 1boy
 ---
 <style>
 
 ## Anime-focused Dataset Additions
 On Animagine XL 3.0, we mostly added characters from popular gacha games. Based on user feedback, we are adding many popular anime franchises to our dataset for this model. We will release the full list of characters that can be generated by this iteration on our HuggingFace soon; be sure to check it out when it's up!

 ## Model Details
 - **Developed by**: [Cagliostro Research Lab](https://huggingface.co/cagliostrolab)
 - **Model type**: Diffusion-based text-to-image generative model
 
 To use Animagine XL 3.1, install the required libraries as follows:

 ```bash
+ pip install diffusers transformers accelerate safetensors --upgrade
 ```

 Example script for generating images with Animagine XL 3.1:

 ```python
 import torch
+ from diffusers import DiffusionPipeline
+
+ pipe = DiffusionPipeline.from_pretrained(
     "cagliostrolab/animagine-xl-3.1",
     torch_dtype=torch.float16,
     use_safetensors=True,
 )
 pipe.to('cuda')
+
+ prompt = "1girl, souryuu asuka langley, neon genesis evangelion, solo, upper body, v, smile, looking at viewer, outdoors, night"
+ negative_prompt = "nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract]"
 image = pipe(
     prompt,
     negative_prompt=negative_prompt,
     guidance_scale=7,
     num_inference_steps=28
 ).images[0]
+
+ image.save("./asuka_test.png")
 ```
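
If GPU memory is tight, one optional tweak is to let the pipeline offload idle submodules to the CPU; this is a hedged sketch assuming a recent diffusers release with `accelerate` installed (as in the install command above):

```python
# optional VRAM saver: offload submodules to CPU between forward passes;
# call this instead of pipe.to('cuda')
pipe.enable_model_cpu_offload()
```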

 ## Usage Guidelines

 ### Tag Ordering

+ For optimal results, it's recommended to follow the structured prompt template, since this is how we trained the model:

 ```
 1girl/1boy, character name, from what series, everything else in any order.
 ```
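
As an illustration, a hypothetical helper that assembles prompts in this trained order (the function and the tag choices are illustrative, not part of the model card):

```python
# build a prompt as: subject, character, series, then everything else in any order
def build_prompt(subject: str, character: str, series: str, *tags: str) -> str:
    return ", ".join([subject, character, series, *tags])

prompt = build_prompt(
    "1girl", "souryuu asuka langley", "neon genesis evangelion",
    "solo", "upper body", "smile", "masterpiece", "best quality",
)
print(prompt)
```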
 

 ### Quality Modifiers

+ Quality tags now consider both scores and post ratings to ensure a balanced quality distribution. We've refined labels for greater clarity, such as changing 'high quality' to 'great quality'.

+ | Quality Modifier | Score Criterion |
+ |------------------|-----------------|
+ | `masterpiece`    | > 95%           |
+ | `best quality`   | > 85% & ≤ 95%   |
+ | `great quality`  | > 75% & ≤ 85%   |
+ | `good quality`   | > 50% & ≤ 75%   |
+ | `normal quality` | > 25% & ≤ 50%   |
+ | `low quality`    | > 10% & ≤ 25%   |
+ | `worst quality`  | ≤ 10%           |
 
 ### Rating Modifiers

+ We've also streamlined our rating tags for simplicity and clarity, aiming to establish global rules that can be applied across different models. For example, the tag 'rating: general' is now simply 'general', and 'rating: sensitive' has been condensed to 'sensitive'.

+ | Rating Modifier  | Rating Criterion |
+ |------------------|------------------|
+ | `general`        | General          |
+ | `sensitive`      | Sensitive        |
+ | `nsfw`           | Questionable     |
+ | `explicit, nsfw` | Explicit         |
 
 ### Year Modifier

+ We've also redefined the year ranges to steer results toward specific modern or vintage anime art styles more accurately. This update simplifies the ranges, focusing on relevance to current and past eras.

 | Year Tag | Year Range   |
+ |----------|--------------|
+ | `newest` | 2021 to 2024 |
+ | `recent` | 2018 to 2020 |
+ | `mid`    | 2015 to 2017 |
 | `early`  | 2011 to 2014 |
 | `oldest` | 2005 to 2010 |
 
 ### Aesthetic Tags

+ We've enhanced our tagging system with aesthetic tags to refine content categorization based on visual appeal. These tags (`very aesthetic`, `aesthetic`, `displeasing`, and `very displeasing`) are derived from evaluations made by a specialized ViT (Vision Transformer) image classification model, specifically trained on anime data. For this purpose, we utilized the model [shadowlilac/aesthetic-shadow-v2](https://huggingface.co/shadowlilac/aesthetic-shadow-v2), which assesses the aesthetic value of content before it undergoes training. This ensures that each piece of content is not only relevant and accurate but also visually appealing.

+ | Aesthetic Tag      | Score Range     |
+ |--------------------|-----------------|
+ | `very aesthetic`   | > 0.71          |
+ | `aesthetic`        | > 0.45 & < 0.71 |
+ | `displeasing`      | > 0.27 & < 0.45 |
+ | `very displeasing` | ≤ 0.27          |
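
As a hedged sketch of how such a score could be obtained, assuming the classifier loads with transformers' generic image-classification pipeline (the actual label names and any preprocessing the lab used may differ):

```python
from transformers import pipeline
from PIL import Image

# score an image with the aesthetic classifier named above
classifier = pipeline("image-classification", model="shadowlilac/aesthetic-shadow-v2")
predictions = classifier(Image.open("asuka_test.png"))
print(predictions)  # list of {'label': ..., 'score': ...} dicts
```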

 ## Recommended settings

 To guide the model towards generating high-aesthetic images, use negative prompts like:

 ```
+ nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract]
 ```

 For higher quality outcomes, prepend prompts with:

 ```
+ masterpiece, best quality, very aesthetic, absurdres
 ```

+ It's recommended to use a lower classifier-free guidance (CFG Scale) of around 5-7, sampling steps below 30, and Euler Ancestral (Euler a) as the sampler.
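
As a sketch, those settings can be applied to the script above with diffusers' standard scheduler-swap pattern (the same `from_config` call used in earlier versions of this model card):

```python
# switch the sampler to Euler Ancestral and use the recommended CFG/step budget
from diffusers import EulerAncestralDiscreteScheduler

pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    guidance_scale=6.5,       # within the recommended 5-7 range
    num_inference_steps=28,   # below 30, per the guidance above
).images[0]
```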

 ### Multi Aspect Resolution
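
For illustration, a hedged example of requesting a specific size explicitly (832x1216 is assumed here as a typical SDXL portrait bucket; the model card lists the full set of supported dimensions):

```python
# generate at an explicit portrait resolution, reusing the pipeline from above
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,     # assumed portrait bucket; see the supported-dimensions list
    height=1216,
    guidance_scale=7,
    num_inference_steps=28,
).images[0]
```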
 
 
 ## Training and Hyperparameters

+ - **Animagine XL 3.1** was trained on 2x A100 80GB GPUs for roughly 15 days (over 350 hours) in the pretraining stage. The training process encompassed three stages:
+   - Continual Pretraining:
+     - **Pretraining Stage**: Used a data-rich collection of 870k ordered and tagged images to expand the knowledge of the Animagine XL 3.0 base model.
+   - Finetuning:
+     - **First Stage**: Used labeled and curated aesthetic datasets to repair the U-Net after pretraining.
+     - **Second Stage**: Used labeled and curated aesthetic datasets to refine the model's art style and to fix bad hands and anatomy.

 ### Hyperparameters

+ | Stage                 | Epochs | UNet LR | Train Text Encoder | Batch Size | Noise Offset | Optimizer | LR Scheduler                  | Grad Acc Steps | GPUs |
+ |-----------------------|--------|---------|--------------------|------------|--------------|-----------|-------------------------------|----------------|------|
+ | **Pretraining Stage** | 10     | 1e-5    | True               | 16         | N/A          | AdamW     | Cosine Annealing Warm Restart | 3              | 2    |
+ | **First Stage**       | 10     | 2e-6    | False              | 48         | 0.0357       | Adafactor | Constant with Warmup          | 1              | 1    |
+ | **Second Stage**      | 15     | 1e-6    | False              | 48         | 0.0357       | Adafactor | Constant with Warmup          | 1              | 1    |
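
As a rough sketch, the pretraining optimizer and schedule from this table can be approximated with PyTorch's built-ins; this is an illustration under stated assumptions, not the lab's training code, and the per-cycle LR decay listed in the comparison table below has no direct equivalent in this class:

```python
import torch

# stand-in module; in real training this would be the SDXL U-Net's parameters
model = torch.nn.Linear(8, 8)

# AdamW with the pretraining hyperparameters listed above
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-5, betas=(0.9, 0.99), weight_decay=0.1
)

# cosine annealing with warm restarts; first-cycle length and floor LR follow
# the "LR Scheduler Args" row below (First Cycle Steps: 9,099, Min LR: 1e-6)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=9099, eta_min=1e-6
)
```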

+ ## Model Comparison (Pretraining only)

 ### Training Config

+ | Configuration Item             | Animagine XL 3.0                                                 | Animagine XL 3.1                                                      |
+ |--------------------------------|------------------------------------------------------------------|-----------------------------------------------------------------------|
+ | **GPU**                        | 2 x A100 80G                                                     | 2 x A100 80G                                                          |
+ | **Dataset**                    | 1,271,990                                                        | 873,504                                                               |
+ | **Shuffle Separator**          | True                                                             | True                                                                  |
+ | **Num Epochs**                 | 10                                                               | 10                                                                    |
+ | **Learning Rate**              | 7.5e-6                                                           | 1e-5                                                                  |
+ | **Text Encoder Learning Rate** | 3.75e-6                                                          | 1e-5                                                                  |
+ | **Effective Batch Size**       | 48 x 1 x 2                                                       | 16 x 3 x 2                                                            |
+ | **Optimizer**                  | Adafactor                                                        | AdamW                                                                 |
+ | **Optimizer Args**             | Scale Parameter: False, Relative Step: False, Warmup Init: False | Weight Decay: 0.1, Betas: (0.9, 0.99)                                 |
+ | **LR Scheduler**               | Constant with Warmup                                             | Cosine Annealing Warm Restart                                         |
+ | **LR Scheduler Args**          | Warmup Steps: 100                                                | Num Cycles: 10, Min LR: 1e-6, LR Decay: 0.9, First Cycle Steps: 9,099 |

 Source code and training config are available here: https://github.com/cagliostrolab/sd-scripts/tree/main/notebook