metadata

license: openrail++
tags:
  - text-to-image
  - stable-diffusion
  - diffusers

AnimeBoysXL v2.0

It takes substantial time and effort to bake models. If you appreciate my models, I would be grateful if you could support me on Ko-fi ☕.

Features

✔️ Good for inference: AnimeBoysXL v2.0 is a flexible model that is good at generating images of anime boys and male-only content in a wide range of styles.
✔️ Good for training: AnimeBoysXL v2.0 is suitable for further training, thanks to its neutral style and its ability to recognize a large number of concepts. Feel free to train your own anime boy model/LoRA from AnimeBoysXL.
❌ AnimeBoysXL v2.0 is not optimized for creating anime girls. Please consider using other models for that purpose.

Inference Guide

Prompt: Use tag-based prompts to describe your subject.
- Tag ordering matters. It is highly recommended to structure your prompt with the following templates:
```
1boy, male focus, character name, series name, anything else you'd like to describe
```
```
2boys, male focus, multiple boys, character name(s), series name, anything else you'd like to describe
```
- Append
```
, best quality, amazing quality, best aesthetic, absurdres
```
  to the prompt to improve image quality.
- (Optional) Append
```
, year YYYY
```
  to the prompt to shift the output toward the prevalent style of that year. YYYY is a 4-digit year, e.g. `, year 2023`.
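
The templates above can be assembled programmatically. The following is a minimal sketch, not part of the model's tooling: `build_prompt` and its parameters are hypothetical names, and it only covers the `1boy` and `2boys` templates given here.

```python
# Quality tags recommended in the Inference Guide.
QUALITY_TAGS = ["best quality", "amazing quality", "best aesthetic", "absurdres"]


def build_prompt(num_boys=1, characters=(), series=None, extra=(),
                 quality=True, year=None):
    """Assemble a tag-based prompt in the recommended order (hypothetical helper)."""
    if num_boys == 1:
        tags = ["1boy", "male focus"]
    elif num_boys == 2:
        tags = ["2boys", "male focus", "multiple boys"]
    else:
        # Only the two templates from the guide are handled in this sketch.
        raise ValueError("this sketch only covers the 1boy and 2boys templates")

    tags.extend(characters)          # character name(s)
    if series:
        tags.append(series)          # series name
    tags.extend(extra)               # anything else you'd like to describe
    if quality:
        tags.extend(QUALITY_TAGS)    # appended to improve image quality
    if year is not None:
        tags.append(f"year {year}")  # optional style shift toward that year
    return ", ".join(tags)
```

For example, `build_prompt(num_boys=2, extra=["outdoors"], year=2023)` yields a prompt that starts with `2boys, male focus, multiple boys` and ends with `year 2023`.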

Negative prompt: Choose one of the following two presets.

Heavy (recommended):

lowres, (bad:1.05), text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts, 1girl, breasts

Light:

lowres, jpeg artifacts, worst quality, watermark, blurry, bad aesthetic, 1girl, breasts

(Optional) Add
```
, realistic, lips, nose
```
to the negative prompt if you need a flat, anime-style face.
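
If you switch between presets often, the choice can be wrapped in a small helper. This is only an illustrative sketch; `NEGATIVE_PRESETS`, `FLAT_FACE_TAGS`, and `build_negative_prompt` are hypothetical names, while the tag strings are copied from the presets above.

```python
# The two negative prompt presets from this guide.
NEGATIVE_PRESETS = {
    "heavy": (
        "lowres, (bad:1.05), text, error, missing, extra, fewer, cropped, "
        "jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, "
        "unfinished, chromatic aberration, scan, scan artifacts, 1girl, breasts"
    ),
    "light": (
        "lowres, jpeg artifacts, worst quality, watermark, blurry, "
        "bad aesthetic, 1girl, breasts"
    ),
}

# Optional tags for a flatter face.
FLAT_FACE_TAGS = "realistic, lips, nose"


def build_negative_prompt(preset="heavy", flat_face=False):
    """Return the chosen preset, optionally extended with the flat-face tags."""
    negative = NEGATIVE_PRESETS[preset]
    if flat_face:
        negative += ", " + FLAT_FACE_TAGS
    return negative
```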

VAE: Make sure you're using the SDXL VAE.
Sampling method, sampling steps and CFG scale: Euler a, 28 steps, and a CFG scale of 5 work well for me. You are encouraged to experiment with other settings.
Width and height: 832*1216 for portrait, 1024*1024 for square, and 1216*832 for landscape.

🧨Diffusers Example Usage

```python
import torch
from diffusers import DiffusionPipeline

# Load the fp16 weights and move the pipeline to the GPU.
pipe = DiffusionPipeline.from_pretrained(
    "Koolchh/AnimeBoysXL-v2.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
pipe.to("cuda")

prompt = "1boy, male focus, best quality, amazing quality, best aesthetic, absurdres"
negative_prompt = "lowres, (bad:1.05), text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts, 1girl, breasts"

# Generate a square image with the recommended step count and CFG scale.
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=5,
    num_inference_steps=28,
).images[0]
```
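
The example above does not set a sampler explicitly and uses the square resolution. Here is a hedged sketch of applying the other Inference Guide settings, assuming that "Euler a" corresponds to diffusers' `EulerAncestralDiscreteScheduler`. `RESOLUTIONS`, `generation_kwargs`, `load_pipeline`, and `generate` are hypothetical helper names, not part of the model or the library.

```python
# Width and height per orientation, as listed in the Inference Guide.
RESOLUTIONS = {
    "portrait": (832, 1216),
    "square": (1024, 1024),
    "landscape": (1216, 832),
}


def generation_kwargs(orientation="portrait", steps=28, cfg_scale=5):
    """Build the keyword arguments for the pipeline call (hypothetical helper)."""
    width, height = RESOLUTIONS[orientation]
    return {
        "width": width,
        "height": height,
        "guidance_scale": cfg_scale,
        "num_inference_steps": steps,
    }


def load_pipeline(device="cuda"):
    """Load the model and switch the sampler to Euler ancestral."""
    # Imported lazily so the helpers above work without torch/diffusers installed.
    import torch
    from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

    pipe = DiffusionPipeline.from_pretrained(
        "Koolchh/AnimeBoysXL-v2.0",
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    )
    # Reuse the existing scheduler config so model-specific settings carry over.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    return pipe.to(device)


def generate(pipe, prompt, negative_prompt, orientation="portrait"):
    """Run one generation with the recommended settings and return the image."""
    return pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        **generation_kwargs(orientation),
    ).images[0]
```

Calling `generate(load_pipeline(), prompt, negative_prompt, "landscape")` would then produce a 1216*832 image with 28 steps and a CFG scale of 5.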

Training Details

AnimeBoysXL v2.0 is trained from Stable Diffusion XL Base 1.0, on ~516k images.

The following tags are attached to the training data to make it easier to steer toward either more aesthetic or more flexible results.

Quality tags

| tag | score |
| --- | --- |
| `best quality` | >= 150 |
| `amazing quality` | [100, 150) |
| `great quality` | [75, 100) |
| `normal quality` | [0, 75) |
| `bad quality` | (-5, 0) |
| `worst quality` | <= -5 |
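
Read as code, the score ranges above amount to a simple threshold lookup. The sketch below is only an illustration of the table; `quality_tag` is a hypothetical name and this is not the actual tagging script.

```python
def quality_tag(score):
    """Map a score to its quality tag, following the ranges in the table."""
    if score >= 150:
        return "best quality"
    if score >= 100:
        return "amazing quality"
    if score >= 75:
        return "great quality"
    if score >= 0:
        return "normal quality"
    if score > -5:
        return "bad quality"
    return "worst quality"
```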

Aesthetic tags

| tag | score |
| --- | --- |
| `best aesthetic` | >= 6.675 |
| `great aesthetic` | [6.0, 6.675) |
| `normal aesthetic` | [5.0, 6.0) |
| `bad aesthetic` | < 5.0 |
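
The aesthetic ranges can be expressed compactly as a sorted list of lower bounds. This is a sketch under that reading of the table; `aesthetic_tag` and the two constants are hypothetical names.

```python
from bisect import bisect_right

# Lower bounds of the "normal", "great", and "best" ranges, in ascending order.
AESTHETIC_BOUNDS = [5.0, 6.0, 6.675]
AESTHETIC_TAGS = ["bad aesthetic", "normal aesthetic", "great aesthetic", "best aesthetic"]


def aesthetic_tag(score):
    """Map an aesthetic score to its tag; each lower bound is inclusive."""
    # bisect_right counts how many lower bounds the score has reached.
    return AESTHETIC_TAGS[bisect_right(AESTHETIC_BOUNDS, score)]
```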

Rating tags

| tag | rating |
| --- | --- |
| `sfw` | general |
| `slightly nsfw` | sensitive |
| `fairly nsfw` | questionable |
| `very nsfw` | explicit |

Year tags

`year YYYY`, where YYYY is in the range [2005, 2023].
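
As an illustration, the rating and year tags can be produced with a lookup table and a range check. The helper names below are hypothetical, and rejecting out-of-range years is a design choice of this sketch, not something the model enforces.

```python
# Rating-to-tag mapping from the rating tags table.
RATING_TAGS = {
    "general": "sfw",
    "sensitive": "slightly nsfw",
    "questionable": "fairly nsfw",
    "explicit": "very nsfw",
}

# Year range covered by the training tags.
MIN_YEAR, MAX_YEAR = 2005, 2023


def rating_tag(rating):
    """Return the tag attached to an image with the given rating."""
    return RATING_TAGS[rating]


def year_tag(year):
    """Return the `year YYYY` tag, rejecting years outside the trained range."""
    if not MIN_YEAR <= year <= MAX_YEAR:
        raise ValueError(f"year must be in [{MIN_YEAR}, {MAX_YEAR}], got {year}")
    return f"year {year}"
```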

Training configurations

Hardware: 4 * NVIDIA A100 80GB GPUs
Optimizer: Adafactor
Gradient accumulation steps: 8
Effective batch size: 4 * 8 * 4 = 128 (per-GPU batch size * gradient accumulation steps * number of GPUs)
Learning rates:
- 8e-6 for U-Net
- 5.2e-6 for text encoder 1 (CLIP ViT-L)
- 4.8e-6 for text encoder 2 (OpenCLIP ViT-bigG)
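
The batch size arithmetic and per-component learning rates can be written out as follows. This is a sketch, not the actual training script: the per-GPU batch size of 4 is inferred from the product above, the dataset size is the approximate figure quoted earlier, and `param_groups` is a hypothetical helper that builds the list-of-dicts format PyTorch optimizers accept.

```python
import math

NUM_GPUS = 4
GRAD_ACCUM_STEPS = 8
PER_GPU_BATCH = 4  # assumption: inferred from 4 * 8 * 4 = 128

# Samples contributing to each optimizer step.
effective_batch_size = PER_GPU_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS

# Approximate dataset size (~516k images), so this is only an estimate.
DATASET_SIZE = 516_000
steps_per_epoch = math.ceil(DATASET_SIZE / effective_batch_size)

# Learning rates per component, as listed above.
LEARNING_RATES = {
    "unet": 8e-6,
    "text_encoder_1": 5.2e-6,  # CLIP ViT-L
    "text_encoder_2": 4.8e-6,  # OpenCLIP ViT-bigG
}


def param_groups(modules):
    """Build optimizer parameter groups with a separate learning rate per module.

    `modules` maps the keys of LEARNING_RATES to objects with a `.parameters()`
    method, such as torch.nn.Module instances.
    """
    return [
        {"params": list(modules[name].parameters()), "lr": lr}
        for name, lr in LEARNING_RATES.items()
    ]
```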