---
license: other
license_name: faipl-1.0-sd
license_link: https://freedevproject.org/faipl-1.0-sd/
language:
- en
tags:
- text-to-image
- stable-diffusion
- safetensors
- stable-diffusion-xl
base_model: "stabilityai/stable-diffusion-xl-base-1.0"
---
<h1 align="center">ArtiWaifu Diffusion 1.0</h1>
We have released the **A**rti**Wa**ifu Diffusion V1.0 model, designed to generate aesthetically pleasing anime-style illustrations that faithfully reproduce styles and characters.
AWA Diffusion is an iteration of the Stable Diffusion XL model that has learned over 6000 artistic styles and more than 4000 anime characters, which are invoked through [trigger words](#trigger-words).
As a specialized image generation model for anime, it excels in producing high-quality anime images, especially in generating images with highly recognizable styles and characters while maintaining a consistently high-quality aesthetic expression.
## Model Details
The AWA Diffusion model is fine-tuned from Stable Diffusion XL on a curated dataset of 1.5M high-quality anime images, covering a wide range of both popular and niche anime concepts up to April 15, 2024.
AWA Diffusion employs our most advanced training strategies, enabling users to easily induce the model to generate images of specific characters or styles while maintaining high image quality and aesthetic expression.
**Model Information**
- Developed by: [Euge](https://civitai.com/user/Euge_)
- Funded by: [Neta.art](https://nieta.art/)
- Model type: Generative text-to-image model
- Finetuned from model: [SDXL 1.0 Base](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
- License: [Fair AI Public License 1.0-SD](https://freedevproject.org/faipl-1.0-sd/)
## Usage Guide
This guide will (i) introduce the model's recommended usage methods and prompt writing strategies, aiming to provide suggestions for generation, and (ii) serve as a reference document for model usage, detailing the writing patterns and strategies for trigger words, quality tags, rating tags, style tags, and character tags.
### Basic Usage
- **CFG scale**: <span style="color:cyan">8-11</span> (higher than usual values)
- **Resolution**: Total area (= width x height) of around 1024x1024, never lower than 256x256, with both width and height multiples of 32.
- **Sampling method**: Euler A (<span style="color:cyan">50+</span> steps) or DPM++ 2M Karras (<span style="color:cyan">~35</span> steps)
Due to the special training method, AWA's optimal inference step count and CFG scale are higher than regular values. As the inference steps increase, the quality of the generated images can continue to improve...
❓ **Question:** Why not use the standard SDXL resolution?
💡 **Answer:** Because the bucketing algorithm used in training does not adhere to a fixed set of buckets. Although this does not conform to positional encoding, we have not observed any adverse effects.
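The resolution rule above can be sketched as a small helper. This is only an illustration: the function name `suggest_resolution` and its parameters are hypothetical and not part of any AWA tooling. It picks a width and height for a given aspect ratio so that the area stays near 1024x1024 and both sides are multiples of 32.

```python
import math


def suggest_resolution(aspect_ratio: float,
                       target_area: int = 1024 * 1024,
                       step: int = 32,
                       min_side: int = 256) -> tuple[int, int]:
    """Return (width, height) near target_area, both multiples of `step`.

    aspect_ratio is width / height. All defaults mirror the recommendations
    in this guide; the helper itself is a hypothetical sketch.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")

    # Solve width * height = target_area with width / height = aspect_ratio.
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio

    def snap(value: float) -> int:
        # Round to the nearest multiple of `step`, never below `min_side`.
        return max(min_side, round(value / step) * step)

    return snap(width), snap(height)


print(suggest_resolution(1.0))     # square image
print(suggest_resolution(16 / 9))  # landscape image
```

Because each side is rounded independently, the resulting area is only approximately equal to the target.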
### Prompting Strategies
All text-to-image diffusion models are notoriously sensitive to their prompts, and AWA Diffusion is no exception. Even a single misspelling, or replacing spaces with underscores, can affect the generated results.
AWA Diffusion encourages users to write prompts as **tags** separated by **comma + space (`, `)**. Although the model also supports natural language descriptions, or a mix of both, the tag-by-tag format is more stable and user-friendly.
When describing a specific ACG concept, such as a character, style, or scene, we recommend choosing tags from the [Danbooru tags](https://danbooru.donmai.us/tags) and replacing the underscores in them with spaces so that the model accurately understands your needs. For example, `bishop_(chess)` should be written as `bishop (chess)`. In inference tools like AUTOMATIC1111 WebUI that use parentheses to weight prompts, all parentheses within the tags should be escaped, i.e., `bishop \(chess\)`.
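As a minimal sketch of the conversion described above, the following helper turns a Danbooru tag into a prompt tag and optionally escapes parentheses for tools that use them for weighting. The function names are hypothetical and chosen only for illustration.

```python
def danbooru_to_prompt_tag(tag: str, escape_parentheses: bool = False) -> str:
    """Convert a Danbooru tag (underscores) into a prompt tag (spaces).

    Set escape_parentheses=True for tools such as AUTOMATIC1111 WebUI,
    where unescaped parentheses are read as prompt weighting.
    """
    text = tag.strip().replace("_", " ")
    if escape_parentheses:
        text = text.replace("(", r"\(").replace(")", r"\)")
    return text


def join_tags(tags: list[str]) -> str:
    """Join tags with comma + space, the separator recommended in this guide."""
    return ", ".join(tags)


print(danbooru_to_prompt_tag("bishop_(chess)"))
print(danbooru_to_prompt_tag("bishop_(chess)", escape_parentheses=True))
print(join_tags(["1girl", "solo", danbooru_to_prompt_tag("cowboy_shot")]))
```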
#### Tag Ordering
Most diffusion models, AWA Diffusion included, better understand logically ordered tags. While tag ordering is not mandatory, it can help the model better understand your needs. Generally, the earlier a tag appears, the greater its impact on generation.
Here's an example of tag ordering. It places [art style tags](#style-tags) and [character tags](#character-tags) first, because style and subject matter most to the image. Other tags then follow in order of importance. Lastly, [aesthetic tags](#quality-tags-and-aesthetic-tags) and [quality tags](#quality-tags-and-aesthetic-tags) are positioned near the end to further emphasize the aesthetics of the image.
art style (<span style="color:red">_by xxx_</span>) -> character (<span style="color:orange">_1 frieren_</span>) -> race (elf) -> composition (cowboy shot) -> painting style (<span style="color:green">_impasto_</span>) -> theme (fantasy theme) -> main environment (in the forest, at day) -> background (gradient background) -> action (sitting on ground) -> expression (expressionless) -> main characteristics (white hair) -> other characteristics (twintails, green eyes, parted lips) -> clothing (wearing a white dress) -> clothing accessories (frills) -> other items (holding a magic wand) -> secondary environment (grass, sunshine) -> aesthetics (<span style="color:blue">_beautiful color_</span>, <span style="color:cyan">_detailed_</span>) -> quality (<span style="color:purple">_best_</span> quality) -> secondary description (birds, cloud, butterfly)
Tag order is not set in stone. Flexibility in writing prompts can yield better results. For example, if the effect of a concept (such as a style) is too strong and detracts from the aesthetic appeal of the image, you can move it to a later position to reduce its impact.
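The suggested ordering can be sketched as a small prompt builder. The category names below are taken from the example ordering in this section. The `TAG_ORDER` list and the `build_prompt` function are hypothetical helpers, and the order is a default that you may rearrange.

```python
# Category order from the example in this section (a suggestion, not a rule).
TAG_ORDER = [
    "art style", "character", "race", "composition", "painting style",
    "theme", "main environment", "background", "action", "expression",
    "main characteristics", "other characteristics", "clothing",
    "clothing accessories", "other items", "secondary environment",
    "aesthetics", "quality", "secondary description",
]


def build_prompt(tags_by_category: dict[str, list[str]]) -> str:
    """Assemble a prompt by emitting each category's tags in TAG_ORDER."""
    unknown = set(tags_by_category) - set(TAG_ORDER)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")

    ordered: list[str] = []
    for category in TAG_ORDER:
        ordered.extend(tags_by_category.get(category, []))
    return ", ".join(ordered)


prompt = build_prompt({
    "quality": ["best quality"],
    "character": ["1 frieren"],
    "art style": ["by yoneyama mai"],
    "race": ["elf"],
    "aesthetics": ["beautiful color", "detailed"],
})
print(prompt)
```

To weaken a concept that dominates the image, move its category later in `TAG_ORDER` or pass the tags under a later category.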
#### Negative Prompt
Negative prompts are not necessary for AWA Diffusion. If you do use them, more is not better: they should be **as concise as possible and easily recognizable by the model**. Too many negative words may lead to poorer generation results.
Here are some recommended scenarios for using negative prompts:
1. Watermark: `signature`, `logo`, `artist name`;
2. Quality: `worst quality`, `lowres`, `ugly`, `abstract`;
3. Style: `real life`, `3d`, `celluloid`, `sketch`, `draft`;
4. Human anatomy: `deformed hand`, `fused fingers`, `extra limbs`, `extra arms`, `missing arm`, `extra legs`, `missing leg`, `extra digits`, `fewer digits`.
### Trigger Words
Add trigger words to your prompts to inform the model about the concept you want to generate. Trigger words can include character names, artistic styles, scenes, actions, quality, etc.
**Tips for Trigger Words**
1. **Typos**: The model is very sensitive to the spelling of trigger words. Even a single letter difference can cause a trigger to fail or lead to unexpected results.
2. **Bracket Escaping**: When using inference tools that rely on parentheses for prompt weighting, such as AUTOMATIC1111 WebUI, escape the parentheses in trigger words, e.g., `1 lucy (cyberpunk)` -> `1 lucy \(cyberpunk\)`.
3. **Triggering Effect Preview**: Search for a tag on [Danbooru](https://danbooru.donmai.us/tags) to preview it and better understand its meaning and usage.
#### Style Tags
Style tags are divided into two types: <span style="color:red">Painting Style Tags</span> and <span style="color:blue">Artistic Style Tags</span>. <span style="color:red">Painting Style Tags</span> describe the painting techniques or media used in the image, such as oil painting, watercolor, flat color, and impasto. <span style="color:blue">Artistic Style Tags</span> represent the artistic style of the artist behind the image.
AWA Diffusion supports the following <span style="color:red">Painting Style Tags</span>:
- Painting style tags available in the Danbooru tags, such as `oil painting`, `watercolor`, `flat color`, etc.;
- All painting style tags supported by [AID XL 0.8](https://civitai.com/models/124189/anime-illust-diffusion-xl), such as `flat-pasto`, etc.;
- All style tags supported by [Neta Art XL 1.0](https://civitai.com/models/410737/neta-art-xl), such as `gufeng`, etc.;
See the [Painting Style Tags List](/references/style.csv) for full lists of painting style tags.
AWA Diffusion supports the following <span style="color:blue">Artistic Style Tags</span>:
- Artistic style tags available in the Danbooru tags, such as `by yoneyama mai`, `by wlop`, etc.;
- All artistic style tags supported by [AID XL 0.8](https://civitai.com/models/124189/anime-illust-diffusion-xl), such as `by antifreeze3`, `by 7thknights`, etc.;
See the [Artistic Style Tags List](/references/artist.csv) for full lists of artistic style tags.
The higher the tag count in the tag repository, the more thoroughly the artistic style has been trained, and the higher the fidelity in generation. Typically, artistic style tags with a count higher than **50** yield better generation results.
**Tips for Style Tags**
1. **Intensity Adjustment**: You can adjust the intensity of a style by altering the order or weighting of style tags in your prompt. Frontloading a style tag enhances its effect, while placing it later reduces its effect.
❓ **Question:** Why include the prefix `by` in artistic style tags?
💡 **Answer:** To clearly inform the model that you want to generate a specific artistic style rather than something else, we recommend including the prefix `by` in artistic style tags. This differentiates `by xxx` from `xxx`, especially when `xxx` itself carries other meanings, such as `dino` which could represent either a dinosaur or an artist's identifier.
Similarly, when triggering characters, add a `1` as a prefix to the character trigger word.
#### Character Tags
Character tags describe the character IP in the generated image. Using character tags will guide the model to generate the **appearance features** of the character.
Character tags also need to be sourced from the [Character Tag List](/references/character.csv). To generate a specific character, first find the corresponding trigger word in the tag repository, replace all underscores `_` in the trigger word with spaces ` `, and prepend `1 ` to the character name.
For example, `1 ayanami rei` triggers the model to generate the character Rei Ayanami from the anime "Neon Genesis Evangelion," corresponding to the Danbooru tag `ayanami_rei`; `1 asuna (sao)` triggers the model to generate the character Asuna from "Sword Art Online," corresponding to the Danbooru tag `asuna_(sao)`. [More examples](#prompt-word-examples)
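The two prefix rules (`1 ` for characters, `by ` for artistic styles) can be sketched as follows. The helper names are hypothetical. The code assumes the input is a tag as listed in the tag repository, with underscores, and does not check that the tag actually exists there.

```python
def character_trigger(danbooru_tag: str) -> str:
    """Build a character trigger word: underscores -> spaces, prefixed with '1 '."""
    return "1 " + danbooru_tag.strip().replace("_", " ")


def style_trigger(artist_tag: str) -> str:
    """Build an artistic style trigger word: underscores -> spaces, prefixed with 'by '."""
    return "by " + artist_tag.strip().replace("_", " ")


print(character_trigger("ayanami_rei"))
print(character_trigger("asuna_(sao)"))
print(style_trigger("yoneyama_mai"))
```

Any bracket annotation, such as `(sao)`, is kept as part of the trigger word.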
The higher the tag count in the tag repository, the more thoroughly the character has been trained, and the higher the fidelity in generation. Typically, character tags with a count higher than **100** yield better generation results.
**Tips for Character Tags**
1. **Character Costuming**: To achieve more flexible character costuming, character tags do not deliberately guide the model to draw the official attire of the character. To generate a character in a specific official outfit, besides the trigger word, you should also include a description of the attire in the prompt, e.g., "1 lucy (cyberpunk), <span style="color:cyan">wearing a white cropped jacket, underneath bodysuit, shorts, thighhighs, hip vent</span>".
2. **Series Annotations**: Some character tags include additional parentheses annotations after the character name. The parentheses and the annotations within cannot be omitted, e.g., `1 lucy (cyberpunk)` cannot be written as `1 lucy`. Other than that, you don't need to add any additional annotations, for example, you DON'T need to add the series tag to which the character belongs after the character tag.
3. **Known Issue 1**: When generating certain characters, unexpected feature deformations may occur, e.g., `1 asui tsuyu`, which triggers the character Tsuyu Asui from "My Hero Academia," may produce an extra black line between the eyes. This happens because the model misinterprets the large round eyes as glasses, so `glasses` should be included in the negative prompt to avoid the issue.
4. **Known Issue 2**: When generating less popular characters, AWA Diffusion might reproduce the character's features incompletely due to insufficient data/training. In such cases, we recommend extending the character description in your prompt beyond just the character name, detailing the character's origin, race, hair color, attire, etc.
**Character Tag Trigger Examples**
| Trigger Word | Note |
| ------------------------------- | -------------------------------------------------------------- |
| 1 lucy (cyberpunk)              | ✅ Correct character tag                                        |
| 1 lucy                          | ❌ Missing bracket annotation                                   |
| 1 lucy (cyber)                  | ❌ Incorrect bracket annotation                                 |
| lucy (cyberpunk)                | ❌ Missing prefix `1 `                                          |
| 1 lucy cyberpunk                | ❌ Missing brackets                                             |
| 1 lucy (cyberpunk               | ❌ Bracket not closed                                           |
| 1 lucky (cyberpunk)             | ❌ Spelling error                                               |
| 1 lucy (cyberpunk: edgerunners) | ❌ Bracket annotation not following the required character tag  |
❓ **Question:** Why do some character tags contain bracket annotations, e.g., `lucy (cyberpunk)`, while others do not, e.g., `frieren`?
💡 **Answer:** In different works, there may be characters with the same name, such as Asuna from "Sword Art Online" and "Azur Lane." To distinguish these characters with the same name, it is necessary to annotate the character's name with the work's name, abbreviated if the name is too long. For characters with unique names that currently have no duplicates, like `frieren`, no special annotations are required.
#### Quality Tags and Aesthetic Tags
For AWA Diffusion, including quality descriptors in your positive prompt is **very important**. Quality descriptions relate to quality tags and aesthetic tags.
Quality tags directly describe the aesthetic quality of the generated image, impacting the detail, texture, human anatomy, lighting, color, etc. Adding quality tags helps the model generate higher quality images. Quality tags are ranked from highest to lowest as follows:
<span style="color:orange">amazing quality</span> -> <span style="color:purple">best quality</span> -> <span style="color:blue">high quality</span> -> <span style="color:green">normal quality</span> -> <span style="color:black">low quality</span> -> <span style="color:grey">worst quality</span>
Aesthetic tags describe the aesthetic features of the generated image, aiding the model in producing artistically appealing images. In addition to typical aesthetic words like `perspective`, `lighting and shadow`, AWA Diffusion has been specially trained to respond effectively to aesthetic trigger words such as `beautiful color`, `detailed`, and `aesthetic`, which respectively express appealing colors, details, and overall beauty.
The recommended generic way to describe quality is: &lt;Your Prompt&gt;, <span style="color:orange">beautiful color, detailed, amazing quality</span>
**Tips for Quality and Aesthetic Tags**
1. **Tag Quantity**: Only one quality tag is needed; multiple aesthetic tags can be added.
2. **Tag Position**: The position of quality and aesthetic tags is not fixed, but they are typically placed at the end of the prompt.
3. **Relative Quality**: There is no absolute hierarchy of quality; the implied quality aligns with general aesthetic standards, and different users may have different perceptions of quality.
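The tips above (exactly one quality tag, any number of aesthetic tags, both usually at the end) can be sketched as a helper. `QUALITY_TAGS` lists the quality tags from this section. The function `add_quality` and its defaults are hypothetical and only illustrate the recommended pattern.

```python
# Quality tags from this section, ordered from highest to lowest.
QUALITY_TAGS = [
    "amazing quality", "best quality", "high quality",
    "normal quality", "low quality", "worst quality",
]


def add_quality(prompt: str,
                quality: str = "amazing quality",
                aesthetics: tuple[str, ...] = ("beautiful color", "detailed")) -> str:
    """Append aesthetic tags and exactly one quality tag to a tag-style prompt."""
    if quality not in QUALITY_TAGS:
        raise ValueError(f"unknown quality tag: {quality!r}")

    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    # Drop any quality tag already present so that only one remains.
    tags = [t for t in tags if t not in QUALITY_TAGS]
    # Add aesthetic tags that are not already in the prompt.
    for tag in aesthetics:
        if tag not in tags:
            tags.append(tag)
    tags.append(quality)
    return ", ".join(tags)


print(add_quality("1 frieren, 1girl, solo"))
print(add_quality("1girl, best quality", quality="high quality", aesthetics=()))
```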
#### Rating Tags
Rating tags describe the level of exposure in the content of the generated image. Rating tags are ranked from least to most explicit as follows:
<span style="color:green">rating: general</span> (or <span style="color:green">safe</span>) -> <span style="color:yellow">rating: suggestive</span> -> <span style="color:orange">rating: questionable</span> -> <span style="color:red">rating: explicit</span> (or <span style="color:red">nsfw</span>)
### Prompt Word Examples
#### Example 1
**A**
<span style="color:green">by yoneyama mai</span>, <span style="color:blue">1 frieren</span>, 1girl, solo, fantasy theme, smile, holding a magic wand, <span style="color:yellow">beautiful color</span>, <span style="color:red">amazing quality</span>
1. <span style="color:green">by yoneyama mai</span> triggers the artistic style of Yoneyama Mai, placed at the front to enhance the effect.
2. <span style="color:blue">1 frieren</span> triggers the character Frieren from the series "Frieren: Beyond Journey's End."
3. <span style="color:yellow">beautiful color</span> describes the beautiful colors in the generated image.
4. <span style="color:red">amazing quality</span> describes the stunning quality of the generated image.
**B**
<span style="color:green">by nixeu</span>, <span style="color:blue">1 lucy (cyberpunk)</span>, 1girl, solo, cowboy shot, gradient background, white cropped jacket, underneath bodysuit, shorts, thighhighs, hip vent, <span style="color:yellow">detailed</span>, <span style="color:red">best quality</span>
#### Example 2: Style Mixing
By layering multiple different style tags, you can generate images with features of multiple styles.
**A** Simple Mixing
**<span style="color:green">by ningen mame</span>, <span style="color:cyan">by ciloranko</span>, <span style="color:blue">by sho (sho lwlw)</span>**, 1girl, 1 hatsune miku, sitting, arm support, smile, detailed, amazing quality
**B** Weighted Mixing
Using AUTOMATIC1111 WebUI prompt weighting syntax (parentheses weighting), weight different style tags to better control the generated image's style.
**<span style="color:green">(by ningen mame:0.8)</span>, <span style="color:cyan">(by ciloranko:1.1)</span>, <span style="color:blue">(by sho \(sho lwlw\):1.2)</span>**, 1girl, 1 hatsune miku, sitting, arm support, smile, detailed, amazing quality
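The weighted prompt above can be produced programmatically. The sketch below assumes the AUTOMATIC1111 WebUI `(tag:weight)` syntax mentioned in this example. The helper `weighted_tag` is hypothetical and expects tags whose parentheses are not yet escaped.

```python
def weighted_tag(tag: str, weight: float) -> str:
    """Wrap a tag in (tag:weight) syntax, escaping the tag's own parentheses."""
    escaped = tag.replace("(", r"\(").replace(")", r"\)")
    # ':g' drops trailing zeros, so 1.10 is written as 1.1.
    return f"({escaped}:{weight:g})"


styles = [
    weighted_tag("by ningen mame", 0.8),
    weighted_tag("by ciloranko", 1.1),
    weighted_tag("by sho (sho lwlw)", 1.2),
]
content = ["1girl", "1 hatsune miku", "sitting", "arm support", "smile",
           "detailed", "amazing quality"]
print(", ".join(styles + content))
```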
#### Example 3: Multi-Character Scenes
By adding multiple character tags to your prompts, you can generate images with multiple characters in the same frame. Compared to other similar models, AWA performs better in multi-character scenes but remains unstable.
**A** Mixed Gender Scene
**1girl and 1boy, <span style="color:blue">1 ganyu</span> girl, <span style="color:cyan">1 gojou satoru</span> boy**, beautiful color, amazing quality
**B** Same Gender Scene
**2girls, <span style="color:blue">1 ganyu</span> girl, <span style="color:orange">1 yoimiya</span> girl**, beautiful color, amazing quality
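The pattern shared by both scenes (a count header, then each character trigger followed by `girl` or `boy`) can be sketched as below. The function `multi_character_prompt` is a hypothetical helper that only reproduces the prompt layout of the two examples. It does not make multi-character generation more stable.

```python
def multi_character_prompt(characters: list[tuple[str, str]],
                           extra_tags: tuple[str, ...] = ()) -> str:
    """Build a multi-character prompt.

    characters: (trigger word, "girl" or "boy") pairs, e.g. ("1 ganyu", "girl").
    """
    counts = {"girl": 0, "boy": 0}
    for _, gender in characters:
        if gender not in counts:
            raise ValueError(f"gender must be 'girl' or 'boy', got {gender!r}")
        counts[gender] += 1

    # Count header such as "2girls" or "1girl and 1boy".
    header_parts = [
        f"{n}{noun}" if n == 1 else f"{n}{noun}s"
        for noun, n in counts.items() if n > 0
    ]
    header = " and ".join(header_parts)

    subjects = [f"{trigger} {gender}" for trigger, gender in characters]
    return ", ".join([header, *subjects, *extra_tags])


tail = ("beautiful color", "amazing quality")
print(multi_character_prompt([("1 ganyu", "girl"), ("1 gojou satoru", "boy")], tail))
print(multi_character_prompt([("1 ganyu", "girl"), ("1 yoimiya", "girl")], tail))
```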
## Future Work
AWA Diffusion is expected to combine high-level <span style="color:purple">aesthetics</span> with comprehensive <span style="color:cyan">knowledge</span>. It should neither have the over-polished, artificial look common to AI-generated images nor be a pretty but knowledge-deficient model.
We will continue to explore more advanced training techniques and strategies, consistently improving the model's quality.
## Support Us
Training AWA Diffusion incurs substantial costs. If you appreciate our work, please consider supporting us through [Ko-fi](https://ko-fi.com/eugeai) to aid our research and development efforts. Thank you for your likes and support!