---
license: other
license_name: faipl-1.0-sd
license_link: https://freedevproject.org/faipl-1.0-sd/
---

See the complete release notes here:

English: https://nieta-art.feishu.cn/wiki/PpwqwVDzjiNE5kkUhRtcEsn6nmh

Chinese: https://nieta-art.feishu.cn/wiki/ARWLw99w7ikjaSkoV99cxmsOn5g

Civitai: https://civitai.com/models/410737/neta-art-xl

# I. Overview

Introducing Neta Art XL V1.0, the easiest-to-use SDXL anime model so far.
Keywords: best character coverage, vivid storytelling, diverse styles, stable anatomy.
Major motivations:
- Better stability and anatomy for visual storytelling with characters:
  - An ordered prompt guide that makes prompts easier for the model to follow;
  - A very good balance between richer knowledge and stability.
- Maintain a high aesthetic ceiling across versatile anime art styles, while keeping baseline outputs appealing to general users.
- Fewer LoRAs needed for characters, styles, and artists, so static model acceleration techniques can be used more effectively.

### Prompting Guide

To avoid ambiguity in text prompts and to leave room for very complicated scenes, such as those with multiple characters, we found that enforcing a fixed tag order in prompts leads to better instruction following (an approach learned from NAI3, Animagine 3, and AIDXL). Specifically, Neta Art XL uses the following order:

Tag order:
1. subject (1boy / 1girl)
2. character (a girl named frieren from sousou no frieren series)
3. artist trigger (by xxx)
4. race (elf)
5. composition (cowboy shot)
6. style (impasto style)
7. theme (fantasy theme)
8. main environment (in the forest, at day)
9. background (gradient background)
10. action (sitting on ground)
11. expression (is expressionless)
12. main characteristics (white hair)
13. body characteristics (twintails, green eyes, parted lip)
14. clothing (wearing a white dress)
15. clothing accessories (frills)
16. other items (a cat)
17. secondary environment (grass, sunshine)
18. aesthetics (beautiful color, detailed, aesthetic)
19. quality ((best quality:1.3))
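As a rough illustration of the tag order above, the Python sketch below assembles a prompt from tags grouped by category. The category keys (`subject`, `main_environment`, and so on) and the `build_prompt` helper are hypothetical names chosen for this example, not part of any Neta Art tooling.

```python
# HYPOTHETICAL category keys, listed in the order recommended by the prompting guide.
TAG_ORDER = [
    "subject", "character", "artist", "race", "composition", "style",
    "theme", "main_environment", "background", "action", "expression",
    "main_characteristics", "body_characteristics", "clothing",
    "clothing_accessories", "other_items", "secondary_environment",
    "aesthetics", "quality",
]


def build_prompt(tags: dict[str, list[str]]) -> str:
    """Join tags into one comma-separated prompt following TAG_ORDER.

    Unknown category names raise KeyError so typos are caught early;
    categories that are not provided are simply skipped.
    """
    unknown = set(tags) - set(TAG_ORDER)
    if unknown:
        raise KeyError(f"unknown categories: {sorted(unknown)}")
    ordered: list[str] = []
    for category in TAG_ORDER:
        ordered.extend(tags.get(category, []))
    return ", ".join(ordered)


if __name__ == "__main__":
    # Input order does not matter; the output always follows TAG_ORDER.
    print(build_prompt({
        "quality": ["(best quality:1.3)"],
        "subject": ["1girl"],
        "character": ["a girl named frieren from sousou no frieren series"],
        "race": ["elf"],
        "composition": ["cowboy shot"],
        "main_characteristics": ["white hair"],
    }))
```

The dictionary can be filled in any order; only the category order is enforced, which is the point of the guide.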

More examples:

```json
[{
  "prompt": "1girl, solo, character: lucy, white background, cowboy shot, hip cutout, sad expressions, pantyhose, best quality, beautiful color, by bigroll"
},
{
  "prompt": "(detailed features, flat color:1.2), (lineart, flat-pasto:0.3),The red giant panda holds its bamboo and falls asleep on the water, best quality, "
},
{
  "prompt": "gufeng, 2.5d, impasto, old withered vines and ancient trees, a murder of crows descending into darkness, a small bridge over gently flowing water, quaint houses nestled by the river, best quality, masterpiece, highres, "
},
{
  "prompt": "aesthetic, detailed, beautiful color, amazing quality, best quality, high quality, cinematic quality, LUT, fine texture, crisp detail, by rella, by_dkxlek, by makoto shinkai, ayanami_rei, beautiful, parted lips, detailed face, detailed eyes, Large eyes, deep crimson eyes, introspective eyes, eyes designed by Ilya Kuvshinov style, (detailed long hair), blue hair, white hair, (holographic hair), multicolored hair, glow hair, transparent hair, floating hair, wind effect, sad expressions, lonely, melancholy atmosphere, subtle grace, elegance, ethereal beauty, transparent, clarity, from side, looking at viewer, off shoulder, bare shoulders, splashing collarbone, strapless, white dress, wet dress, standing in lake, Calm Lake, water, White Lily, petals, barefoot, ripples, Misty Forest, soft shadows, (clear blue sky), cloud, cloudy sky, starry sky, particle, feather, [moon], (high-resolution sci-fi landscape:1.3), post-human landscapes, thousands of years in the future, Earth, overgrown cities, reclaimed by nature, futuristic decay, ancient modern buildings, high-resolution ruins, (rust in ruins, collapsed building), [2.5d:celluloid:0.6], Depth of Field, sharp focus, bokeh,"
},
{
  "prompt": "(from side:1.0), (Red long hair:1.0), (Hair adorned with fine pearls:1.0), (sacred:1.0), (Blue light gauze dress:1.0), (🌊:1.1), (Play the transparent harp:1.2), (Fragmented light dots:1.0), (Dawn:1.0), (fluorescence:0.8), (Crystal and transparent:1.0), (Flowing light and overflowing colors:1.0), (sit:1.0), (Close range:1.0), , (by wlop:0.6), (impasto, oil painting:1.2), detailed features, pseudo-impasto, best quality, (detailed features, flat color:1.2), (lineart, flat-pasto:0.3)"
}]
```

Negative prompts: (worst quality:1.3), low quality, lowres, messy, abstract, ugly, disfigured, bad anatomy, draft, deformed hands, fused fingers, signature, text, multi views

Sampler: Euler a with the normal scheduler by default; 28+ steps recommended.

One additional merit of Neta Art XL is that it supports a very wide range of CFG scales (5-20, compared with 7-9 for previous models). Empirically, higher CFG yields more detail and higher contrast; for best results, use CFG 9-14 (important!).
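One way to see why a higher CFG adds detail and contrast is the standard classifier-free guidance formula, shown here as a minimal Python sketch on plain lists (real samplers apply it to noise-prediction tensors). The formula is generic to CFG sampling and is not specific to Neta Art XL; `apply_cfg` is a hypothetical helper name.

```python
def apply_cfg(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance, elementwise: uncond + scale * (cond - uncond).

    scale = 1 returns the conditional prediction unchanged; larger scales
    extrapolate further along the direction in which the prompt changes
    the prediction.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond, cond = [0.0, 1.0], [0.5, 1.0]
    for cfg in (5.0, 9.0, 14.0):
        print(cfg, apply_cfg(uncond, cond, cfg))
```

Only components where the prompt-conditioned and unconditional predictions disagree are amplified; as the scale grows they are pushed further, which is consistent with (though not proof of) the extra detail and contrast observed at high CFG.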

# II. Highlight: Style Versatility

We carefully selected 13 style keys that are largely orthogonal to one another and commonly used in many scenarios, as justified by usage data from Nieta AI (30M+ generations).
Having orthogonal styles means each style is effectively different from the others, allowing you to easily combine and create new styles without interference.
Example style key: impasto

Please refer to https://civitai.com/models/124189/anime-illust-diffusion-xl for a complete list of supported artists.

# III. Expression, Posing, and Camera Angles

# IV. Multi-Character Scenes

# V. Text & Typography

# VI. Training
- Data annotation combining multiple sources (original prompts, CogVLM captions, WaifuTagger tags)
- Post-processing techniques like semantic deduplication and hierarchical tag organization
  1. Semantic Deduplication: removes redundant tags by detecting when a higher-level tag (e.g., very long hair) semantically covers a lower-level one (e.g., long hair).
  2. Tag Layering Algorithm: organizes tags into hierarchical layers based on their priorities and related semantics (e.g., `by wlop` influences the styling of the whole picture, while `frills` affects only a small part of it). More dominant tags are placed in higher layers to prioritize their influence during training.

- High-quality regularization data from AIDXL: datasets rated "best" and "amazing" quality, manually selected and accompanied by detailed annotations and natural-language descriptions.
- Fine-tuning on more knowledgeable base models such as AAM, blended with the character knowledge of Animagine XL 3.1.
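The two post-processing steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual Neta Art pipeline: the implication table `IMPLIES`, the `GLOBAL_TAGS` set, and the three-layer scheme in `layer_of` are all hypothetical stand-ins for whatever tag hierarchy was really used.

```python
# HYPOTHETICAL implication table: each tag maps to the less specific tags it covers.
IMPLIES: dict[str, set[str]] = {
    "very long hair": {"long hair"},
    "absurdly long hair": {"very long hair", "long hair"},
}

# HYPOTHETICAL set of style/scene tags that affect the whole image.
GLOBAL_TAGS = {"impasto", "flat color", "fantasy theme", "gradient background"}


def deduplicate(tags: list[str]) -> list[str]:
    """Drop exact repeats and any tag already implied by a more specific tag."""
    covered: set[str] = set()
    for tag in tags:
        covered |= IMPLIES.get(tag, set())
    result: list[str] = []
    seen: set[str] = set()
    for tag in tags:
        if tag in covered or tag in seen:
            continue
        seen.add(tag)
        result.append(tag)
    return result


def layer_of(tag: str) -> int:
    """Lower layer number = broader influence on the image."""
    if tag.startswith("by "):
        return 0  # artist tags such as "by wlop" steer the styling of the whole picture
    if tag in GLOBAL_TAGS:
        return 1  # global style / scene tags
    return 2  # local details such as "frills"


def layer_tags(tags: list[str]) -> list[str]:
    """Order tags by layer; sorted() is stable, so ties keep their original order."""
    return sorted(tags, key=layer_of)


if __name__ == "__main__":
    raw = ["frills", "long hair", "by wlop", "very long hair", "impasto", "frills"]
    print(layer_tags(deduplicate(raw)))
```

Running deduplication before layering means the layering step only has to rank tags that actually carry distinct information.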

### Challenges Faced:
- Imbalance in learning different styles
- Poor generalization for some styles to diverse scenes
- Lack of details/texture in generations
- Trigger word overlap with base model knowledge

### Solutions Explored:
- Data reweighting to balance style learning, plus supplementary diverse data for each style.
- Tuning sampling hyperparameters such as minimum gamma and rectified flow. Here, rectified flow is a training setting that samples middle timesteps more often, at the cost of reducing how much the model learns from low-noise samples at low timesteps. This helps the model reproduce styles more faithfully but requires a knowledge-rich base model.
- Randomizing or dropping trigger words during training.
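The card does not spell out the exact schemes behind these solutions, so the Python sketch below substitutes common, named techniques under explicit assumptions: inverse-frequency sampling weights for data reweighting; logit-normal timestep sampling (popularized alongside rectified-flow training in Stable Diffusion 3) as one way to favor middle timesteps; min-SNR-gamma loss weighting, assuming "minimum gamma" refers to it; and random per-caption trigger-word dropout. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def style_sample_weights(styles: list[str]) -> list[float]:
    """Inverse-frequency weights: every style gets the same total sampling mass."""
    counts = Counter(styles)
    return [1.0 / counts[s] for s in styles]


def sample_mid_timestep(rng: random.Random, num_timesteps: int = 1000,
                        mean: float = 0.0, std: float = 1.0) -> int:
    """ASSUMED scheme: logit-normal sampling, dense in the middle, sparse at both ends."""
    u = rng.gauss(mean, std)
    t = 1.0 / (1.0 + math.exp(-u))  # sigmoid maps the real line to (0, 1)
    return min(int(t * num_timesteps), num_timesteps - 1)


def min_snr_weight(alpha_bar: float, gamma: float = 5.0) -> float:
    """ASSUMED meaning of "minimum gamma": min(SNR, gamma) / SNR for epsilon prediction.

    alpha_bar is the cumulative signal fraction at a timestep, in (0, 1).
    Low-noise timesteps have high SNR and therefore get down-weighted.
    """
    snr = alpha_bar / (1.0 - alpha_bar)
    return min(snr, gamma) / snr


def drop_triggers(tags: list[str], triggers: set[str], p: float,
                  rng: random.Random) -> list[str]:
    """Drop each trigger word with probability p so a style is not tied to it alone."""
    return [t for t in tags if t not in triggers or rng.random() >= p]


if __name__ == "__main__":
    rng = random.Random(0)
    print(style_sample_weights(["impasto", "impasto", "flat color"]))
    print([sample_mid_timestep(rng) for _ in range(5)])
    print(min_snr_weight(0.5), min_snr_weight(0.99))
    print(drop_triggers(["by rella", "1girl", "solo"], {"by rella"}, 0.5, rng))
```

In a real trainer these pieces would plug into the data sampler, the timestep sampler, and the loss weighting respectively; the values of `gamma`, `std`, and `p` here are illustrative defaults, not the settings used for Neta Art XL.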

  
# VII. Evaluation
Nine models are evaluated using 16 styles (some unique to Neta Art, thus biased :P) and 80 prompts. Each prompt generates 3 samples with different aspect ratios, and the results are arranged in an XYZ plot (generated with https://github.com/talesofai/comfyui-browser).

The XYZ plot contains 3,840 samples (16 styles × 80 prompts × 3 aspect ratios) and was generated for a single evaluation pass. An online example is also available.

Evaluation is done on a 10-point scale across four axes under objective counting rules:
1. Prompt Following: Points deducted for every 5 images unrelated to the prompt.
2. Stability: Points deducted for distortions or breakages in heads, hands, feet, or composition.
3. Diversity: Points deducted for poor style restoration within each style column.
4. Generalizability: Points deducted for poor semantic generalization within each style column.

Additionally, 10 top model trainers provide subjective scores for Aesthetics, with an average calculated for each model.
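The counting rules can be made concrete with a small Python sketch. The card does not say how many points are deducted per violation, so the sketch ASSUMES one point per 5 unrelated images for Prompt Following (floored at 0); the aesthetics score is the plain average of the trainers' scores. Function names are hypothetical.

```python
from statistics import mean

NUM_STYLES, NUM_PROMPTS, NUM_ASPECT_RATIOS = 16, 80, 3
SAMPLES_PER_PLOT = NUM_STYLES * NUM_PROMPTS * NUM_ASPECT_RATIOS  # 3,840


def prompt_following_score(num_unrelated: int, max_score: int = 10,
                           images_per_point: int = 5) -> int:
    """Start at max_score and deduct one point per `images_per_point` unrelated images.

    ASSUMPTION: the deduction size (one point) is not stated in the card.
    """
    if num_unrelated < 0:
        raise ValueError("num_unrelated must be non-negative")
    return max(0, max_score - num_unrelated // images_per_point)


def aesthetics_score(trainer_scores: list[float]) -> float:
    """Average of the subjective aesthetics scores given by the trainers."""
    return mean(trainer_scores)


if __name__ == "__main__":
    print(SAMPLES_PER_PLOT)
    print([prompt_following_score(n) for n in (0, 4, 5, 12, 100)])
```

Integer division means 0-4 unrelated images cost nothing, 5-9 cost one point, and so on, which matches the "every 5 images" wording.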


# VIII. License

- Developed with ❤️ by: Neta.art Lab - https://civitai.com/user/nieta_art
- In collaboration with: 
  - Euge: https://civitai.com/user/Euge_
  - 汤人烂: https://space.bilibili.com/8594480
  - Chenkin: https://civitai.com/user/Chenkin
  - Bo Dai: https://daibo.info/
- Thanks to:
  - https://blog.novelai.net/introducing-novelai-diffusion-anime-v3-6d00d1c118c3
  - https://cagliostrolab.net/posts/animagine-xl-v3-release
  - https://civitai.com/models/269232/aam-xl-anime-mix
  - https://civitai.com/models/124189/anime-illust-diffusion-xl
  - https://github.com/deepghs/waifuc
- Model type: Diffusion-based text-to-image generative model
- License: Fair AI Public License 1.0-SD, because we merged weights from Animagine XL 3.1 into the model (0.05 for CLIP and 0.15 for the UNet input layers)
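For readers unfamiliar with partial model merges, the Python sketch below shows a per-parameter linear (weighted-sum) merge in which 0.05 and 0.15 are read as Animagine XL 3.1's share of the CLIP and UNet input-layer weights. That reading, the `clip.` and `unet.input_blocks.` key prefixes, and the helper names are ASSUMPTIONS for illustration; plain floats stand in for the weight tensors, to which the same formula would apply elementwise.

```python
from typing import Callable


def merge_ratio(key: str) -> float:
    """Donor share per parameter (HYPOTHETICAL key prefixes)."""
    if key.startswith("clip."):
        return 0.05  # text encoder: 5% Animagine XL 3.1
    if key.startswith("unet.input_blocks."):
        return 0.15  # UNet input layers: 15% Animagine XL 3.1
    return 0.0  # all other parameters are left untouched


def weighted_merge(base: dict[str, float], donor: dict[str, float],
                   ratio: Callable[[str], float] = merge_ratio) -> dict[str, float]:
    """Linear merge per parameter: (1 - a) * base + a * donor."""
    merged: dict[str, float] = {}
    for key, value in base.items():
        a = ratio(key)
        merged[key] = (1.0 - a) * value + a * donor[key] if a > 0 else value
    return merged


if __name__ == "__main__":
    base = {"clip.w": 1.0, "unet.input_blocks.w": 1.0, "unet.output_blocks.w": 1.0}
    donor = {key: 0.0 for key in base}
    print(weighted_merge(base, donor))
```

Keeping the ratio as a function of the parameter name makes it easy to merge different parts of the network at different strengths, which is what a per-component merge like this one requires.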

# IX. Conclusion and Future Work
Shortcomings:
1. Some characters are underfitted.
2. Styles are not activated well with long prompts.
3. Certain styles appear grayish at low CFG and short prompts. Partly explained in https://civitai.com/articles/4969.
Future Work:
- Prepare larger training sets and more knowledge-based data to improve character, style, and detail handling.
- Welcome others to join discussions, provide suggestions, and contribute to model advancement.

_Neta Art XL 2.0 is on the way._

Stay tuned, and try our product for FREE: http://neta.art/

Discord: https://discord.gg/AtRtbe9W8w

Twitter: https://twitter.com/netaart_ai

Civitai: https://civitai.com/user/neta_art