metadata

license: apache-2.0
language:
  - en
base_model:
  - stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
tags:
  - art

SDXL-ProteusSigma Training with ZTSNR and NovelAI V3 Improvements

10k dataset proof of concept (completed)link
200k+ dataset finetune (in testing/training)
12M million dataset finetune (planned)

<!-- Main text with effects -->
<g transform="translate(400,100)" text-anchor="middle">
    <!-- Shadow layer -->
    <text class="logo-text-main logo-text-shadow" 
          x="-100" y="0" font-size="80px">Proteus</text>
    
    <!-- Outline layer -->
    <text class="logo-text-main logo-text-outline" 
          x="-100" y="0" font-size="80px">Proteus</text>
    
    <!-- Gradient fill layer -->
    <text class="logo-text-main logo-text-fill" 
          x="-100" y="0" font-size="80px">Proteus</text>
          
    <!-- Sigma symbol -->
    <text x="120" y="0" 
          font-size="80px" 
          fill="#00ffff" 
          filter="url(#chrome)">Σ</text>
    
    <!-- Subtitle -->
    <text class="subtitle" y="40">STABLE DIFFUSION XL</text>
</g>

<!-- Grid effect -->
<path d="M0 180 L800 180" stroke="#ff00ff" stroke-width="1" opacity="0.5"/>
<path d="M0 185 L800 185" stroke="#00ffff" stroke-width="1" opacity="0.3"/>
<path d="M0 190 L800 190" stroke="#ff00ff" stroke-width="1" opacity="0.2"/>

Example Outputs

A digital illustration of a lich with long grey hair and beard, as a university professor wearing a formal suit and standing in front of a class, writing on a whiteboard. He holds a marker, writing complex equations or magical symbols on the whiteboard.

A Candid Photo of a real short grey alien peering around a corner while trying to hide from the viewer in a living room, real photography, fujifilm superia, full HD, taken on a Canon EOS R5 F1.2 ISO100 35MM

Combined Proteus and Mobius datasets.

Recommended Inference Parameters

ComfyUI workflow

"sampler": "euler_ancestral", # Best results with Euler Ancestral

"scheduler": "normal", # Normal noise schedule

"steps": 28, # Optimal step count

"cfg": 7.5 # Classifier-free guidance scale

Model Details

Model Type: SDXL Fine-tuned with ZTSNR and NovelAI V3 Improvements
Base Model: stabilityai/stable-diffusion-xl-base-1.0
Training Dataset: 10,000 high-quality images
License: Apache 2.0

Key Features

Zero Terminal SNR (ZTSNR) implementation
Increased σ_max ≈ 20000.0 (NovelAI research)
High-resolution coherence enhancements
Tag-based CLIP weighting
VAE improvements

Technical Specifications

Noise Schedule: σ_max ≈ 20000.0 to σ_min ≈ 0.0292
Progressive Steps: [20000, 17.8, 12.4, 9.2, 7.2, 5.4, 3.9, 2.1, 0.9, 0.0292]
Resolution Scaling: √(H×W)/1024

Training Details

Training Configuration

Learning Rate: 1e-6
Batch Size: 1
Gradient Accumulation Steps: 1
Optimizer: AdamW
Precision: bfloat16
VAE Finetuning: Enabled
VAE Learning Rate: 1e-6

CLIP Weight Configuration

Character Weight: 1.5
Style Weight: 1.2
Quality Weight: 0.8
Setting Weight: 1.0
Action Weight: 1.1
Object Weight: 0.9

Performance Improvements

47% fewer artifacts at σ < 5.0
Stable composition at σ > 12.4
31% better detail consistency
Improved color accuracy
Enhanced dark tone reproduction

Repository and Resources

GitHub Repository: SDXL-Training-Improvements
Training Code: Available in the repository
Documentation: Implementation Details
Issues and Support: GitHub Issues

Citation

@article{ossa2024improvements,
  title={Improvements to SDXL in NovelAI Diffusion V3},
  author={Ossa, Juan and Doğan, Eren and Birch, Alex and Johnson, F.},
  journal={arXiv preprint arXiv:2409.15997v2},
  year={2024}
}