Aatricks committed
Commit b701455 · 0 Parent(s):

Deploy ZeroGPU Gradio Space snapshot

.dockerignore ADDED
@@ -0,0 +1,97 @@
+ # Python cache files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ share/python-wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # Exception: Keep SageAttention and SpargeAttn build directories for Docker
+ !SageAttention/
+ !SpargeAttn/
+ !docker/
+
+ # But exclude their build artifacts
+ SageAttention/build/
+ SageAttention/*.egg-info/
+ SageAttention/**/__pycache__/
+ SpargeAttn/build/
+ SpargeAttn/*.egg-info/
+ SpargeAttn/**/__pycache__/
+
+ # Virtual environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # IDE files
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS files
+ .DS_Store
+ .DS_Store?
+ ._*
+ .Spotlight-V100
+ .Trashes
+ ehthumbs.db
+ Thumbs.db
+
+ # Git
+ .git/
+ .gitignore
+
+ # Docker files (not needed in the image, but Dockerfile itself is needed for the build context)
+ .dockerignore
+
+ # Documentation (not needed at runtime, but docker/ scripts are needed for build)
+ *.md
+ !docker/
+ docs/
+ !frontend/dist/
+ !frontend/dist/**
+
+ # Large model files (these should be downloaded at runtime)
+ *.safetensors
+ *.ckpt
+ *.pt
+ *.pth
+ *.bin
+ *.gguf
+
+ # Logs
+ *.log
+ logs/
+
+ # Temporary files
+ tmp/
+ temp/
+ *.tmp
+
+ # Generated images (these will be created at runtime)
+ output/
+
+ # Large dependencies that will be installed via pip
+ stable_fast-*.whl
.gitattributes ADDED
@@ -0,0 +1,2 @@
+ # Auto detect text files and perform LF normalization
+ * text=auto
.github/instructions/memory.instruction.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ applyTo: '**'
+ ---
+
+ # User Memory
+
+ ## User Preferences
+ - Programming languages:
+ - Code style preferences:
+ - Development environment:
+ - Communication style:
+
+ ## Project Context
+ - Current project type:
+ - Tech stack:
+ - Architecture patterns:
+ - Key requirements:
+
+ ## Coding Patterns
+ - Preferred patterns and practices
+ - Code organization preferences
+ - Testing approaches
+ - Documentation style
+
+ ## Context7 Research History
+ - Libraries researched on Context7
+ - Best practices discovered
+ - Implementation patterns used
+ - Version-specific findings
+
+ - 2026-02-11: Searched Context7 for pytest; no libraries found. Reviewed Context7 MCP docs (all-clients, adding-libraries, troubleshooting, api-guide, developer guide) to satisfy research requirements for this task.
+
+ ## Conversation History
+ - 2026-02-11: Requested DifferentialDiffusion class excerpt with line numbers from src/AutoDetailer/ADetailer.py.
+ - 2026-02-11: Fixing ADetailer SDXL mask behavior by applying denoise_mask blending in KSamplerX0Inpaint; will add tests and validate with manual image generation.
+ - 2026-02-11: Added denoise_mask resizing to latent resolution to avoid shape mismatch; generated SDXL baseline and ADetailer outputs for manual verification.
+ - 2026-02-11: Normalized ADetailer noise masks to [0,1], aligned SDXL crop conditioning to crop-local sizes, and added unit tests plus manual SDXL ADetailer generation and image stats verification.
+ - 2026-02-11: Began implementation of mask-aware regression test for ADetailer SDXL noise masking.
+ - 2026-02-11: Added deterministic unit test that stubs sampling and verifies noise is localized to resized mask region in enhance_detail.
+ - Important decisions made
+ - Recurring questions or topics
+ - Solutions that worked well
+ - Things to avoid or that didn't work
+
+ ## Notes
+ - 2026-02-11: pytest -q tests/unit/test_adetailer_noise_mask.py passed (4 tests).
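The ADetailer notes above describe two concrete steps: normalizing the denoise mask to [0, 1] and resizing it to the latent resolution before blending, so the added noise stays localized to the masked region. A minimal PyTorch sketch of that idea follows; the helper name `prepare_denoise_mask` and the shapes are illustrative assumptions, not the actual `enhance_detail` code.

```python
import torch
import torch.nn.functional as F


def prepare_denoise_mask(mask: torch.Tensor, latent_hw: tuple) -> torch.Tensor:
    """Illustrative sketch: normalize a noise mask to [0, 1] and resize it to
    the latent resolution to avoid shape mismatches when blending."""
    # Promote (H, W) or (C, H, W) to (N, C, H, W), as F.interpolate expects.
    while mask.dim() < 4:
        mask = mask.unsqueeze(0)
    # Normalize to [0, 1]; a constant mask maps to all zeros instead of NaN.
    lo, hi = mask.amin(), mask.amax()
    mask = (mask - lo) / (hi - lo) if hi > lo else torch.zeros_like(mask)
    # Resize to the latent spatial size (typically image size // 8 for SD/SDXL).
    return F.interpolate(mask, size=latent_hw, mode="bilinear", align_corners=False)


# Example: a 512x512 pixel-space mask mapped onto a 64x64 latent grid.
pixel_mask = torch.zeros(512, 512)
pixel_mask[128:384, 128:384] = 255.0
latent_mask = prepare_denoise_mask(pixel_mask, (64, 64))
assert latent_mask.min() >= 0.0 and latent_mask.max() <= 1.0
```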
.github/workflows/ci.yml ADDED
@@ -0,0 +1,73 @@
+ name: CI
+
+ on:
+   push:
+     branches: [main]
+   pull_request:
+     branches: [main]
+
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         python-version: ['3.10', '3.14']
+       fail-fast: false
+
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Set up Python ${{ matrix.python-version }}
+         uses: actions/setup-python@v5
+         with:
+           python-version: ${{ matrix.python-version }}
+
+       - name: Cache pip dependencies
+         uses: actions/cache@v4
+         with:
+           path: ~/.cache/pip
+           key: ${{ runner.os }}-pip-${{ matrix.python-version }}-${{ hashFiles('requirements.txt') }}
+           restore-keys: |
+             ${{ runner.os }}-pip-${{ matrix.python-version }}-
+             ${{ runner.os }}-pip-
+
+       - name: Install dependencies
+         run: |
+           python -m pip install --upgrade pip
+           pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
+           pip install "numpy<2.0.0"
+           pip install pytest pytest-cov
+           pip install -r requirements.txt
+
+       - name: Run tests
+         run: |
+           # Run tests file-by-file for isolation to prevent mock leaks between suites.
+           # We handle exit code 5 (no tests collected) which happens when a file
+           # only contains 'slow' or 'gpu' tests that are filtered out.
+           failed=0
+           for f in $(find tests -type f -name "test_*.py"); do
+             echo "Running tests in $f..."
+             if pytest -v -m "not gpu and not slow" --tb=short "$f"; then
+               echo "Successfully ran tests in $f"
+             else
+               status=$?
+               if [ $status -eq 5 ]; then
+                 echo "No tests matching filter in $f (exit code 5), continuing..."
+               else
+                 echo "Error: tests in $f failed with exit code $status"
+                 failed=1
+               fi
+             fi
+           done
+           if [ $failed -ne 0 ]; then
+             echo "One or more test suites failed."
+             exit 1
+           fi
+
+       - name: Upload coverage report
+         if: matrix.python-version == '3.10'
+         uses: actions/upload-artifact@v4
+         with:
+           name: coverage-report
+           path: htmlcov/
+           if-no-files-found: ignore
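The run-tests step above executes each test file in its own pytest process so mocks patched in one suite cannot leak into the next, and treats pytest's documented exit code 5 (no tests collected) as success. The same loop can be reproduced locally; a sketch in Python, assuming it runs from the repository root:

```python
import pathlib
import subprocess
import sys

# Mirror of the CI loop: one pytest process per file for test isolation.
failed = False
for test_file in sorted(pathlib.Path("tests").rglob("test_*.py")):
    print(f"Running tests in {test_file}...")
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-v",
         "-m", "not gpu and not slow", "--tb=short", str(test_file)]
    )
    if result.returncode == 5:
        # Exit code 5 = no tests collected, e.g. a file containing only
        # 'gpu' or 'slow' tests that the marker filter deselected.
        print(f"No tests matching filter in {test_file}, continuing...")
    elif result.returncode != 0:
        print(f"Error: tests in {test_file} failed with exit code {result.returncode}")
        failed = True

sys.exit(1 if failed else 0)
```

The trade-off of per-file processes is slower startup (one interpreter and collection pass per file) in exchange for clean global state between suites.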
.gitignore ADDED
@@ -0,0 +1,14 @@
+ *.pyc
+ *.pth
+ *.pt
+ *.safetensors
+ *.png
+ stable_fast-*.whl
+ .venv
+ node_modules/
+ frontend/node_modules/
+ *.log
+ .history_backups
+ include/last_seed.txt
+ include/settings_store.json
+ docs/ai/
.python-version ADDED
@@ -0,0 +1 @@
+ 3.12
Dockerfile ADDED
@@ -0,0 +1,218 @@
+ FROM node:22-bookworm-slim AS frontend-builder
+
+ WORKDIR /frontend
+
+ COPY frontend/package.json frontend/package-lock.json ./
+ RUN npm ci
+
+ COPY frontend/ ./
+ RUN npm run build
+
+
+ FROM nvidia/cuda:12.8.0-devel-ubuntu22.04
+
+ ENV DEBIAN_FRONTEND=noninteractive
+ ENV PYTHONUNBUFFERED=1
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV CUDA_HOME=/usr/local/cuda
+ ENV PATH=${CUDA_HOME}/bin:${PATH}
+ ENV LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}
+ ENV TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0;12.0"
+
+ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+     --mount=type=cache,target=/var/lib/apt,sharing=locked \
+     apt-get update && apt-get install -y \
+     python3.10 \
+     python3.10-dev \
+     python3.10-venv \
+     python3-pip \
+     python3-tk \
+     git \
+     wget \
+     curl \
+     build-essential \
+     libgl1-mesa-glx \
+     libglib2.0-0 \
+     libsm6 \
+     libxext6 \
+     libxrender-dev \
+     libgomp1 \
+     software-properties-common \
+     ninja-build \
+     && rm -rf /var/lib/apt/lists/*
+
+ RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
+
+ WORKDIR /app
+
+ COPY requirements.txt ./
+
+ RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install --upgrade pip
+ RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install uv
+
+ RUN --mount=type=cache,target=/root/.cache/uv /bin/sh -c 'set -e; \
+     python3 -m uv pip install --system --index-url https://download.pytorch.org/whl/cu128 \
+         torch torchvision "triton>=2.1.0"; \
+     if echo "${TORCH_CUDA_ARCH_LIST}" | grep -q "12\.0"; then \
+         echo "Detected compute capability 12.0 (RTX 50 series). Skipping xformers install."; \
+     else \
+         python3 -m uv pip install --system xformers; \
+     fi'
+
+ RUN --mount=type=cache,target=/root/.cache/uv python3 -m uv pip install --system "numpy<2.0.0"
+ RUN --mount=type=cache,target=/root/.cache/uv python3 -m uv pip install --system -r requirements.txt
+
+ ARG TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0;12.0"
+ ENV TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}
+
+ ARG INSTALL_STABLE_FAST=0
+ ENV INSTALL_STABLE_FAST=${INSTALL_STABLE_FAST}
+
+ ARG INSTALL_OLLAMA=0
+ ENV INSTALL_OLLAMA=${INSTALL_OLLAMA}
+
+ ARG INSTALL_SAGEATTENTION=0
+ ENV INSTALL_SAGEATTENTION=${INSTALL_SAGEATTENTION}
+
+ ARG INSTALL_SPARGEATTN=0
+ ENV INSTALL_SPARGEATTN=${INSTALL_SPARGEATTN}
+
+ RUN --mount=type=cache,target=/root/.cache/pip \
+     --mount=type=cache,target=/build-cache/stablefast,sharing=locked /bin/sh -c ' \
+     if [ "${INSTALL_STABLE_FAST}" = "1" ]; then \
+         echo "Installing stable-fast for CUDA architectures: ${TORCH_CUDA_ARCH_LIST}"; \
+         export TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}"; \
+         export FORCE_CUDA=1; \
+         mkdir -p /build-cache/stablefast; \
+         python3 -m pip wheel --no-build-isolation --wheel-dir /build-cache/stablefast \
+             git+https://github.com/chengzeyi/stable-fast.git@main#egg=stable-fast; \
+         python3 -m pip install --no-build-isolation --no-index --find-links /build-cache/stablefast stable-fast; \
+     else \
+         echo "Skipping stable-fast installation (INSTALL_STABLE_FAST=${INSTALL_STABLE_FAST})"; \
+     fi'
+
+ RUN --mount=type=cache,target=/build-cache/ollama,sharing=locked /bin/sh -c ' \
+     if [ "${INSTALL_OLLAMA}" = "1" ]; then \
+         echo "Installing Ollama and pulling qwen3:0.6b"; \
+         mkdir -p /build-cache/ollama; \
+         curl -fsSL https://ollama.com/install.sh -o /build-cache/ollama/install.sh; \
+         sh /build-cache/ollama/install.sh; \
+         export OLLAMA_HOME=/build-cache/ollama; \
+         ollama serve >/tmp/ollama.log 2>&1 & \
+         OLLAMA_PID=$!; \
+         attempts=0; \
+         until curl -fsS http://127.0.0.1:11434/api/version >/dev/null 2>&1; do \
+             attempts=$((attempts + 1)); \
+             if [ ${attempts} -gt 20 ]; then \
+                 echo "Ollama failed to start"; \
+                 kill ${OLLAMA_PID} >/dev/null 2>&1 || true; \
+                 exit 1; \
+             fi; \
+             sleep 1; \
+         done; \
+         ollama pull qwen3:0.6b; \
+         kill ${OLLAMA_PID} >/dev/null 2>&1 || true; \
+         wait ${OLLAMA_PID} 2>/dev/null || true; \
+     else \
+         echo "Skipping Ollama installation (INSTALL_OLLAMA=${INSTALL_OLLAMA})"; \
+     fi'
+
+ COPY . .
+ COPY --from=frontend-builder /frontend/dist ./frontend/dist
+
+ RUN --mount=type=cache,target=/root/.cache/torch_extensions,sharing=locked \
+     --mount=type=cache,target=/build-cache/sageattention,sharing=locked /bin/sh -c ' \
+     if [ "${INSTALL_SAGEATTENTION}" = "1" ]; then \
+         if [ -d "SageAttention" ]; then \
+             echo "Found SageAttention - applying patch"; \
+             cd SageAttention; \
+             python3 ../docker/patch_sageattention.py; \
+             python3 -m pip wheel --no-build-isolation --wheel-dir /build-cache/sageattention .; \
+             python3 -m pip install --no-index /build-cache/sageattention/*.whl; \
+             cd ..; \
+             rm -rf SageAttention/build SageAttention/*.egg-info; \
+         else \
+             echo "SageAttention directory not found - cloning and applying patch"; \
+             git clone --depth 1 https://github.com/thu-ml/SageAttention /tmp/SageAttention; \
+             cd /tmp/SageAttention; \
+             python3 /app/docker/patch_sageattention.py; \
+             python3 -m pip wheel --no-build-isolation --wheel-dir /build-cache/sageattention .; \
+             python3 -m pip install --no-index /build-cache/sageattention/*.whl; \
+             rm -rf /tmp/SageAttention/build /tmp/SageAttention/*.egg-info; \
+             rm -rf /tmp/SageAttention; \
+         fi; \
+     else \
+         echo "Skipping SageAttention installation (INSTALL_SAGEATTENTION=${INSTALL_SAGEATTENTION})"; \
+     fi'
+
+ RUN --mount=type=cache,target=/root/.cache/torch_extensions,sharing=locked \
+     --mount=type=cache,target=/build-cache/spargeattn,sharing=locked /bin/sh -c ' \
+     if [ "${INSTALL_SPARGEATTN}" = "1" ]; then \
+         if [ -d "SpargeAttn" ]; then \
+             cd SpargeAttn; \
+             if echo "${TORCH_CUDA_ARCH_LIST}" | grep -qE "(8\.0|8\.6|8\.7|8\.9|9\.0)"; then \
+                 echo "Building SpargeAttn for supported architectures: ${TORCH_CUDA_ARCH_LIST}"; \
+                 python3 -m pip wheel --no-build-isolation --wheel-dir /build-cache/spargeattn .; \
+                 python3 -m pip install --no-index /build-cache/spargeattn/*.whl; \
+                 rm -rf build *.egg-info; \
+             else \
+                 echo "Skipping SpargeAttn - architecture ${TORCH_CUDA_ARCH_LIST} not supported (requires 8.0-9.0)"; \
+             fi; \
+             cd ..; \
+         else \
+             echo "SpargeAttn directory not found - cloning and attempting build if supported"; \
+             git clone --depth 1 https://github.com/thu-ml/SpargeAttn /tmp/SpargeAttn; \
+             cd /tmp/SpargeAttn; \
+             if echo "${TORCH_CUDA_ARCH_LIST}" | grep -qE "(8\.0|8\.6|8\.7|8\.9|9\.0)"; then \
+                 echo "Building cloned SpargeAttn for supported architectures: ${TORCH_CUDA_ARCH_LIST}"; \
+                 python3 -m pip wheel --no-build-isolation --wheel-dir /build-cache/spargeattn .; \
+                 python3 -m pip install --no-index /build-cache/spargeattn/*.whl; \
+                 rm -rf build *.egg-info; \
+             else \
+                 echo "Skipping cloned SpargeAttn - architecture ${TORCH_CUDA_ARCH_LIST} not supported (requires 8.0-9.0)"; \
+             fi; \
+             cd /app; \
+             rm -rf /tmp/SpargeAttn; \
+         fi; \
+     else \
+         echo "Skipping SpargeAttn installation (INSTALL_SPARGEATTN=${INSTALL_SPARGEATTN})"; \
+     fi'
+
+ RUN mkdir -p ./output/classic \
+     ./output/Flux \
+     ./output/HiresFix \
+     ./output/Img2Img \
+     ./output/Adetailer \
+     ./include/checkpoints \
+     ./include/clip \
+     ./include/embeddings \
+     ./include/ESRGAN \
+     ./include/loras \
+     ./include/sd1_tokenizer \
+     ./include/text_encoder \
+     ./include/unet \
+     ./include/vae \
+     ./include/vae_approx \
+     ./include/yolos
+
+ RUN echo "42" > ./include/last_seed.txt
+ RUN echo "A beautiful landscape" > ./include/prompt.txt
+
+ EXPOSE 7860
+
+ ENV PORT=7860
+
+ HEALTHCHECK --interval=30s --timeout=30s --start-period=60s --retries=3 \
+     CMD curl -f http://localhost:${PORT}/health || exit 1
+
+ CMD if [ "${INSTALL_OLLAMA}" = "1" ]; then \
+         echo "Starting Ollama server"; \
+         ollama serve >/tmp/ollama_runtime.log 2>&1 & \
+         for attempt in $(seq 1 20); do \
+             if curl -fsS http://127.0.0.1:11434/api/version >/dev/null 2>&1; then \
+                 break; \
+             fi; \
+             sleep 1; \
+         done; \
+     fi; \
+     exec python3 server.py --host 0.0.0.0 --port "${PORT}"
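Both the build-time Ollama bootstrap and the runtime CMD above poll http://127.0.0.1:11434/api/version until the server answers, rather than sleeping for a fixed interval. Below is a standard-library Python sketch of the same readiness probe; the helper name is illustrative, while the URL and 20-attempt budget are taken from the shell loops above.

```python
import time
import urllib.error
import urllib.request


def wait_for_ollama(url: str = "http://127.0.0.1:11434/api/version",
                    attempts: int = 20, delay: float = 1.0) -> bool:
    """Illustrative readiness probe mirroring the Dockerfile's retry loops:
    poll the Ollama version endpoint until it responds or the budget runs out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not listening yet; retry after a short pause.
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("Ollama ready" if wait_for_ollama() else "Ollama failed to start")
```

Polling a cheap endpoint keeps startup fast when the server comes up early and bounds the wait when it never does.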
LICENSE ADDED
@@ -0,0 +1,674 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+ software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+ to take away your freedom to share and change the works. By contrast,
+ the GNU General Public License is intended to guarantee your freedom to
+ share and change all versions of a program--to make sure it remains free
+ software for all its users. We, the Free Software Foundation, use the
+ GNU General Public License for most of our software; it applies also to
+ any other work released this way by its authors. You can apply it to
+ your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+ price. Our General Public Licenses are designed to make sure that you
+ have the freedom to distribute copies of free software (and charge for
+ them if you wish), that you receive source code or can get it if you
+ want it, that you can change the software or use pieces of it in new
+ free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+ these rights or asking you to surrender the rights. Therefore, you have
+ certain responsibilities if you distribute copies of the software, or if
+ you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+ gratis or for a fee, you must pass on to the recipients the same
+ freedoms that you received. You must make sure that they, too, receive
+ or can get the source code. And you must show them these terms so they
+ know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+ (1) assert copyright on the software, and (2) offer you this License
+ giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+ that there is no warranty for this free software. For both users' and
+ authors' sake, the GPL requires that modified versions be marked as
+ changed, so that their problems will not be attributed erroneously to
+ authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+ modified versions of the software inside them, although the manufacturer
+ can do so. This is fundamentally incompatible with the aim of
+ protecting users' freedom to change the software. The systematic
+ pattern of such abuse occurs in the area of products for individuals to
+ use, which is precisely where it is most unacceptable. Therefore, we
+ have designed this version of the GPL to prohibit the practice for those
+ products. If such problems arise substantially in other domains, we
+ stand ready to extend this provision to those domains in future versions
+ of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+ States should not allow patents to restrict development and use of
+ software on general-purpose computers, but in those that do, we wish to
+ avoid the special danger that patents applied to a free program could
+ make it effectively proprietary. To prevent this, the GPL assures that
+ patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+ modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+ works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+ License. Each licensee is addressed as "you". "Licensees" and
+ "recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+ in a fashion requiring copyright permission, other than the making of an
+ exact copy. The resulting work is called a "modified version" of the
+ earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+ on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+ permission, would make you directly or secondarily liable for
+ infringement under applicable copyright law, except executing it on a
+ computer or modifying a private copy. Propagation includes copying,
+ distribution (with or without modification), making available to the
+ public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+ parties to make or receive copies. Mere interaction with a user through
+ a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+ to the extent that it includes a convenient and prominently visible
+ feature that (1) displays an appropriate copyright notice, and (2)
+ tells the user that there is no warranty for the work (except to the
+ extent that warranties are provided), that licensees may convey the
+ work under this License, and how to view a copy of this License. If
+ the interface presents a list of user commands or options, such as a
+ menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+ for making modifications to it. "Object code" means any non-source
+ form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+ standard defined by a recognized standards body, or, in the case of
+ interfaces specified for a particular programming language, one that
+ is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+ than the work as a whole, that (a) is included in the normal form of
+ packaging a Major Component, but which is not part of that Major
+ Component, and (b) serves only to enable use of the work with that
+ Major Component, or to implement a Standard Interface for which an
+ implementation is available to the public in source code form. A
+ "Major Component", in this context, means a major essential component
+ (kernel, window system, and so on) of the specific operating system
+ (if any) on which the executable work runs, or a compiler used to
+ produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+ the source code needed to generate, install, and (for an executable
+ work) run the object code and to modify the work, including scripts to
+ control those activities. However, it does not include the work's
+ System Libraries, or general-purpose tools or generally available free
+ programs which are used unmodified in performing those activities but
+ which are not part of the work. For example, Corresponding Source
+ includes interface definition files associated with source files for
+ the work, and the source code for shared libraries and dynamically
+ linked subprograms that the work is specifically designed to require,
+ such as by intimate data communication or control flow between those
+ subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+ can regenerate automatically from other parts of the Corresponding
+ Source.
+
+ The Corresponding Source for a work in source code form is that
+ same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+ copyright on the Program, and are irrevocable provided the stated
+ conditions are met. This License explicitly affirms your unlimited
+ permission to run the unmodified Program. The output from running a
+ covered work is covered by this License only if the output, given its
+ content, constitutes a covered work. This License acknowledges your
+ rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+ convey, without conditions so long as your license otherwise remains
+ in force. You may convey covered works to others for the sole purpose
+ of having them make modifications exclusively for you, or provide you
+ with facilities for running those works, provided that you comply with
+ the terms of this License in conveying all material for which you do
+ not control copyright. Those thus making or running the covered works
+ for you must do so exclusively on your behalf, under your direction
+ and control, on terms that prohibit them from making any copies of
+ your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+ the conditions stated below. Sublicensing is not allowed; section 10
+ makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+ measure under any applicable law fulfilling obligations under article
+ 11 of the WIPO copyright treaty adopted on 20 December 1996, or
+ similar laws prohibiting or restricting circumvention of such
+ measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+ circumvention of technological measures to the extent such circumvention
+ is effected by exercising rights under this License with respect to
+ the covered work, and you disclaim any intention to limit operation or
+ modification of the work as a means of enforcing, against the work's
+ users, your or third parties' legal rights to forbid circumvention of
+ technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+ receive it, in any medium, provided that you conspicuously and
+ appropriately publish on each copy an appropriate copyright notice;
+ keep intact all notices stating that this License and any
+ non-permissive terms added in accord with section 7 apply to the code;
+ keep intact all notices of the absence of any warranty; and give all
+ recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+ and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+ produce it from the Program, in the form of source code under the
+ terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+ works, which are not by their nature extensions of the covered work,
+ and which are not combined with it such as to form a larger program,
+ in or on a volume of a storage or distribution medium, is called an
+ "aggregate" if the compilation and its resulting copyright are not
+ used to limit the access or legal rights of the compilation's users
+ beyond what the individual works permit. Inclusion of a covered work
+ in an aggregate does not cause this License to apply to the other
+ parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+ of sections 4 and 5, provided that you also convey the
+ machine-readable Corresponding Source under the terms of this License,
+ in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+ from the Corresponding Source as a System Library, need not be
+ included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+ tangible personal property which is normally used for personal, family,
+ or household purposes, or (2) anything designed or sold for incorporation
+ into a dwelling. In determining whether a product is a consumer product,
+ doubtful cases shall be resolved in favor of coverage. For a particular
+ product received by a particular user, "normally used" refers to a
+ typical or common use of that class of product, regardless of the status
+ of the particular user or of the way in which the particular user
+ actually uses, or expects or is expected to use, the product. A product
+ is a consumer product regardless of whether the product has substantial
+ commercial, industrial or non-consumer uses, unless such uses represent
+ the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+ procedures, authorization keys, or other information required to install
+ and execute modified versions of a covered work in that User Product from
+ a modified version of its Corresponding Source. The information must
+ suffice to ensure that the continued functioning of the modified object
+ code is in no case prevented or interfered with solely because
+ modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+ specifically for use in, a User Product, and the conveying occurs as
+ part of a transaction in which the right of possession and use of the
+ User Product is transferred to the recipient in perpetuity or for a
+ fixed term (regardless of how the transaction is characterized), the
+ Corresponding Source conveyed under this section must be accompanied
+ by the Installation Information. But this requirement does not apply
+ if neither you nor any third party retains the ability to install
+ modified object code on the User Product (for example, the work has
+ been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+ requirement to continue to provide support service, warranty, or updates
+ for a work that has been modified or installed by the recipient, or for
+ the User Product in which it has been modified or installed. Access to a
+ network may be denied when the modification itself materially and
+ adversely affects the operation of the network or violates the rules and
+ protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+ in accord with this section must be in a format that is publicly
+ documented (and with an implementation available to the public in
+ source code form), and must require no special password or key for
+ unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+ License by making exceptions from one or more of its conditions.
+ Additional permissions that are applicable to the entire Program shall
+ be treated as though they were included in this License, to the extent
+ that they are valid under applicable law. If additional permissions
+ apply only to part of the Program, that part may be used separately
+ under those permissions, but the entire Program remains governed by
+ this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+ remove any additional permissions from that copy, or from any part of
+ it. (Additional permissions may be written to require their own
+ removal in certain cases when you modify the work.) You may place
+ additional permissions on material, added by you to a covered work,
+ for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+ add to a covered work, you may (if authorized by the copyright holders of
+ that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+ restrictions" within the meaning of section 10. If the Program as you
+ received it, or any part of it, contains a notice stating that it is
+ governed by this License along with a term that is a further
+ restriction, you may remove that term. If a license document contains
+ a further restriction but permits relicensing or conveying under this
+ License, you may add to a covered work material governed by the terms
+ of that license document, provided that the further restriction does
+ not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+ must place, in the relevant source files, a statement of the
+ additional terms that apply to those files, or a notice indicating
+ where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+ form of a separately written license, or stated as exceptions;
+ the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+ provided under this License. Any attempt otherwise to propagate or
+ modify it is void, and will automatically terminate your rights under
+ this License (including any patent licenses granted under the third
+ paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+ license from a particular copyright holder is reinstated (a)
+ provisionally, unless and until the copyright holder explicitly and
+ finally terminates your license, and (b) permanently, if the copyright
+ holder fails to notify you of the violation by some reasonable means
+ prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+ reinstated permanently if the copyright holder notifies you of the
+ violation by some reasonable means, this is the first time you have
+ received notice of violation of this License (for any work) from that
+ copyright holder, and you cure the violation prior to 30 days after
+ your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+ licenses of parties who have received copies or rights from you under
+ this License. If your rights have been terminated and not permanently
+ reinstated, you do not qualify to receive new licenses for the same
+ material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+ run a copy of the Program. Ancillary propagation of a covered work
+ occurring solely as a consequence of using peer-to-peer transmission
+ to receive a copy likewise does not require acceptance. However,
+ nothing other than this License grants you permission to propagate or
+ modify any covered work. These actions infringe copyright if you do
+ not accept this License. Therefore, by modifying or propagating a
+ covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+ receives a license from the original licensors, to run, modify and
+ propagate that work, subject to this License. You are not responsible
+ for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+ organization, or substantially all assets of one, or subdividing an
+ organization, or merging organizations. If propagation of a covered
+ work results from an entity transaction, each party to that
+ transaction who receives a copy of the work also receives whatever
+ licenses to the work the party's predecessor in interest had or could
+ give under the previous paragraph, plus a right to possession of the
+ Corresponding Source of the work from the predecessor in interest, if
+ the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+ rights granted or affirmed under this License. For example, you may
+ not impose a license fee, royalty, or other charge for exercise of
+ rights granted under this License, and you may not initiate litigation
+ (including a cross-claim or counterclaim in a lawsuit) alleging that
+ any patent claim is infringed by making, using, selling, offering for
+ sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+ License of the Program or a work on which the Program is based. The
+ work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+ owned or controlled by the contributor, whether already acquired or
+ hereafter acquired, that would be infringed by some manner, permitted
+ by this License, of making, using, or selling its contributor version,
+ but do not include claims that would be infringed only as a
+ consequence of further modification of the contributor version. For
+ purposes of this definition, "control" includes the right to grant
+ patent sublicenses in a manner consistent with the requirements of
+ this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+ patent license under the contributor's essential patent claims, to
+ make, use, sell, offer for sale, import and otherwise run, modify and
+ propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+ agreement or commitment, however denominated, not to enforce a patent
+ (such as an express permission to practice a patent or covenant not to
+ sue for patent infringement). To "grant" such a patent license to a
+ party means to make such an agreement or commitment not to enforce a
+ patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+ and the Corresponding Source of the work is not available for anyone
+ to copy, free of charge and under the terms of this License, through a
+ publicly available network server or other readily accessible means,
+ then you must either (1) cause the Corresponding Source to be so
+ available, or (2) arrange to deprive yourself of the benefit of the
+ patent license for this particular work, or (3) arrange, in a manner
+ consistent with the requirements of this License, to extend the patent
+ license to downstream recipients. "Knowingly relying" means you have
+ actual knowledge that, but for the patent license, your conveying the
+ covered work in a country, or your recipient's use of the covered work
+ in a country, would infringe one or more identifiable patents in that
+ country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+ arrangement, you convey, or propagate by procuring conveyance of, a
+ covered work, and grant a patent license to some of the parties
+ receiving the covered work authorizing them to use, propagate, modify
+ or convey a specific copy of the covered work, then the patent license
+ you grant is automatically extended to all recipients of the covered
+ work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+ the scope of its coverage, prohibits the exercise of, or is
+ conditioned on the non-exercise of one or more of the rights that are
+ specifically granted under this License. You may not convey a covered
+ work if you are a party to an arrangement with a third party that is
+ in the business of distributing software, under which you make payment
+ to the third party based on the extent of your activity of conveying
+ the work, and under which the third party grants, to any of the
+ parties who would receive the covered work from you, a discriminatory
+ patent license (a) in connection with copies of the covered work
+ conveyed by you (or copies made from those copies), or (b) primarily
+ for and in connection with specific products or compilations that
+ contain the covered work, unless you entered into that arrangement,
+ or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+ any implied license or other defenses to infringement that may
+ otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement or
+ otherwise) that contradict the conditions of this License, they do not
+ excuse you from the conditions of this License. If you cannot convey a
+ covered work so as to satisfy simultaneously your obligations under this
+ License and any other pertinent obligations, then as a consequence you may
+ not convey it at all. For example, if you agree to terms that obligate you
+ to collect a royalty for further conveying from those to whom you convey
+ the Program, the only way you could satisfy both those terms and this
+ License would be to refrain entirely from conveying the Program.
+
+ 13. Use with the GNU Affero General Public License.
+
+ Notwithstanding any other provision of this License, you have
+ permission to link or combine any covered work with a work licensed
+ under version 3 of the GNU Affero General Public License into a single
+ combined work, and to convey the resulting work. The terms of this
+ License will continue to apply to the part which is the covered work,
+ but the special requirements of the GNU Affero General Public License,
+ section 13, concerning interaction through a network will apply to the
+ combination as such.
+
+ 14. Revised Versions of this License.
+
+ The Free Software Foundation may publish revised and/or new versions of
+ the GNU General Public License from time to time. Such new versions will
+ be similar in spirit to the present version, but may differ in detail to
+ address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+ Program specifies that a certain numbered version of the GNU General
+ Public License "or any later version" applies to it, you have the
+ option of following the terms and conditions either of that numbered
+ version or of any later version published by the Free Software
+ Foundation. If the Program does not specify a version number of the
+ GNU General Public License, you may choose any version ever published
+ by the Free Software Foundation.
+
+ If the Program specifies that a proxy can decide which future
+ versions of the GNU General Public License can be used, that proxy's
+ public statement of acceptance of a version permanently authorizes you
+ to choose that version for the Program.
+
+ Later license versions may give you additional or different
+ permissions. However, no additional obligations are imposed on any
+ author or copyright holder as a result of your choosing to follow a
+ later version.
+
+ 15. Disclaimer of Warranty.
+
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+ APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+ HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+ OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+ THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+ IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 16. Limitation of Liability.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+ WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+ THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+ GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+ USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+ DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+ PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+ EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+ SUCH DAMAGES.
+
+ 17. Interpretation of Sections 15 and 16.
+
+ If the disclaimer of warranty and limitation of liability provided
+ above cannot be given local legal effect according to their terms,
+ reviewing courts shall apply local law that most closely approximates
+ an absolute waiver of all civil liability in connection with the
+ Program, unless a warranty or assumption of liability accompanies a
+ copy of the Program in return for a fee.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+ possible use to the public, the best way to achieve this is to make it
+ free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+ to attach them to the start of each source file to most effectively
+ state the exclusion of warranty; and each file should have at least
+ the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+ Also add information on how to contact you by electronic and paper mail.
+
+ If the program does terminal interaction, make it output a short
+ notice like this when it starts in an interactive mode:
+
+ <program> Copyright (C) <year> <name of author>
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+ The hypothetical commands `show w' and `show c' should show the appropriate
+ parts of the General Public License. Of course, your program's commands
+ might be different; for a GUI interface, you would use an "about box".
+
+ You should also get your employer (if you work as a programmer) or school,
+ if any, to sign a "copyright disclaimer" for the program, if necessary.
+ For more information on this, and how to apply and follow the GNU GPL, see
+ <https://www.gnu.org/licenses/>.
+
+ The GNU General Public License does not permit incorporating your program
+ into proprietary programs. If your program is a subroutine library, you
+ may consider it more useful to permit linking proprietary applications with
+ the library. If this is what you want to do, use the GNU Lesser General
+ Public License instead of this License. But first, please read
+ <https://www.gnu.org/licenses/why-not-lgpl.html>.
README.md ADDED
@@ -0,0 +1,297 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ ---
+ title: LightDiffusion-Next
+ emoji: 🚀
+ colorFrom: blue
+ colorTo: gray
+ sdk: gradio
+ sdk_version: 5.33.2
+ app_file: app.py
+ python_version: 3.10.13
+ ---
+
+ <div align="center">
+
+ # Say hi to LightDiffusion-Next 👋
+
+ [![demo platform](https://img.shields.io/badge/Play%20with%20LightDiffusion%21-LightDiffusion%20demo%20platform-lightblue)](https://huggingface.co/spaces/Aatricks/LightDiffusion-Next)&nbsp;
+
+ **LightDiffusion-Next** is the fastest AI-powered image generation WebUI, combining speed, precision, and flexibility in one cohesive tool.
+ <br/>
+ <br/>
+ <a href="https://github.com/LightDiffusion/LightDiffusion-Next">
+ <img src="https://github.com/user-attachments/assets/b994fe0d-3a2e-44ff-93a4-46919cf865e3" alt="Logo">
+
+ </a>
+ <br/>
+ </div>
+
+ ---
+
+ As a refactored and improved version of the original [LightDiffusion repository](https://github.com/Aatrick/LightDiffusion), this project enhances usability, maintainability, and functionality while introducing a host of new features to streamline your creative workflows.
+
+ ## Motivation
+
+ **LightDiffusion** was originally meant to be written in Rust, but given the limited support for Rust in the AI ecosystem, it was built in Python instead, with the goal of being the simplest and fastest AI image generation tool.
+
+ That's when the first version of LightDiffusion was born, weighing in at just [3,000 lines of code](https://github.com/LightDiffusion/LightDiffusion-original) and relying on PyTorch alone. Over time, the [project](https://github.com/Aatrick/LightDiffusion) grew more complex, and the need for a refactor became evident. This is where **LightDiffusion-Next** comes in, with a more modular and maintainable codebase and a plethora of new features and optimizations.
+
+ 📚 Learn more in the [official documentation](https://aatricks.github.io/LightDiffusion-Next/)
+
+ For a source-based breakdown of the optimization stack, see the [Implemented Optimizations Report](https://aatricks.github.io/LightDiffusion-Next/implemented-optimizations-report/).
+
+ ---
+
+ ## 🌟 Highlights
+
+ ![image](https://github.com/user-attachments/assets/b994fe0d-3a2e-44ff-93a4-46919cf865e3)
+
+ **LightDiffusion-Next** offers a powerful suite of tools to cater to creators at every level. At its core, it supports **Text-to-Image** (Txt2Img) and **Image-to-Image** (Img2Img) generation with a variety of upscale methods and samplers, making it easy to create stunning images with minimal effort.
+
+ Advanced users can take advantage of features like **attention syntax**, **Hires-Fix**, and **ADetailer**. These tools provide better quality and flexibility for generating complex and high-resolution outputs.
+
+ **LightDiffusion-Next** is fine-tuned for **performance**. Features such as **Xformers** acceleration, **BFloat16** precision support, **WaveSpeed** dynamic caching, **Multi-scale diffusion**, and **Stable-Fast** model compilation (which offers up to a 70% speed boost) ensure smooth and efficient operation, even on demanding workloads.
+
+ ---
+
+ ## ✨ Feature Showcase
+
+ Here’s what makes LightDiffusion-Next stand out:
+
+ - **Speed and Efficiency**:
+ Enjoy industry-leading performance with built-in Xformers, PyTorch, WaveSpeed, and Stable-Fast optimizations, Multi-scale diffusion, DeepCache, the AYS (Align Your Steps) scheduler, and automatic prompt caching, achieving 30% to 200% faster speeds than other AI image generation backends on SD1.5 and Flux.
+
+ - **Automatic Detailing**:
+ Effortlessly enhance faces and body details with AI-driven tools based on the [Impact Pack](https://github.com/ltdrdata/ComfyUI-Impact-Pack).
+
+ - **State Preservation**:
+ Save and resume your progress with saved states, ensuring seamless transitions between sessions.
+
+ - **Integration-Ready**:
+ Collaborate and create directly in Discord with [Boubou](https://github.com/Aatrick/Boubou), or preview images dynamically with the optional **TAESD preview mode**.
+
+ - **Image Previewing**:
+ Get a real-time preview of your generated images with TAESD, allowing for user-friendly and interactive workflows (see the sketch after this list).
+
+ - **Image Upscaling**:
+ Enhance your images with advanced upscaling options like UltimateSDUpscaling, ensuring high-quality results every time.
+
+ - **Prompt Refinement**:
+ Use the optional Ollama-powered prompt enhancer (defaults to `qwen3:0.6b`) to refine your prompts and generate more accurate and detailed outputs.
+
+ - **LoRA and Textual Inversion Embeddings**:
+ Leverage LoRA and textual inversion embeddings for highly customized and nuanced results, adding a new dimension to your creative process.
+
+ - **Low-End Device Support**:
+ Run LightDiffusion-Next on low-end devices with as little as 2GB of VRAM or even no GPU, ensuring accessibility for all users.
+
+ - **CFG++**:
+ Uses samplers modified to support CFG++ for better-quality results than standard CFG guidance.
+
+ - **Newelle Extension**:
+ LightDiffusion-Next is also available as a backend for the [Newelle LightDiffusion extension](https://github.com/Aatricks/Newelle-Light-Diffusion), allowing images to be generated inline during conversations with LLMs.
+
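+ The preview path is built on TAESD, a tiny autoencoder that decodes latents far more cheaply than the full VAE. As a rough sketch of the underlying technique (not LightDiffusion-Next's actual preview hook; the latent shape and value ranges are assumptions), latents can be decoded with `diffusers`' `AutoencoderTiny`:
+
+ ```python
+ # Hedged sketch: turn an in-progress SD1.5 latent into a cheap RGB preview.
+ import torch
+ from diffusers import AutoencoderTiny
+
+ taesd = AutoencoderTiny.from_pretrained(
+     "madebyollin/taesd", torch_dtype=torch.float16
+ ).to("cuda")
+
+ @torch.no_grad()
+ def preview_rgb(latents: torch.Tensor) -> torch.Tensor:
+     # One lightweight pass instead of a full VAE decode.
+     image = taesd.decode(latents.to("cuda", torch.float16)).sample
+     # Map from roughly [-1, 1] to displayable uint8 RGB.
+     return ((image / 2 + 0.5).clamp(0, 1) * 255).to(torch.uint8)
+ ```
+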
+ ---
+
+ ## ⚡ Performance Benchmarks
+
+ **LightDiffusion-Next** dominates in performance:
+
+ | **Tool** | **Speed (it/s)** |
+ |------------------------------------|------------------|
+ | **LightDiffusion with Stable-Fast** | 2.8 |
+ | **LightDiffusion** | 1.9 |
+ | **ComfyUI** | 1.4 |
+ | **SDForge** | 1.3 |
+ | **SDWebUI** | 0.9 |
+
+ (All benchmarks were run at 1024x1024 resolution with a batch size of 1 using BFloat16 precision, without tweaking the installations, on a mobile RTX 3060 GPU with SD1.5.)
+
+ With its unmatched speed and efficiency, LightDiffusion-Next sets the benchmark for AI image generation tools.
+
+ ---
+
+ ## 🛠 Installation
+
+ > [!NOTE]
+ > **Platform Support:** LightDiffusion-Next supports NVIDIA GPUs (CUDA), AMD GPUs (ROCm), and Apple Silicon (Metal/MPS). For AMD and Apple Silicon setup instructions, see the [ROCm and Metal/MPS Support Guide](https://aatrick.github.io/LightDiffusion/rocm-metal-support/).
+
+ > [!WARNING]
+ > **Disclaimer:** On Linux, the fastest way to get started is with the Docker setup below. Windows users often encounter an `EOF` build error when using Docker; if that happens, set up a local virtual environment instead and install SageAttention inside it.
+
+ > [!NOTE]
+ > You will need to download the [flux vae](https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/main/ae.safetensors) separately, given its gated repo on Hugging Face. Drop it in the `/include/vae` folder.
+
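+ If you prefer to script that download, here is a minimal sketch using `huggingface_hub` (the `HF_TOKEN` environment variable is an assumption — use whatever token setup you already have for gated repos):
+
+ ```python
+ # Hypothetical helper: fetch the gated Flux VAE into include/vae.
+ import os
+ from huggingface_hub import hf_hub_download
+
+ hf_hub_download(
+     repo_id="black-forest-labs/FLUX.1-schnell",  # gated: accept the terms on the Hub first
+     filename="ae.safetensors",
+     local_dir="include/vae",
+     token=os.environ.get("HF_TOKEN"),  # a Hugging Face token with access to the repo
+ )
+ ```
+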
+ ### Quick Start
+
+ 1. Download a release or clone this repository.
+ 2. Run `run.bat` in a terminal.
+ 3. The modern React frontend will launch automatically at `http://localhost:5173` (proxied to the FastAPI backend at `http://localhost:7861`).
+
+ **Recommended Launch Command:**
+ ```bash
+ # Start both backend and frontend development server
+ python server.py --frontend
+ ```
+
+ **Production-style local run:**
+ ```bash
+ # Serve the built React UI from FastAPI on a single port
+ python server.py --port 7860
+ ```
+
+ **ZeroGPU / Gradio launch:**
+ ```bash
+ # Launch the Hugging Face ZeroGPU-compatible Gradio UI
+ python app.py
+ ```
+
+ ### 🌌 Flux Support
+
+ LightDiffusion-Next now features first-class support for **Flux2 Klein**. To get started, you need to download the required model components (Diffusion Model, Text Encoder, and VAE).
+
+ We provide a convenient script to handle this automatically:
+ ```bash
+ python download_flux.py
+ ```
+ This will download approximately 16GB of weights into the `include/` directory.
+
+ ### 🤗 ZeroGPU / Gradio Space
+
+ This repository now includes a Gradio `app.py` entrypoint for Hugging Face
+ **ZeroGPU**. ZeroGPU is only supported for Gradio SDK Spaces, and the
+ GPU-bound generation function is wrapped with `@spaces.GPU`.
+
+ Recommended defaults for ZeroGPU:
+ - keep `Keep Models Loaded` disabled
+ - use 512x512 or 768x768 resolutions
+ - generate 1 image at a time
+ - prefer 10-25 steps with `ays`
+
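+ For readers unfamiliar with the pattern, here is a minimal sketch of how a ZeroGPU entrypoint is typically wired up (the `generate` signature and its placeholder body are illustrative, not the actual `app.py`):
+
+ ```python
+ # Minimal ZeroGPU-style Gradio entrypoint: the decorated function only holds
+ # a GPU while it runs, which is the contract ZeroGPU Spaces rely on.
+ import gradio as gr
+ import spaces
+ from PIL import Image
+
+ @spaces.GPU(duration=60)  # request a GPU slot for up to 60 seconds per call
+ def generate(prompt: str) -> Image.Image:
+     # Placeholder body: the real app runs the diffusion pipeline on CUDA here.
+     return Image.new("RGB", (512, 512))
+
+ if __name__ == "__main__":
+     gr.Interface(fn=generate, inputs="text", outputs="image").launch()
+ ```
+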
+ ### 🐳 Docker Setup
+
+ Run LightDiffusion-Next in a containerized environment with GPU acceleration.
+ The Docker path remains available for local or dedicated GPU deployments and
+ serves the built React frontend from the FastAPI backend on port `7860`.
+
+ > [!IMPORTANT]
+ > Confirm you have Docker Desktop configured with the NVIDIA Container Toolkit and at least 12-16GB of memory. Builds expect an NVIDIA GPU with compute capability 8.0 or higher and CUDA 12.0+ support for SageAttention/SpargeAttn.
+
+ **Quick Start with Docker:**
+ ```bash
+ # Build and run with docker-compose
+ docker-compose up --build
+
+ # Or build and run manually
+ docker build -t lightdiffusion-next .
+ docker run --gpus all -p 7860:7860 -e PORT=7860 -v ./output:/app/output lightdiffusion-next
+ ```
+
+ **Custom GPU Architecture (Optional):**
+ ```bash
+ # For faster builds, specify your GPU architecture (e.g., RTX 5060 = 12.0)
+ docker-compose build --build-arg TORCH_CUDA_ARCH_LIST="12.0"
+
+ # Default builds for: 8.0 (A100), 8.6 (RTX 30xx), 8.9 (RTX 40xx), 9.0 (H100), 12.0 (RTX 50xx)
+ ```
+
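+ Not sure which value to pass? If PyTorch is installed locally, this snippet reports your GPU's compute capability (for example, `(8, 6)` maps to `TORCH_CUDA_ARCH_LIST="8.6"`):
+
+ ```python
+ # Print the compute capability of the first visible CUDA device.
+ import torch
+
+ print(torch.cuda.get_device_capability(0))  # e.g. (8, 6) on an RTX 30xx card
+ ```
+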
+ **Built-in Optimizations:**
+ The Docker image can optionally build the following acceleration paths:
+ - ✨ **SageAttention** - 15% speedup with INT8 quantization (all supported GPUs)
+ - 🚀 **SpargeAttn** - 40-60% speedup with sparse attention (compute 8.0-9.0 only)
+ - ⚡ **Stable-Fast** - Optional UNet compilation for up to 70% faster SD1.5 inference
+
+ Control them through build arguments (defaults shown below):
+
+ ```bash
+ docker-compose build \
+ --build-arg TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0;12.0" \
+ --build-arg INSTALL_SAGEATTENTION=0 \
+ --build-arg INSTALL_SPARGEATTN=0 \
+ --build-arg INSTALL_STABLE_FAST=1 \
+ --build-arg INSTALL_OLLAMA=0
+ ```
+
+ Set `INSTALL_STABLE_FAST=1` to enable stable-fast, `INSTALL_SAGEATTENTION=1`
+ or `INSTALL_SPARGEATTN=1` to opt into the heavier attention-kernel builds, and
+ `INSTALL_OLLAMA=1` to bake in the prompt enhancer runtime.
+
+ > [!NOTE]
+ > RTX 50 series (compute 12.0) GPUs currently use SageAttention when the SageAttention kernel is installed. SpargeAttn remains limited to earlier supported architectures.
+
+ **Access the Web Interface:**
+ - **FastAPI + React UI**: `http://localhost:7860`
+
+ **Volume Mounts:**
+ - `./output:/app/output` - Persist generated images
+ - `./checkpoints:/app/include/checkpoints` - Store model files
+ - `./loras:/app/include/loras` - Store LoRA files
+ - `./embeddings:/app/include/embeddings` - Store embeddings
+
+ ### Advanced Setup
+
+ - **Install from Source**:
+ Install dependencies via:
+ ```bash
+ pip install -r requirements.txt
+ ```
+ Add your SD1/1.5 safetensors model to the `checkpoints` directory, then launch the application.
+
+ - **⚡ Stable-Fast Optimization**:
+ Follow [this guide](https://github.com/chengzeyi/stable-fast?tab=readme-ov-file#installation) to enable Stable-Fast mode for optimal performance.
+ In Docker environments, set `INSTALL_STABLE_FAST=1` to compile it during the image build or `INSTALL_STABLE_FAST=0` (default) to skip.
+
+ - **🚀 SageAttention & SpargeAttn Acceleration**:
+ Boost inference speed by up to 60% with advanced attention backends:
+
+ **Prerequisites:**
+ - [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit-archive) installed, with a version compatible with your PyTorch installation
+
+ **SageAttention (15% speedup, Windows compatible):**
+ ```bash
+ cd SageAttention
+ pip install -e . --no-build-isolation
+ ```
+
+ **SpargeAttn (40-60% total speedup, requires WSL2/Linux):**
+ > [!CAUTION]
+ > SpargeAttn cannot be built with the default Windows linker. Use WSL2 or a native Linux environment and set the correct `TORCH_CUDA_ARCH_LIST` before installation.
+ ```bash
+ # On WSL2 or Linux only (Windows linker has path length limitations)
+ cd SpargeAttn
+ export TORCH_CUDA_ARCH_LIST="9.0" # Or your GPU architecture (8.0, 8.6, 8.9, 9.0)
+ pip install -e . --no-build-isolation
+ ```
+
+ **Priority System:** SpargeAttn > SageAttention > PyTorch SDPA (see the sketch after this list)
+ - Both backends are automatically detected and used when available
+ - Graceful fallback for unsupported head dimensions
+
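+ Conceptually, the priority chain is a series of guarded imports with a per-call fallback. A simplified sketch follows — `sageattention.sageattn` is SageAttention's documented entry point, while the `spas_sage_attn` module name and the head-dimension whitelist are assumptions to verify against the vendored sources:
+
+ ```python
+ # Simplified sketch of the priority chain: SpargeAttn > SageAttention > SDPA.
+ import torch
+ import torch.nn.functional as F
+
+ try:
+     import spas_sage_attn  # noqa: F401  (module name is an assumption)
+     BACKEND = "spargeattn"
+ except ImportError:
+     try:
+         from sageattention import sageattn  # noqa: F401
+         BACKEND = "sageattention"
+     except ImportError:
+         BACKEND = "sdpa"
+
+ def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
+     # Head-dim whitelist is illustrative: real code falls back gracefully
+     # whenever a kernel does not support the requested shape.
+     if BACKEND == "sageattention" and q.shape[-1] in (64, 96, 128):
+         from sageattention import sageattn
+         return sageattn(q, k, v, tensor_layout="HND")
+     # SpargeAttn's call signature is omitted here; default to PyTorch SDPA.
+     return F.scaled_dot_product_attention(q, k, v)
+ ```
+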
+ - **🦙 Prompt Enhancer**:
+ Turn on the Ollama-backed enhancer to automatically restructure prompts. By default the app targets `qwen3:0.6b`:
+ ```bash
+ # Local install
+ pip install ollama
+ curl -fsSL https://ollama.com/install.sh | sh
+
+ # Start the Ollama daemon (keep this terminal open)
+ ollama serve
+
+ # New terminal: pull the default prompt enhancer model
+ ollama pull qwen3:0.6b
+ export PROMPT_ENHANCER_MODEL=qwen3:0.6b
+ ```
+ In Docker builds, set `--build-arg INSTALL_OLLAMA=1` (or update `docker-compose.yml`) to install Ollama and pre-pull the model automatically. You can override the runtime model/prefix with the `PROMPT_ENHANCER_MODEL` and `PROMPT_ENHANCER_PREFIX` environment variables. See the [Ollama guide](https://github.com/ollama/ollama?tab=readme-ov-file) for details.
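+ Under the hood, an enhancer call through the `ollama` Python client boils down to something like this sketch (the system prompt is illustrative, not the one the app ships, and `PROMPT_ENHANCER_PREFIX` handling is omitted):
+ ```python
+ # Minimal sketch of an Ollama-backed prompt enhancer call.
+ import os
+ import ollama
+
+ MODEL = os.environ.get("PROMPT_ENHANCER_MODEL", "qwen3:0.6b")
+
+ def enhance(prompt: str) -> str:
+     response = ollama.chat(
+         model=MODEL,
+         messages=[
+             {"role": "system", "content": "Rewrite the prompt with rich visual detail."},
+             {"role": "user", "content": prompt},
+         ],
+     )
+     return response["message"]["content"]
+
+ print(enhance("a cat on a windowsill"))
+ ```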
285
+
286
+ - **🤖 Discord Integration**:
287
+ Set up the Discord bot by following the [Boubou installation guide](https://github.com/Aatrick/Boubou).
288
+
289
+ ### Third-Party Licenses
290
+ - This project distributes builds that depend on third-party open source components. For attribution details and the full license text, refer to `THIRD_PARTY_LICENSES.md`.
291
+
292
+ ---
293
+
294
+ 🎨 Enjoy exploring the powerful features of LightDiffusion-Next!
295
+
296
+ > [!TIP]
297
+ > ⭐ If this project helps you, please give it a star! It helps others discover it too.
THIRD_PARTY_LICENSES.md ADDED
@@ -0,0 +1,948 @@
+ # Third-Party Notices
+
+ This project depends on the following third-party components. The notices below satisfy the attribution requirements of their respective licenses.
+
+ ## SageAttention (thu-ml/SageAttention)
+ - Source: https://github.com/thu-ml/SageAttention
+ - License: Apache License 2.0 (see full text below)
+ - Notes: LightDiffusion-Next applies a build-time patch (`docker/sageattention_setup.patch`) to SageAttention's `setup.py` to honor the `TORCH_CUDA_ARCH_LIST` environment variable during compilation.
+
+ ## SpargeAttn (thu-ml/SpargeAttn)
+ - Source: https://github.com/thu-ml/SpargeAttn
+ - License: Apache License 2.0 (see full text below)
+ - Notes: Used as provided, without local source modifications.
+
+ ## ComfyUI (comfyanonymous/ComfyUI)
+ - Source: https://github.com/comfyanonymous/ComfyUI
+ - License: GNU General Public License v3.0 (full text distributed in the repository root `LICENSE`)
+ - Notes: Provides the node-graph runtime and execution engine extended by LightDiffusion-Next.
+
+ ## ComfyUI Ultimate SD Upscale (ssitu/ComfyUI_UltimateSDUpscale)
+ - Source: https://github.com/ssitu/ComfyUI_UltimateSDUpscale
+ - License: GNU General Public License v3.0 (full text distributed in the repository root `LICENSE`)
+ - Notes: LightDiffusion-Next adapts the Ultimate SD Upscale script to integrate with its sampler interface.
+
+ ## ADetailer (Bing-su/adetailer)
+ - Source: https://github.com/Bing-su/adetailer
+ - License: GNU Affero General Public License v3.0 (see full text below)
+ - Notes: Supplies detector-driven post-processing for face, hand, and subject refinements. No local source code changes are applied.
+
+ ## Stable Fast (chengzeyi/stable-fast)
+ - Source: https://github.com/chengzeyi/stable-fast
+ - License: MIT License (see full text below)
+ - Notes: Imported as a wheel distribution to enable graph compilation speedups for Stable Diffusion pipelines.
+
+ ## ComfyUI-GGUF (city96/comfyui-gguf)
+ - Source: https://github.com/city96/comfyui-gguf
+ - License: Apache License 2.0 (see full text below)
+ - Notes: Provides GGUF model loader nodes used by LightDiffusion-Next without modification.
+
+ ## WaveSpeed (ComfyUI-WaveSpeed)
+ - Source: https://github.com/Fannovel16/ComfyUI-WaveSpeed (original project reference)
+ - License: MIT License (see full text below)
+ - Notes: LightDiffusion-Next vendors the WaveSpeed caching utilities as-is for first-block cache optimisations.
+
+ ---
+
47
+ ## Apache License
48
+
49
+ ```
50
+ Apache License
51
+ Version 2.0, January 2004
52
+ http://www.apache.org/licenses/
53
+
54
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
55
+
56
+ 1. Definitions.
57
+
58
+ "License" shall mean the terms and conditions for use, reproduction,
59
+ and distribution as defined by Sections 1 through 9 of this document.
60
+
61
+ "Licensor" shall mean the copyright owner or entity authorized by
62
+ the copyright owner that is granting the License.
63
+
64
+ "Legal Entity" shall mean the union of the acting entity and all
65
+ other entities that control, are controlled by, or are under common
66
+ control with that entity. For the purposes of this definition,
67
+ "control" means (i) the power, direct or indirect, to cause the
68
+ direction or management of such entity, whether by contract or
69
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
70
+ outstanding shares, or (iii) beneficial ownership of such entity.
71
+
72
+ "You" (or "Your") shall mean an individual or Legal Entity
73
+ exercising permissions granted by this License.
74
+
75
+ "Source" form shall mean the preferred form for making modifications,
76
+ including but not limited to software source code, documentation
77
+ source, and configuration files.
78
+
79
+ "Object" form shall mean any form resulting from mechanical
80
+ transformation or translation of a Source form, including but
81
+ not limited to compiled object code, generated documentation,
82
+ and conversions to other media types.
83
+
84
+ "Work" shall mean the work of authorship, whether in Source or
85
+ Object form, made available under the License, as indicated by a
86
+ copyright notice that is included in or attached to the work
87
+ (an example is provided in the Appendix below).
88
+
89
+ "Derivative Works" shall mean any work, whether in Source or Object
90
+ form, that is based on (or derived from) the Work and for which the
91
+ editorial revisions, annotations, elaborations, or other modifications
92
+ represent, as a whole, an original work of authorship. For the purposes
93
+ of this License, Derivative Works shall not include works that remain
94
+ separable from, or merely link (or bind by name) to the interfaces of,
95
+ the Work and Derivative Works thereof.
96
+
97
+ "Contribution" shall mean any work of authorship, including
98
+ the original version of the Work and any modifications or additions
99
+ to that Work or Derivative Works thereof, that is intentionally
100
+ submitted to Licensor for inclusion in the Work by the copyright owner
101
+ or by an individual or Legal Entity authorized to submit on behalf of
102
+ the copyright owner. For the purposes of this definition, "submitted"
103
+ means any form of electronic, verbal, or written communication sent
104
+ to the Licensor or its representatives, including but not limited to
105
+ communication on electronic mailing lists, source code control systems,
106
+ and issue tracking systems that are managed by, or on behalf of, the
107
+ Licensor for the purpose of discussing and improving the Work, but
108
+ excluding communication that is conspicuously marked or otherwise
109
+ designated in writing by the copyright owner as "Not a Contribution."
110
+
111
+ "Contributor" shall mean Licensor and any individual or Legal Entity
112
+ on behalf of whom a Contribution has been received by Licensor and
113
+ subsequently incorporated within the Work.
114
+
115
+ 2. Grant of Copyright License. Subject to the terms and conditions of
116
+ this License, each Contributor hereby grants to You a perpetual,
117
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
118
+ copyright license to reproduce, prepare Derivative Works of,
119
+ publicly display, publicly perform, sublicense, and distribute the
120
+ Work and such Derivative Works in Source or Object form.
121
+
122
+ 3. Grant of Patent License. Subject to the terms and conditions of
123
+ this License, each Contributor hereby grants to You a perpetual,
124
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
125
+ (except as stated in this section) patent license to make, have made,
126
+ use, offer to sell, sell, import, and otherwise transfer the Work,
127
+ where such license applies only to those patent claims licensable
128
+ by such Contributor that are necessarily infringed by their
129
+ Contribution(s) alone or by combination of their Contribution(s)
130
+ with the Work to which such Contribution(s) was submitted. If You
131
+ institute patent litigation against any entity (including a
132
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
133
+ or a Contribution incorporated within the Work constitutes direct
134
+ or contributory patent infringement, then any patent licenses
135
+ granted to You under this License for that Work shall terminate
136
+ as of the date such litigation is filed.
137
+
138
+ 4. Redistribution. You may reproduce and distribute copies of the
139
+ Work or Derivative Works thereof in any medium, with or without
140
+ modifications, and in Source or Object form, provided that You
141
+ meet the following conditions:
142
+
143
+ (a) You must give any other recipients of the Work or
144
+ Derivative Works a copy of this License; and
145
+
146
+ (b) You must cause any modified files to carry prominent notices
147
+ stating that You changed the files; and
148
+
149
+ (c) You must retain, in the Source form of any Derivative Works
150
+ that You distribute, all copyright, patent, trademark, and
151
+ attribution notices from the Source form of the Work,
152
+ excluding those notices that do not pertain to any part of
153
+ the Derivative Works; and
154
+
155
+ (d) If the Work includes a "NOTICE" text file as part of its
156
+ distribution, then any Derivative Works that You distribute must
157
+ include a readable copy of the attribution notices contained
158
+ within such NOTICE file, excluding those notices that do not
159
+ pertain to any part of the Derivative Works, in at least one
160
+ of the following places: within a NOTICE text file distributed
161
+ as part of the Derivative Works; within the Source form or
162
+ documentation, if provided along with the Derivative Works; or,
163
+ within a display generated by the Derivative Works, if and
164
+ wherever such third-party notices normally appear. The contents
165
+ of the NOTICE file are for informational purposes only and
166
+ do not modify the License. You may add Your own attribution
167
+ notices within Derivative Works that You distribute, alongside
168
+ or as an addendum to the NOTICE text from the Work, provided
169
+ that such additional attribution notices cannot be construed
170
+ as modifying the License.
171
+
172
+ You may add Your own copyright statement to Your modifications and
173
+ may provide additional or different license terms and conditions
174
+ for use, reproduction, or distribution of Your modifications, or
175
+ for any such Derivative Works as a whole, provided Your use,
176
+ reproduction, and distribution of the Work otherwise complies with
177
+ the conditions stated in this License.
178
+
179
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
180
+ any Contribution intentionally submitted for inclusion in the Work
181
+ by You to the Licensor shall be under the terms and conditions of
182
+ this License, without any additional terms or conditions.
183
+ Notwithstanding the above, nothing herein shall supersede or modify
184
+ the terms of any separate license agreement you may have executed
185
+ with Licensor regarding such Contributions.
186
+
187
+ 6. Trademarks. This License does not grant permission to use the trade
188
+ names, trademarks, service marks, or product names of the Licensor,
189
+ except as required for reasonable and customary use in describing the
190
+ origin of the Work and reproducing the content of the NOTICE file.
191
+
192
+ 7. Disclaimer of Warranty. Unless required by applicable law or
193
+ agreed to in writing, Licensor provides the Work (and each
194
+ Contributor provides its Contributions) on an "AS IS" BASIS,
195
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
196
+ implied, including, without limitation, any warranties or conditions
197
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
198
+ PARTICULAR PURPOSE. You are solely responsible for determining the
199
+ appropriateness of using or redistributing the Work and assume any
200
+ risks associated with Your exercise of permissions under this License.
201
+
202
+ 8. Limitation of Liability. In no event and under no legal theory,
203
+ whether in tort (including negligence), contract, or otherwise,
204
+ unless required by applicable law (such as deliberate and grossly
205
+ negligent acts) or agreed to in writing, shall any Contributor be
206
+ liable to You for damages, including any direct, indirect, special,
207
+ incidental, or consequential damages of any character arising as a
208
+ result of this License or out of the use or inability to use the
209
+ Work (including but not limited to damages for loss of goodwill,
210
+ work stoppage, computer failure or malfunction, or any and all
211
+ other commercial damages or losses), even if such Contributor
212
+ has been advised of the possibility of such damages.
213
+
214
+ 9. Accepting Warranty or Additional Liability. While redistributing
215
+ the Work or Derivative Works thereof, You may choose to offer,
216
+ and charge a fee for, acceptance of support, warranty, indemnity,
217
+ or other liability obligations and/or rights consistent with this
218
+ License. However, in accepting such obligations, You may act only
219
+ on Your own behalf and on Your sole responsibility, not on behalf
220
+ of any other Contributor, and only if You agree to indemnify,
221
+ defend, and hold each Contributor harmless for any liability
222
+ incurred by, or claims asserted against, such Contributor by reason
223
+ of your accepting any such warranty or additional liability.
224
+
225
+ END OF TERMS AND CONDITIONS
226
+
227
+ APPENDIX: How to apply the Apache License to your work.
228
+
229
+ To apply the Apache License to your work, attach the following
230
+ boilerplate notice, with the fields enclosed by brackets "[]"
231
+ replaced with your own identifying information. (Don't include
232
+ the brackets!) The text should be enclosed in the appropriate
233
+ comment syntax for the file format. We also recommend that a
234
+ file or class name and description of purpose be included on the
235
+ same "printed page" as the copyright notice for easier
236
+ identification within third-party archives.
237
+
238
+ Copyright [yyyy] [name of copyright owner]
239
+
240
+ Licensed under the Apache License, Version 2.0 (the "License");
241
+ you may not use this file except in compliance with the License.
242
+ You may obtain a copy of the License at
243
+
244
+ http://www.apache.org/licenses/LICENSE-2.0
245
+
246
+ Unless required by applicable law or agreed to in writing, software
247
+ distributed under the License is distributed on an "AS IS" BASIS,
248
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
249
+ See the License for the specific language governing permissions and
250
+ limitations under the License.
251
+ ```
252
+
253
+ ---
254
+
255
+ ## MIT License (Stable Fast)
256
+
257
+ ```
258
+ MIT License
259
+
260
+ Copyright (c) 2023 C
261
+
262
+ Permission is hereby granted, free of charge, to any person obtaining a copy
263
+ of this software and associated documentation files (the "Software"), to deal
264
+ in the Software without restriction, including without limitation the rights
265
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
266
+ copies of the Software, and to permit persons to whom the Software is
267
+ furnished to do so, subject to the following conditions:
268
+
269
+ The above copyright notice and this permission notice shall be included in all
270
+ copies or substantial portions of the Software.
271
+
272
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
273
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
274
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
275
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
276
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
277
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
278
+ SOFTWARE.
279
+ ```
280
+
281
+ ---
282
+
283
+ ## GNU Affero General Public License v3.0 (ADetailer)
284
+
285
+ ```
286
+ GNU AFFERO GENERAL PUBLIC LICENSE
287
+ Version 3, 19 November 2007
288
+
289
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
290
+ Everyone is permitted to copy and distribute verbatim copies
291
+ of this license document, but changing it is not allowed.
292
+
293
+ Preamble
294
+
295
+ The GNU Affero General Public License is a free, copyleft license for
296
+ software and other kinds of works, specifically designed to ensure
297
+ cooperation with the community in the case of network server software.
298
+
299
+ The licenses for most software and other practical works are designed
300
+ to take away your freedom to share and change the works. By contrast,
301
+ our General Public Licenses are intended to guarantee your freedom to
302
+ share and change all versions of a program--to make sure it remains free
303
+ software for all its users.
304
+
305
+ When we speak of free software, we are referring to freedom, not
306
+ price. Our General Public Licenses are designed to make sure that you
307
+ have the freedom to distribute copies of free software (and charge for
308
+ them if you wish), that you receive source code or can get it if you
309
+ want it, that you can change the software or use pieces of it in new
310
+ free programs, and that you know you can do these things.
311
+
312
+ Developers that use our General Public Licenses protect your rights
313
+ with two steps: (1) assert copyright on the software, and (2) offer
314
+ you this License which gives you legal permission to copy, distribute
315
+ and/or modify the software.
316
+
317
+ A secondary benefit of defending all users' freedom is that
318
+ improvements made in alternate versions of the program, if they
319
+ receive widespread use, become available for other developers to
320
+ incorporate. Many developers of free software are heartened and
321
+ encouraged by the resulting cooperation. However, in the case of
322
+ software used on network servers, this result may fail to come about.
323
+ The GNU General Public License permits making a modified version and
324
+ letting the public access it on a server without ever releasing its
325
+ source code to the public.
326
+
327
+ The GNU Affero General Public License is designed specifically to
328
+ ensure that, in such cases, the modified source code becomes available
329
+ to the community. It requires the operator of a network server to
330
+ provide the source code of the modified version running there to the
331
+ users of that server. Therefore, public use of a modified version, on
332
+ a publicly accessible server, gives the public access to the source
333
+ code of the modified version.
334
+
335
+ An older license, called the Affero General Public License and
336
+ published by Affero, was designed to accomplish similar goals. This is
337
+ a different license, not a version of the Affero GPL, but Affero has
338
+ released a new version of the Affero GPL which permits relicensing under
339
+ this license.
340
+
341
+ The precise terms and conditions for copying, distribution and
342
+ modification follow.
343
+
344
+ TERMS AND CONDITIONS
345
+
346
+ 0. Definitions.
347
+
348
+ "This License" refers to version 3 of the GNU Affero General Public License.
349
+
350
+ "Copyright" also means copyright-like laws that apply to other kinds of
351
+ works, such as semiconductor masks.
352
+
353
+ "The Program" refers to any copyrightable work licensed under this
354
+ License. Each licensee is addressed as "you". "Licensees" and
355
+ "recipients" may be individuals or organizations.
356
+
357
+ To "modify" a work means to copy from or adapt all or part of the work
358
+ in a fashion requiring copyright permission, other than the making of an
359
+ exact copy. The resulting work is called a "modified version" of the
360
+ earlier work or a work "based on" the earlier work.
361
+
362
+ A "covered work" means either the unmodified Program or a work based
363
+ on the Program.
364
+
365
+ To "propagate" a work means to do anything with it that, without
366
+ permission, would make you directly or secondarily liable for
367
+ infringement under applicable copyright law, except executing it on a
368
+ computer or modifying a private copy. Propagation includes copying,
369
+ distribution (with or without modification), making available to the
370
+ public, and in some countries other activities as well.
371
+
372
+ To "convey" a work means any kind of propagation that enables other
373
+ parties to make or receive copies. Mere interaction with a user through
374
+ a computer network, with no transfer of a copy, is not conveying.
375
+
376
+ An interactive user interface displays "Appropriate Legal Notices"
377
+ to the extent that it includes a convenient and prominently visible
378
+ feature that (1) displays an appropriate copyright notice, and (2)
379
+ tells the user that there is no warranty for the work (except to the
380
+ extent that warranties are provided), that licensees may convey the
381
+ work under this License, and how to view a copy of this License. If
382
+ the interface presents a list of user commands or options, such as a
383
+ menu, a prominent item in the list meets this criterion.
384
+
385
+ 1. Source Code.
386
+
387
+ The "source code" for a work means the preferred form of the work
388
+ for making modifications to it. "Object code" means any non-source
389
+ form of a work.
390
+
391
+ A "Standard Interface" means an interface that either is an official
392
+ standard defined by a recognized standards body, or, in the case of
393
+ interfaces specified for a particular programming language, one that
394
+ is widely used among developers working in that language.
395
+
396
+ The "System Libraries" of an executable work include anything, other
397
+ than the work as a whole, that (a) is included in the normal form of
398
+ packaging a Major Component, but which is not part of that Major
399
+ Component, and (b) serves only to enable use of the work with that
400
+ Major Component, or to implement a Standard Interface for which an
401
+ implementation is available to the public in source code form. A
402
+ "Major Component", in this context, means a major essential component
403
+ (kernel, window system, and so on) of the specific operating system
404
+ (if any) on which the executable work runs, or a compiler used to
405
+ produce the work, or an object code interpreter used to run it.
406
+
407
+ The "Corresponding Source" for a work in object code form means all
408
+ the source code needed to generate, install, and (for an executable
409
+ work) run the object code and to modify the work, including scripts to
410
+ control those activities. However, it does not include the work's
411
+ System Libraries, or general-purpose tools or generally available free
412
+ programs which are used unmodified in performing those activities but
413
+ which are not part of the work. For example, Corresponding Source
414
+ includes interface definition files associated with source files for
415
+ the work, and the source code for shared libraries and dynamically
416
+ linked subprograms that the work is specifically designed to require,
417
+ such as by intimate data communication or control flow between those
418
+ subprograms and other parts of the work.
419
+
420
+ The Corresponding Source need not include anything that users
421
+ can regenerate automatically from other parts of the Corresponding
422
+ Source.
423
+
424
+ The Corresponding Source for a work in source code form is that
425
+ same work.
426
+
427
+ 2. Basic Permissions.
428
+
429
+ All rights granted under this License are granted for the term of
430
+ copyright on the Program, and are irrevocable provided the stated
431
+ conditions are met. This License explicitly affirms your unlimited
432
+ permission to run the unmodified Program. The output from running a
433
+ covered work is covered by this License only if the output, given its
434
+ content, constitutes a covered work. This License acknowledges your
435
+ rights of fair use or other equivalent, as provided by copyright law.
436
+
437
+ You may make, run and propagate covered works that you do not
438
+ convey, without conditions so long as your license otherwise remains
439
+ in force. You may convey covered works to others for the sole purpose
440
+ of having them make modifications exclusively for you, or provide you
441
+ with facilities for running those works, provided that you comply with
442
+ the terms of this License in conveying all material for which you do
443
+ not control copyright. Those thus making or running the covered works
444
+ for you must do so exclusively on your behalf, under your direction
445
+ and control, on terms that prohibit them from making any copies of
446
+ your copyrighted material outside their relationship with you.
447
+
448
+ Conveying under any other circumstances is permitted solely under
449
+ the conditions stated below. Sublicensing is not allowed; section 10
450
+ makes it unnecessary.
451
+
452
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
453
+
454
+ No covered work shall be deemed part of an effective technological
455
+ measure under any applicable law fulfilling obligations under article
456
+ 11 of the WIPO copyright treaty adopted on 20 December 1996, or
457
+ similar laws prohibiting or restricting circumvention of such
458
+ measures.
459
+
460
+ When you convey a covered work, you waive any legal power to forbid
461
+ circumvention of technological measures to the extent such circumvention
462
+ is effected by exercising rights under this License with respect to
463
+ the covered work, and you disclaim any intention to limit operation or
464
+ modification of the work as a means of enforcing, against the work's
465
+ users, your or third parties' legal rights to forbid circumvention of
466
+ technological measures.
467
+
468
+ 4. Conveying Verbatim Copies.
469
+
470
+ You may convey verbatim copies of the Program's source code as you
471
+ receive it, in any medium, provided that you conspicuously and
472
+ appropriately publish on each copy an appropriate copyright notice;
473
+ keep intact all notices stating that this License and any
474
+ non-permissive terms added in accord with section 7 apply to the code;
475
+ keep intact all notices of the absence of any warranty; and give all
476
+ recipients a copy of this License along with the Program.
477
+
478
+ You may charge any price or no price for each copy that you convey,
479
+ and you may offer support or warranty protection for a fee.
480
+
481
+ 5. Conveying Modified Source Versions.
482
+
483
+ You may convey a work based on the Program, or the modifications to
484
+ produce it from the Program, in the form of source code under the
485
+ terms of section 4, provided that you also meet all of these conditions:
486
+
487
+ a) The work must carry prominent notices stating that you modified
488
+ it, and giving a relevant date.
489
+
490
+ b) The work must carry prominent notices stating that it is
491
+ released under this License and any conditions added under section
492
+ 7. This requirement modifies the requirement in section 4 to
493
+ "keep intact all notices".
494
+
495
+ c) You must license the entire work, as a whole, under this
496
+ License to anyone who comes into possession of a copy. This
497
+ License will therefore apply, along with any applicable section 7
498
+ additional terms, to the whole of the work, and all its parts,
499
+ regardless of how they are packaged. This License gives no
500
+ permission to license the work in any other way, but it does not
501
+ invalidate such permission if you have separately received it.
502
+
503
+ d) If the work has interactive user interfaces, each must display
504
+ Appropriate Legal Notices; however, if the Program has interactive
505
+ interfaces that do not display Appropriate Legal Notices, your
506
+ work need not make them do so.
507
+
508
+ A compilation of a covered work with other separate and independent
509
+ works, which are not by their nature extensions of the covered work,
510
+ and which are not combined with it such as to form a larger program,
511
+ in or on a volume of a storage or distribution medium, is called an
512
+ "aggregate" if the compilation and its resulting copyright are not
513
+ used to limit the access or legal rights of the compilation's users
514
+ beyond what the individual works permit. Inclusion of a covered work
515
+ in an aggregate does not cause this License to apply to the other
516
+ parts of the aggregate.
517
+
518
+ 6. Conveying Non-Source Forms.
519
+
520
+ You may convey a covered work in object code form under the terms
521
+ of sections 4 and 5, provided that you also convey the
522
+ machine-readable Corresponding Source under the terms of this License,
523
+ in one of these ways:
524
+
525
+ a) Convey the object code in, or embodied in, a physical product
526
+ (including a physical distribution medium), accompanied by the
527
+ Corresponding Source fixed on a durable physical medium
528
+ customarily used for software interchange.
529
+
530
+ b) Convey the object code in, or embodied in, a physical product
531
+ (including a physical distribution medium), accompanied by a
532
+ written offer, valid for at least three years and valid for as
533
+ long as you offer spare parts or customer support for that product
534
+ model, to give anyone who possesses the object code either (1) a
535
+ copy of the Corresponding Source for all the software in the
536
+ product that is covered by this License, on a durable physical
537
+ medium customarily used for software interchange, for a price no
538
+ more than your reasonable cost of physically performing this
539
+ conveying of source, or (2) access to copy the
540
+ Corresponding Source from a network server at no charge.
541
+
542
+ c) Convey individual copies of the object code with a copy of the
543
+ written offer to provide the Corresponding Source. This
544
+ alternative is allowed only occasionally and noncommercially, and
545
+ only if you received the object code with such an offer, in accord
546
+ with subsection 6b.
547
+
548
+ d) Convey the object code by offering access from a designated
549
+ place (gratis or for a charge), and offer equivalent access to the
550
+ Corresponding Source in the same way through the same place at no
551
+ further charge. You need not require recipients to copy the
552
+ Corresponding Source along with the object code. If the place to
553
+ copy the object code is a network server, the Corresponding Source
554
+ may be on a different server (operated by you or a third party)
555
+ that supports equivalent copying facilities, provided you maintain
556
+ clear directions next to the object code saying where to find the
557
+ Corresponding Source. Regardless of what server hosts the
558
+ Corresponding Source, you remain obligated to ensure that it is
559
+ available for as long as needed to satisfy these requirements.
560
+
561
+ e) Convey the object code using peer-to-peer transmission, provided
562
+ you inform other peers where the object code and Corresponding
563
+ Source of the work are being offered to the general public at no
564
+ charge under subsection 6d.
565
+
566
+ A separable portion of the object code, whose source code is excluded
567
+ from the Corresponding Source as a System Library, need not be
568
+ included in conveying the object code work.
569
+
570
+ A "User Product" is either (1) a "consumer product", which means any
571
+ tangible personal property which is normally used for personal, family,
572
+ or household purposes, or (2) anything designed or sold for incorporation
573
+ into a dwelling. In determining whether a product is a consumer product,
574
+ doubtful cases shall be resolved in favor of coverage. For a particular
575
+ product received by a particular user, "normally used" refers to a
576
+ typical or common use of that class of product, regardless of the status
577
+ of the particular user or of the way in which the particular user
578
+ actually uses, or expects or is expected to use, the product. A product
579
+ is a consumer product regardless of whether the product has substantial
580
+ commercial, industrial or non-consumer uses, unless such uses represent
581
+ the only significant mode of use of the product.
582
+
583
+ "Installation Information" for a User Product means any methods,
584
+ procedures, authorization keys, or other information required to install
585
+ and execute modified versions of a covered work in that User Product from
586
+ a modified version of its Corresponding Source. The information must
587
+ suffice to ensure that the continued functioning of the modified object
588
+ code is in no case prevented or interfered with solely because
589
+ modification has been made.
590
+
591
+ If you convey an object code work under this section in, or with, or
592
+ specifically for use in, a User Product, and the conveying occurs as
593
+ part of a transaction in which the right of possession and use of the
594
+ User Product is transferred to the recipient in perpetuity or for a
595
+ fixed term (regardless of how the transaction is characterized), the
596
+ Corresponding Source conveyed under this section must be accompanied
597
+ by the Installation Information. But this requirement does not apply
598
+ if neither you nor any third party retains the ability to install
599
+ modified object code on the User Product (for example, the work has
600
+ been installed in ROM).
601
+
602
+ The requirement to provide Installation Information does not include a
603
+ requirement to continue to provide support service, warranty, or updates
604
+ for a work that has been modified or installed by the recipient, or for
605
+ the User Product in which it has been modified or installed. Access to a
606
+ network may be denied when the modification itself materially and
607
+ adversely affects the operation of the network or violates the rules and
608
+ protocols for communication across the network.
609
+
610
+ Corresponding Source conveyed, and Installation Information provided,
611
+ in accord with this section must be in a format that is publicly
612
+ documented (and with an implementation available to the public in
613
+ source code form), and must require no special password or key for
614
+ unpacking, reading or copying.
615
+
616
+ 7. Additional Terms.
617
+
618
+ "Additional permissions" are terms that supplement the terms of this
619
+ License by making exceptions from one or more of its conditions.
620
+ Additional permissions that are applicable to the entire Program shall
621
+ be treated as though they were included in this License, to the extent
622
+ that they are valid under applicable law. If additional permissions
623
+ apply only to part of the Program, that part may be used separately
624
+ under those permissions, but the entire Program remains governed by
625
+ this License without regard to the additional permissions.
626
+
627
+ When you convey a copy of a covered work, you may at your option
628
+ remove any additional permissions from that copy, or from any part of
629
+ it. (Additional permissions may be written to require their own
630
+ removal in certain cases when you modify the work.) You may place
631
+ additional permissions on material, added by you to a covered work,
632
+ for which you have or can give appropriate copyright permission.
633
+
634
+ Notwithstanding any other provision of this License, for material you
635
+ add to a covered work, you may (if authorized by the copyright holders of
636
+ that material) supplement the terms of this License with terms:
637
+
638
+ a) Disclaiming warranty or limiting liability differently from the
639
+ terms of sections 15 and 16 of this License; or
640
+
641
+ b) Requiring preservation of specified reasonable legal notices or
642
+ author attributions in that material or in the Appropriate Legal
643
+ Notices displayed by works containing it; or
644
+
645
+ c) Prohibiting misrepresentation of the origin of that material, or
646
+ requiring that modified versions of such material be marked in
647
+ reasonable ways as different from the original version; or
648
+
649
+ d) Limiting the use for publicity purposes of names of licensors or
650
+ authors of the material; or
651
+
652
+ e) Declining to grant rights under trademark law for use of some
653
+ trade names, trademarks, or service marks; or
654
+
655
+ f) Requiring indemnification of licensors and authors of that
656
+ material by anyone who conveys the material (or modified versions of
657
+ it) with contractual assumptions of liability to the recipient, for
658
+ any liability that these contractual assumptions directly impose on
659
+ those licensors and authors.
660
+
661
+ All other non-permissive additional terms are considered "further
662
+ restrictions" within the meaning of section 10. If the Program as you
663
+ received it, or any part of it, contains a notice stating that it is
664
+ governed by this License along with a term that is a further
665
+ restriction, you may remove that term. If a license document contains
666
+ a further restriction but permits relicensing or conveying under this
667
+ License, you may add to a covered work material governed by the terms
668
+ of that license document, provided that the further restriction does
669
+ not survive such relicensing or conveying.
670
+
671
+ If you add terms to a covered work in accord with this section, you
672
+ must place, in the relevant source files, a statement of the
673
+ additional terms that apply to those files, or a notice indicating
674
+ where to find the applicable terms.
675
+
676
+ Additional terms, permissive or non-permissive, may be stated in the
677
+ form of a separately written license, or stated as exceptions;
678
+ the above requirements apply either way.
679
+
680
+ 8. Termination.
681
+
682
+ You may not propagate or modify a covered work except as expressly
683
+ provided under this License. Any attempt otherwise to propagate or
684
+ modify it is void, and will automatically terminate your rights under
685
+ this License (including any patent licenses granted under the third
686
+ paragraph of section 11).
687
+
688
+ However, if you cease all violation of this License, then your
689
+ license from a particular copyright holder is reinstated (a)
690
+ provisionally, unless and until the copyright holder explicitly and
691
+ finally terminates your license, and (b) permanently, if the copyright
692
+ holder fails to notify you of the violation by some reasonable means
693
+ prior to 60 days after the cessation.
694
+
695
+ Moreover, your license from a particular copyright holder is
696
+ reinstated permanently if the copyright holder notifies you of the
697
+ violation by some reasonable means, this is the first time you have
698
+ received notice of violation of this License (for any work) from that
699
+ copyright holder, and you cure the violation prior to 30 days after
700
+ your receipt of the notice.
701
+
702
+ Termination of your rights under this section does not terminate the
703
+ licenses of parties who have received copies or rights from you under
704
+ this License. If your rights have been terminated and not permanently
705
+ reinstated, you do not qualify to receive new licenses for the same
706
+ material under section 10.
707
+
708
+ 9. Acceptance Not Required for Having Copies.
709
+
710
+ You are not required to accept this License in order to receive or
711
+ run a copy of the Program. Ancillary propagation of a covered work
712
+ occurring solely as a consequence of using peer-to-peer transmission
713
+ to receive a copy likewise does not require acceptance. However,
714
+ nothing other than this License grants you permission to propagate or
715
+ modify any covered work. These actions infringe copyright if you do
716
+ not accept this License. Therefore, by modifying or propagating a
717
+ covered work, you indicate your acceptance of this License to do so.
718
+
719
+ 10. Automatic Licensing of Downstream Recipients.
720
+
721
+ Each time you convey a covered work, the recipient automatically
722
+ receives a license from the original licensors, to run, modify and
723
+ propagate that work, subject to this License. You are not responsible
724
+ for enforcing compliance by third parties with this License.
725
+
726
+ An "entity transaction" is a transaction transferring control of an
727
+ organization, or substantially all assets of one, or subdividing an
728
+ organization, or merging organizations. If propagation of a covered
729
+ work results from an entity transaction, each party to that
730
+ transaction who receives a copy of the work also receives whatever
731
+ licenses to the work the party's predecessor in interest had or could
732
+ give under the previous paragraph, plus a right to possession of the
733
+ Corresponding Source of the work from the predecessor in interest, if
734
+ the predecessor has it or can get it with reasonable efforts.
735
+
736
+ You may not impose any further restrictions on the exercise of the
737
+ rights granted or affirmed under this License. For example, you may
738
+ not impose a license fee, royalty, or other charge for exercise of
739
+ rights granted under this License, and you may not initiate litigation
740
+ (including a cross-claim or counterclaim in a lawsuit) alleging that
741
+ any patent claim is infringed by making, using, selling, offering for
742
+ sale, or importing the Program or any portion of it.
743
+
744
+ 11. Patents.
745
+
746
+ A "contributor" is a copyright holder who authorizes use under this
747
+ License of the Program or a work on which the Program is based. The
748
+ work thus licensed is called the contributor's "contributor version".
749
+
750
+ A contributor's "essential patent claims" are all patent claims
751
+ owned or controlled by the contributor, whether already acquired or
752
+ hereafter acquired, that would be infringed by some manner, permitted
753
+ by this License, of making, using, or selling its contributor version,
754
+ but do not include claims that would be infringed only as a
755
+ consequence of further modification of the contributor version. For
756
+ purposes of this definition, "control" includes the right to grant
757
+ patent sublicenses in a manner consistent with the requirements of
758
+ this License.
759
+
760
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
761
+ patent license under the contributor's essential patent claims, to
762
+ make, use, sell, offer for sale, import and otherwise run, modify and
763
+ propagate the contents of its contributor version.
764
+
765
+ In the following three paragraphs, a "patent license" is any express
766
+ agreement or commitment, however denominated, not to enforce a patent
767
+ (such as an express permission to practice a patent or covenant not to
768
+ sue for patent infringement). To "grant" such a patent license to a
769
+ party means to make such an agreement or commitment not to enforce a
770
+ patent against the party.
771
+
772
+ If you convey a covered work, knowingly relying on a patent license,
773
+ and the Corresponding Source of the work is not available for anyone
774
+ to copy, free of charge and under the terms of this License, through a
775
+ publicly available network server or other readily accessible means,
776
+ then you must either (1) cause the Corresponding Source to be so
777
+ available, or (2) arrange to deprive yourself of the benefit of the
778
+ patent license for this particular work, or (3) arrange, in a manner
779
+ consistent with the requirements of this License, to extend the patent
780
+ license to downstream recipients. "Knowingly relying" means you have
781
+ actual knowledge that, but for the patent license, your conveying the
782
+ covered work in a country, or your recipient's use of the covered work
783
+ in a country, would infringe one or more identifiable patents in that
784
+ country that you have reason to believe are valid.
785
+
786
+ If, pursuant to or in connection with a single transaction or
787
+ arrangement, you convey, or propagate by procuring conveyance of, a
788
+ covered work, and grant a patent license to some of the parties
789
+ receiving the covered work authorizing them to use, propagate, modify
790
+ or convey a specific copy of the covered work, then the patent license
791
+ you grant is automatically extended to all recipients of the covered
792
+ work and works based on it.
793
+
794
+ A patent license is "discriminatory" if it does not include within
795
+ the scope of its coverage, prohibits the exercise of, or is
796
+ conditioned on the non-exercise of one or more of the rights that are
797
+ specifically granted under this License. You may not convey a covered
798
+ work if you are a party to an arrangement with a third party that is
799
+ in the business of distributing software, under which you make payment
800
+ to the third party based on the extent of your activity of conveying
801
+ the work, and under which the third party grants, to any of the
802
+ parties who would receive the covered work from you, a discriminatory
803
+ patent license (a) in connection with copies of the covered work
804
+ conveyed by you (or copies made from those copies), or (b) primarily
805
+ for and in connection with specific products or compilations that
806
+ contain the covered work, unless you entered into that arrangement,
807
+ or that patent license was granted, prior to 28 March 2007.
808
+
809
+ Nothing in this License shall be construed as excluding or limiting
810
+ any implied license or other defenses to infringement that may
811
+ otherwise be available to you under applicable patent law.
812
+
813
+ 12. No Surrender of Others' Freedom.
814
+
815
+ If conditions are imposed on you (whether by court order, agreement or
816
+ otherwise) that contradict the conditions of this License, they do not
817
+ excuse you from the conditions of this License. If you cannot convey a
818
+ covered work so as to satisfy simultaneously your obligations under this
819
+ License and any other pertinent obligations, then as a consequence you may
820
+ not convey it at all. For example, if you agree to terms that obligate you
821
+ to collect a royalty for further conveying from those to whom you convey
822
+ the Program, the only way you could satisfy both those terms and this
823
+ License would be to refrain entirely from conveying the Program.
824
+
825
+ 13. Remote Network Interaction; Use with the GNU General Public License.
826
+
827
+ Notwithstanding any other provision of this License, if you modify the
828
+ Program, your modified version must prominently offer all users
829
+ interacting with it remotely through a computer network (if your version
830
+ supports such interaction) an opportunity to receive the Corresponding
831
+ Source of your version by providing access to the Corresponding Source
832
+ from a network server at no charge, through some standard or customary
833
+ means of facilitating copying of software. This Corresponding Source
834
+ shall include the Corresponding Source for any work covered by version 3
835
+ of the GNU General Public License that is incorporated pursuant to the
836
+ following paragraph.
837
+
838
+ Notwithstanding any other provision of this License, you have
839
+ permission to link or combine any covered work with a work licensed
840
+ under version 3 of the GNU General Public License into a single
841
+ combined work, and to convey the resulting work. The terms of this
842
+ License will continue to apply to the part which is the covered work,
843
+ but the work with which it is combined will remain governed by version
844
+ 3 of the GNU General Public License.
845
+
846
+ 14. Revised Versions of this License.
847
+
848
+ The Free Software Foundation may publish revised and/or new versions of
849
+ the GNU Affero General Public License from time to time. Such new versions
850
+ will be similar in spirit to the present version, but may differ in detail to
851
+ address new problems or concerns.
852
+
853
+ Each version is given a distinguishing version number. If the
854
+ Program specifies that a certain numbered version of the GNU Affero General
855
+ Public License "or any later version" applies to it, you have the
856
+ option of following the terms and conditions either of that numbered
857
+ version or of any later version published by the Free Software
858
+ Foundation. If the Program does not specify a version number of the
859
+ GNU Affero General Public License, you may choose any version ever published
860
+ by the Free Software Foundation.
861
+
862
+ If the Program specifies that a proxy can decide which future
863
+ versions of the GNU Affero General Public License can be used, that proxy's
864
+ public statement of acceptance of a version permanently authorizes you
865
+ to choose that version for the Program.
866
+
867
+ Later license versions may give you additional or different
868
+ permissions. However, no additional obligations are imposed on any
869
+ author or copyright holder as a result of your choosing to follow a
870
+ later version.
871
+
872
+ 15. Disclaimer of Warranty.
873
+
874
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
875
+ APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
876
+ HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
877
+ OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
878
+ THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
879
+ PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
880
+ IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
881
+ ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
882
+
883
+ 16. Limitation of Liability.
884
+
885
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
886
+ WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
887
+ THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
888
+ GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
889
+ USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
890
+ DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
891
+ PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
892
+ EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
893
+ SUCH DAMAGES.
894
+
895
+ 17. Interpretation of Sections 15 and 16.
896
+
897
+ If the disclaimer of warranty and limitation of liability provided
898
+ above cannot be given local legal effect according to their terms,
899
+ reviewing courts shall apply local law that most closely approximates
900
+ an absolute waiver of all civil liability in connection with the
901
+ Program, unless a warranty or assumption of liability accompanies a
902
+ copy of the Program in return for a fee.
903
+
904
+ END OF TERMS AND CONDITIONS
905
+
906
+ How to Apply These Terms to Your New Programs
907
+
908
+ If you develop a new program, and you want it to be of the greatest
909
+ possible use to the public, the best way to achieve this is to make it
910
+ free software which everyone can redistribute and change under these terms.
911
+
912
+ To do so, attach the following notices to the program. It is safest
913
+ to attach them to the start of each source file to most effectively
914
+ state the exclusion of warranty; and each file should have at least
915
+ the "copyright" line and a pointer to where the full notice is found.
916
+
917
+ <one line to give the program's name and a brief idea of what it does.>
918
+ Copyright (C) <year> <name of author>
919
+
920
+ This program is free software: you can redistribute it and/or modify
921
+ it under the terms of the GNU Affero General Public License as published by
922
+ the Free Software Foundation, either version 3 of the License, or
923
+ (at your option) any later version.
924
+
925
+ This program is distributed in the hope that it will be useful,
926
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
927
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
928
+ GNU Affero General Public License for more details.
929
+
930
+ You should have received a copy of the GNU Affero General Public License
931
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
932
+
933
+ Also add information on how to contact you by electronic and paper mail.
934
+
935
+ If your software can interact with users remotely through a computer
936
+ network, you should also make sure that it provides a way for users to
937
+ get its source. For example, if your program is a web application, its
938
+ interface could display a "Source" link that leads users to an archive
939
+ of the code. There are many ways you could offer source, and different
940
+ solutions will be better for different programs; see section 13 for the
941
+ specific requirements.
942
+
943
+ You should also get your employer (if you work as a programmer) or school,
944
+ if any, to sign a "copyright disclaimer" for the program, if necessary.
945
+ For more information on this, and how to apply and follow the GNU AGPL, see
946
+ <https://www.gnu.org/licenses/>.
947
+
948
+ ```
app.py ADDED
@@ -0,0 +1,367 @@
1
+ from __future__ import annotations
2
+
3
+ import glob
4
+ import os
5
+ import time
6
+ import uuid
7
+ from typing import Any, Optional
8
+
9
+ import gradio as gr
10
+ import spaces
11
+ from PIL import Image
12
+
13
+ from src.Core.Models.ModelFactory import list_available_models
14
+ from src.Device.ModelCache import get_model_cache
15
+ from src.user import app_instance
16
+ from src.user.pipeline import pipeline
17
+
18
+
19
+ SCHEDULER_CHOICES = [
20
+ "ays",
21
+ "ays_sd15",
22
+ "ays_sdxl",
23
+ "karras",
24
+ "normal",
25
+ "simple",
26
+ "beta",
27
+ ]
28
+
29
+ SAMPLER_CHOICES = [
30
+ "dpmpp_sde_cfgpp",
31
+ "dpmpp_2m_cfgpp",
32
+ "euler",
33
+ "euler_ancestral",
34
+ "dpmpp_sde",
35
+ "dpmpp_2m",
36
+ "euler_cfgpp",
37
+ "euler_ancestral_cfgpp",
38
+ ]
39
+
40
+
41
+ def _list_model_mapping() -> list[tuple[str, str]]:
42
+ return list_available_models(return_mapping=True)
43
+
44
+
45
+ def _model_choices() -> list[str]:
46
+ return [name for name, _ in _list_model_mapping()]
47
+
48
+
49
+ def _resolve_model_path(display_name: Optional[str]) -> Optional[str]:
50
+ if not display_name:
51
+ return None
52
+
53
+ for name, path in _list_model_mapping():
54
+ if name == display_name:
55
+ return path
56
+ return None
57
+
58
+
59
+ def _load_recent_images(
60
+ prefix: Optional[str] = None,
61
+ started_at: Optional[float] = None,
62
+ limit: int = 12,
63
+ ) -> list[Image.Image]:
64
+ files: list[str] = []
65
+ for ext in ("*.png", "*.jpg", "*.jpeg", "*.webp"):
66
+ files.extend(glob.glob(os.path.join(".", "output", "**", ext), recursive=True))
67
+
68
+ filtered: list[str] = []
69
+ for path in files:
70
+ basename = os.path.basename(path)
71
+ if prefix and prefix not in basename:
72
+ continue
73
+ if started_at is not None:
74
+ try:
75
+ if os.path.getmtime(path) < (started_at - 1.0):
76
+ continue
77
+ except OSError:
78
+ continue
79
+ filtered.append(path)
80
+
81
+ filtered.sort(key=lambda p: os.path.getmtime(p), reverse=True)
82
+
83
+ images: list[Image.Image] = []
84
+ for path in filtered[:limit]:
85
+ try:
86
+ with Image.open(path) as img:
87
+ images.append(img.copy())
88
+ except Exception:
89
+ continue
90
+ return images
91
+
92
+
93
+ def _refresh_history() -> tuple[list[Image.Image], str]:
94
+ images = _load_recent_images(limit=48)
95
+ if not images:
96
+ return [], "No generated images found yet."
97
+ return images, f"Loaded {len(images)} recent images from `output/`."
98
+
99
+
100
+ def _interrupt_generation() -> str:
101
+ app_instance.app.request_interrupt()
102
+ return "Interrupt requested. The current generation will stop at the next safe check."
103
+
104
+
105
+ @spaces.GPU(duration=240)
106
+ def _run_generation(
107
+ prompt: str,
108
+ negative_prompt: str,
109
+ width: int,
110
+ height: int,
111
+ num_images: int,
112
+ batch_size: int,
113
+ scheduler: str,
114
+ sampler: str,
115
+ steps: int,
116
+ guidance_scale: float,
117
+ model_name: Optional[str],
118
+ hires_fix: bool,
119
+ adetailer: bool,
120
+ enhance_prompt: bool,
121
+ img2img_enabled: bool,
122
+ img2img_image: Optional[str],
123
+ img2img_denoise: float,
124
+ stable_fast: bool,
125
+ reuse_seed: bool,
126
+ enable_multiscale: bool,
127
+ multiscale_intermittent: bool,
128
+ multiscale_factor: float,
129
+ multiscale_fullres_start: int,
130
+ multiscale_fullres_end: int,
131
+ keep_models_loaded: bool,
132
+ progress: gr.Progress = gr.Progress(track_tqdm=False),
133
+ ) -> tuple[list[Image.Image], str, dict[str, Any], list[Image.Image]]:
134
+ if not prompt.strip():
135
+ raise gr.Error("Prompt is required.")
136
+
137
+ if img2img_enabled and not img2img_image:
138
+ raise gr.Error("Upload an input image or disable Img2Img.")
139
+
140
+ request_prefix = f"LD-GRADIO-{uuid.uuid4().hex[:8]}"
141
+ started_at = time.time()
142
+
143
+ app = app_instance.app
144
+ app.clear_interrupt()
145
+ app.cleanup_all_previews()
146
+ app.previewer_var.set(True)
147
+
148
+ try:
149
+ try:
150
+ get_model_cache().set_keep_models_loaded(bool(keep_models_loaded))
151
+ except Exception:
152
+ pass
153
+
154
+ model_path = _resolve_model_path(model_name)
155
+
156
+ def _progress_callback(args: dict[str, Any]) -> None:
157
+ step = int(args.get("i", 0))
158
+ total = int(args.get("total_steps", steps))
159
+ if total > 0:
160
+ progress(
161
+ min((step + 1) / total, 1.0),
162
+ desc=f"Sampling step {step + 1}/{total}",
163
+ )
164
+
165
+ progress(0, desc="Preparing generation")
166
+
167
+ result = pipeline(
168
+ prompt=prompt,
169
+ negative_prompt=negative_prompt,
170
+ w=int(width),
171
+ h=int(height),
172
+ number=int(num_images),
173
+ batch=int(batch_size),
174
+ scheduler=scheduler,
175
+ sampler=sampler,
176
+ steps=int(steps),
177
+ cfg_scale=float(guidance_scale),
178
+ hires_fix=bool(hires_fix),
179
+ adetailer=bool(adetailer),
180
+ enhance_prompt=bool(enhance_prompt),
181
+ img2img=bool(img2img_enabled),
182
+ img2img_image=img2img_image if img2img_enabled else None,
183
+ img2img_denoise=float(img2img_denoise),
184
+ stable_fast=bool(stable_fast),
185
+ reuse_seed=bool(reuse_seed),
186
+ autohdr=True,
187
+ realistic_model=False,
188
+ model_path=model_path,
189
+ enable_multiscale=bool(enable_multiscale),
190
+ multiscale_intermittent_fullres=bool(multiscale_intermittent),
191
+ multiscale_factor=float(multiscale_factor),
192
+ multiscale_fullres_start=int(multiscale_fullres_start),
193
+ multiscale_fullres_end=int(multiscale_fullres_end),
194
+ request_filename_prefix=request_prefix,
195
+ callback=_progress_callback,
196
+ )
197
+
198
+ progress(1, desc="Generation complete")
199
+
200
+ final_images = _load_recent_images(
201
+ prefix=request_prefix,
202
+ started_at=started_at,
203
+ limit=max(1, int(num_images)),
204
+ )
205
+ if not final_images and adetailer:
206
+ final_images = _load_recent_images(
207
+ started_at=started_at,
208
+ limit=max(1, int(num_images)),
209
+ )
210
+
211
+ preview_images = list(app.preview_images[:4]) if app.preview_images else []
212
+
213
+ if not final_images:
214
+ raise gr.Error("Generation completed but no output images were found in `output/`.")
215
+
216
+ used_prompt = result.get("used_prompt", prompt) if isinstance(result, dict) else prompt
217
+ metadata = {
218
+ "request_prefix": request_prefix,
219
+ "model_name": model_name or "auto/default",
220
+ "used_prompt": used_prompt,
221
+ "enhancement_applied": bool(result.get("enhancement_applied")) if isinstance(result, dict) else False,
222
+ "img2img_enabled": bool(img2img_enabled),
223
+ "adetailer": bool(adetailer),
224
+ "hires_fix": bool(hires_fix),
225
+ }
226
+ status = f"Generated {len(final_images)} image(s) using `{sampler}` + `{scheduler}`."
227
+ return final_images, status, metadata, preview_images
228
+ finally:
229
+ app.clear_interrupt()
230
+
231
+
232
+ def _build_demo() -> gr.Blocks:
233
+ default_models = _model_choices()
234
+ default_model = default_models[0] if default_models else None
235
+
236
+ with gr.Blocks(title="LightDiffusion-Next ZeroGPU") as demo:
237
+ gr.Markdown(
238
+ """
239
+ # LightDiffusion-Next
240
+ ZeroGPU-compatible Gradio UI. The generation function is wrapped with `@spaces.GPU`
241
+ so Hugging Face can allocate a GPU only while inference is running.
242
+ """
243
+ )
244
+
245
+ with gr.Row():
246
+ with gr.Column(scale=2):
247
+ prompt = gr.Textbox(label="Prompt", lines=5, placeholder="Describe the image you want to generate")
248
+ negative_prompt = gr.Textbox(
249
+ label="Negative Prompt",
250
+ lines=3,
251
+ value="(worst quality, low quality:1.4), (zombie, sketch, interlocked fingers, comic), (embedding:EasyNegative), (embedding:badhandv4)",
252
+ )
253
+
254
+ with gr.Row():
255
+ width = gr.Slider(256, 1536, value=512, step=64, label="Width")
256
+ height = gr.Slider(256, 1536, value=512, step=64, label="Height")
257
+
258
+ with gr.Row():
259
+ num_images = gr.Slider(1, 4, value=1, step=1, label="Images")
260
+ batch_size = gr.Slider(1, 4, value=1, step=1, label="Batch Size")
261
+
262
+ with gr.Row():
263
+ scheduler = gr.Dropdown(SCHEDULER_CHOICES, value="ays", label="Scheduler")
264
+ sampler = gr.Dropdown(SAMPLER_CHOICES, value="dpmpp_sde_cfgpp", label="Sampler")
265
+
266
+ with gr.Row():
267
+ steps = gr.Slider(1, 50, value=20, step=1, label="Steps")
268
+ guidance_scale = gr.Slider(1.0, 15.0, value=7.0, step=0.1, label="CFG")
269
+
270
+ model_name = gr.Dropdown(
271
+ choices=default_models,
272
+ value=default_model,
273
+ allow_custom_value=False,
274
+ label="Model",
275
+ )
276
+
277
+ with gr.Accordion("Advanced", open=False):
278
+ with gr.Row():
279
+ hires_fix = gr.Checkbox(label="HiresFix", value=False)
280
+ adetailer = gr.Checkbox(label="ADetailer", value=False)
281
+ enhance_prompt = gr.Checkbox(label="Enhance Prompt", value=False)
282
+ stable_fast = gr.Checkbox(label="Stable-Fast", value=False)
283
+ with gr.Row():
284
+ reuse_seed = gr.Checkbox(label="Reuse Last Seed", value=False)
285
+ enable_multiscale = gr.Checkbox(label="Multiscale", value=False)
286
+ multiscale_intermittent = gr.Checkbox(label="Intermittent Fullres", value=True)
287
+ keep_models_loaded = gr.Checkbox(label="Keep Models Loaded", value=False)
288
+ with gr.Row():
289
+ multiscale_factor = gr.Slider(0.25, 1.0, value=0.5, step=0.05, label="Multiscale Factor")
290
+ multiscale_fullres_start = gr.Slider(1, 20, value=10, step=1, label="Fullres Start")
291
+ multiscale_fullres_end = gr.Slider(1, 20, value=8, step=1, label="Fullres End")
292
+
293
+ with gr.Accordion("Img2Img", open=False):
294
+ img2img_enabled = gr.Checkbox(label="Enable Img2Img", value=False)
295
+ img2img_image = gr.Image(label="Input Image", type="filepath")
296
+ img2img_denoise = gr.Slider(0.0, 1.0, value=0.75, step=0.01, label="Denoise Strength")
297
+
298
+ with gr.Row():
299
+ generate_button = gr.Button("Generate", variant="primary")
300
+ interrupt_button = gr.Button("Interrupt", variant="stop")
301
+ refresh_models_button = gr.Button("Refresh Models")
302
+
303
+ with gr.Column(scale=3):
304
+ status = gr.Markdown("Ready.")
305
+ gallery = gr.Gallery(label="Generated Images", columns=2, height="auto")
306
+ metadata = gr.JSON(label="Generation Metadata")
307
+ preview_gallery = gr.Gallery(label="Last Preview Frames", columns=4, height="auto")
308
+
309
+ with gr.Tab("History"):
310
+ history_status = gr.Markdown("No generated images loaded yet.")
311
+ history_gallery = gr.Gallery(label="Recent Output Images", columns=4, height="auto")
312
+ refresh_history = gr.Button("Refresh History")
313
+
314
+ refresh_models_button.click(
315
+ fn=lambda: gr.update(
316
+ choices=_model_choices(),
317
+ value=(_model_choices()[0] if _model_choices() else None),
318
+ ),
319
+ outputs=model_name,
320
+ queue=False,
321
+ )
322
+
323
+ interrupt_button.click(_interrupt_generation, outputs=status, queue=False)
324
+ refresh_history.click(_refresh_history, outputs=[history_gallery, history_status], queue=False)
325
+ demo.load(_refresh_history, outputs=[history_gallery, history_status], queue=False)
326
+
327
+ generate_button.click(
328
+ _run_generation,
329
+ inputs=[
330
+ prompt,
331
+ negative_prompt,
332
+ width,
333
+ height,
334
+ num_images,
335
+ batch_size,
336
+ scheduler,
337
+ sampler,
338
+ steps,
339
+ guidance_scale,
340
+ model_name,
341
+ hires_fix,
342
+ adetailer,
343
+ enhance_prompt,
344
+ img2img_enabled,
345
+ img2img_image,
346
+ img2img_denoise,
347
+ stable_fast,
348
+ reuse_seed,
349
+ enable_multiscale,
350
+ multiscale_intermittent,
351
+ multiscale_factor,
352
+ multiscale_fullres_start,
353
+ multiscale_fullres_end,
354
+ keep_models_loaded,
355
+ ],
356
+ outputs=[gallery, status, metadata, preview_gallery],
357
+ )
358
+
359
+ return demo
360
+
361
+
362
+ demo = _build_demo()
363
+ demo.queue(default_concurrency_limit=1)
364
+
365
+
366
+ if __name__ == "__main__":
367
+ demo.launch()
docker-compose.yml ADDED
@@ -0,0 +1,41 @@
1
+ services:
2
+ lightdiffusion:
3
+ build:
4
+ context: .
5
+ dockerfile: Dockerfile
6
+ args:
7
+ # Specify target GPU architectures for CUDA extension builds
8
+ # 8.0: A100, 8.6: RTX 30xx, 8.9: RTX 40xx, 9.0: H100, 12.0: RTX 50xx (Blackwell)
9
+ # Customize based on your GPU: TORCH_CUDA_ARCH_LIST: "12.0" for RTX 50xx only
10
+ TORCH_CUDA_ARCH_LIST: "8.0;8.6;8.9;9.0;12.0"
11
+ INSTALL_STABLE_FAST: "0"
12
+ INSTALL_OLLAMA: "0"
13
+ INSTALL_SAGEATTENTION: "0"
14
+ INSTALL_SPARGEATTN: "0"
15
+ ports:
16
+ - "7860:7860" # FastAPI backend serving the built React UI
17
+ volumes:
18
+ # Mount output directory to persist generated images
19
+ - ./output:/app/output
20
+ # Mount checkpoints directory for model files
21
+ - ./include/checkpoints:/app/include/checkpoints
22
+ # Mount other model directories
23
+ - ./include/loras:/app/include/loras
24
+ - ./include/embeddings:/app/include/embeddings
25
+ - ./include/ESRGAN:/app/include/ESRGAN
26
+ - ./include/yolos:/app/include/yolos
27
+ environment:
28
+ - PORT=7860
29
+ - CUDA_VISIBLE_DEVICES=0
30
+ - CUDA_HOME=/usr/local/cuda
31
+ - PROMPT_ENHANCER_MODEL=qwen3:0.6b
32
+ deploy:
33
+ resources:
34
+ reservations:
35
+ devices:
36
+ - driver: nvidia
37
+ count: 1
38
+ capabilities: [ gpu ]
39
+ restart: unless-stopped
40
+ stdin_open: true
41
+ tty: true
docker/README.md ADDED
@@ -0,0 +1,46 @@
1
+ # Docker Build Scripts
2
+
3
+ This directory contains helper scripts used during the Docker image build process.
4
+
5
+ ## Files
6
+
7
+ ### patch_sageattention.py
8
+ **Purpose**: Patches the SageAttention setup.py to support building without GPU present.
9
+
10
+ **What it does**:
11
+ - Adds support for the `TORCH_CUDA_ARCH_LIST` environment variable to SageAttention
12
+ - Allows specifying target GPU architectures via environment variable
13
+ - Enables building Docker images on machines without NVIDIA GPUs
14
+
15
+ **Usage** (automatically called during Docker build):
16
+ ```bash
17
+ cd SageAttention
18
+ python3 ../docker/patch_sageattention.py
19
+ ```
20
+
21
+ **Why it's needed**:
22
+ SageAttention's original setup.py tries to detect GPU hardware during build time using `torch.cuda.device_count()`. This fails in Docker builds because:
23
+ 1. Docker builds don't have GPU access by default (even with `--gpus all`)
24
+ 2. GPU access during build is not guaranteed across all Docker configurations
25
+ 3. Build machines may not have the same GPU as the target runtime machine
26
+
27
+ The patch adds a check for `TORCH_CUDA_ARCH_LIST` environment variable before attempting hardware detection, allowing explicit specification of target architectures.
28
+
29
+ ### sageattention_setup.patch (not used)
30
+ Legacy patch file - kept for reference. The Python script approach is preferred.
31
+
32
+ ## How the Build Process Works
33
+
34
+ 1. **Environment Setup**: `TORCH_CUDA_ARCH_LIST` is set in the Dockerfile via ARG/ENV and can be overridden at build time (see the example after this list)
35
+ 2. **Patch Application**: `patch_sageattention.py` modifies SageAttention's setup.py
36
+ 3. **Extension Build**: Modified setup.py reads `TORCH_CUDA_ARCH_LIST` and compiles for specified architectures
37
+ 4. **SpargeAttn Build**: Already supports `TORCH_CUDA_ARCH_LIST` natively, no patch needed
38
+
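+ For example, a CPU-only build machine can target a single GPU architecture like this (`--build-arg` overrides the Dockerfile default; the image tag is arbitrary):
+
+ ```bash
+ docker build --build-arg TORCH_CUDA_ARCH_LIST="8.6" -t lightdiffusion .
+ ```
+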
39
+ ## Maintenance
40
+
41
+ If SageAttention is updated, you may need to:
42
+ 1. Check if the patch still applies correctly
43
+ 2. Update the target line in `patch_sageattention.py` if the setup.py structure changes
44
+ 3. Test the build process with the new version
45
+
46
+ The patch is designed to be non-intrusive and should work across most SageAttention versions that follow the same setup.py structure.
docker/patch_sageattention.py ADDED
@@ -0,0 +1,49 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Patch for SageAttention setup.py to support TORCH_CUDA_ARCH_LIST environment variable.
4
+ This allows building without GPUs present during build time.
5
+ """
6
+
7
+ import sys
8
+
9
+ setup_py_path = "setup.py"
10
+
11
+ # Read the original setup.py
12
+ with open(setup_py_path, 'r') as f:
13
+ content = f.read()
14
+
15
+ # Find the line where compute_capabilities is initialized
16
+ target_line = "compute_capabilities = set()"
17
+
18
+ if target_line not in content:
19
+ print("ERROR: Could not find target line in setup.py")
20
+ sys.exit(1)
21
+
22
+ # Add our patch right after compute_capabilities initialization
23
+ patch_code = '''
24
+ # Check for TORCH_CUDA_ARCH_LIST environment variable first (Docker build support)
+ import os  # make the injected block self-contained even if setup.py never imports os
25
+ env_arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST", None)
26
+ if env_arch_list:
27
+ print(f"Using TORCH_CUDA_ARCH_LIST from environment: {env_arch_list}")
28
+ arch_list = env_arch_list.replace(" ", ";").split(";")
29
+ for arch in arch_list:
30
+ arch = arch.strip()
31
+ if not arch:
32
+ continue
33
+ if arch.endswith("+PTX"):
34
+ arch = arch[:-4].strip()
35
+ if arch:
36
+ compute_capabilities.add(arch)
37
+ '''
38
+
39
+ # Insert the patch
40
+ content = content.replace(
41
+ target_line,
42
+ target_line + patch_code
43
+ )
44
+
45
+ # Write back
46
+ with open(setup_py_path, 'w') as f:
47
+ f.write(content)
48
+
49
+ print("✓ Successfully patched setup.py to support TORCH_CUDA_ARCH_LIST")
docker/sageattention_setup.patch ADDED
@@ -0,0 +1,24 @@
1
+ --- setup.py.orig 2024-10-02 00:00:00.000000000 +0000
2
+ +++ setup.py 2024-10-02 00:00:00.000000000 +0000
3
+ @@ -66,6 +66,17 @@
4
+ nvcc_cuda_version = parse(output[release_idx].split(",")[0])
5
+ return nvcc_cuda_version
6
+
7
+ +# Check for TORCH_CUDA_ARCH_LIST environment variable first
8
+ +import os
9
+ +env_arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST", None)
10
+ +if env_arch_list:
11
+ + print(f"Using TORCH_CUDA_ARCH_LIST from environment: {env_arch_list}")
12
+ + arch_list = env_arch_list.replace(" ", ";").split(";")
13
+ + for arch in arch_list:
14
+ + arch = arch.strip()
15
+ + if not arch:
16
+ + continue
17
+ + if arch.endswith("+PTX"):
18
+ + arch = arch[:-4].strip()
19
+ + if arch:
20
+ + compute_capabilities.add(arch)
21
+ +
22
+ # Iterate over all GPUs on the current machine. Also you can modify this part to specify the architecture if you want to build for specific GPU architectures.
23
+ compute_capabilities = set()
24
+ device_count = torch.cuda.device_count()
docs/advanced-cfg-optimizations.md ADDED
@@ -0,0 +1,262 @@
1
+ # Advanced CFG Optimizations
2
+
3
+ ## Overview
4
+
5
+ This document describes three advanced optimizations for Classifier-Free Guidance (CFG) that improve both quality and performance in LightDiffusion-Next:
6
+
7
+ 1. **Batched CFG Computation** - Speed optimization
8
+ 2. **Dynamic CFG Rescaling** - Quality optimization
9
+ 3. **Adaptive Noise Scheduling** - Quality & speed optimization
10
+
11
+ ## 1. Batched CFG Computation
12
+
13
+ ### What It Does
14
+
15
+ Instead of running two separate forward passes for conditional and unconditional predictions, this optimization can combine them into a single batched forward pass.
16
+
17
+ **Before:**
18
+ ```python
19
+ # Two separate forward passes
20
+ cond_pred = model(x, timestep, cond) # Pass 1
21
+ uncond_pred = model(x, timestep, uncond) # Pass 2
22
+ result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
23
+ ```
24
+
25
+ **After:**
26
+ ```python
27
+ # Single batched forward pass
28
+ both_preds = model(x, timestep, [cond, uncond]) # Single pass
29
+ cond_pred, uncond_pred = both_preds[0], both_preds[1]
30
+ result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
31
+ ```
32
+
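+ For a concrete picture, here is a minimal torch sketch of the same trick using plain tensor concatenation. It is illustrative only: the `model(x, t, c)` call signature and the `batched_cfg_step` helper are assumptions, not the project's packing code, which additionally handles conditioning chunking and memory limits.
+
+ ```python
+ import torch
+
+ def batched_cfg_step(model, x, timestep, cond, uncond, cfg_scale):
+     # Run cond and uncond through one forward pass by doubling the batch.
+     x2 = torch.cat([x, x], dim=0)
+     t2 = torch.cat([timestep, timestep], dim=0)
+     c2 = torch.cat([cond, uncond], dim=0)
+     cond_pred, uncond_pred = model(x2, t2, c2).chunk(2, dim=0)
+     return uncond_pred + cfg_scale * (cond_pred - uncond_pred)
+ ```
+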
33
+ ### Performance Impact
34
+
35
+ - **Speed**: ~1.8-2x faster CFG computation
36
+ - **Memory**: Same or slightly less (batch processing)
37
+ - **Quality**: Identical to baseline
38
+
39
+ ### Usage
40
+
41
+ ```python
42
+ from src.sample import sampling
43
+
44
+ samples = sampling.sample1(
45
+ model=model,
46
+ noise=noise,
47
+ steps=20,
48
+ cfg=7.5,
49
+ # ... other params ...
50
+ batched_cfg=True, # Joint cond/uncond batching (default: True)
51
+ )
52
+ ```
53
+
54
+ In the current implementation, the heavy lifting still happens in the central conditioning packing path. `batched_cfg` controls whether conditional and unconditional branches are packed together into the same forward pass when possible. Conditioning chunks within each branch are still packed by the shared batching logic.
55
+
56
+ ### When to Use
57
+
58
+ - **Usually recommended** - This reduces duplicate cond/uncond forward passes when memory allows
59
+ - Particularly beneficial for high-resolution images or batch generation
60
+ - Compatible with all samplers and schedulers
61
+
62
+ ---
63
+
64
+ ## 2. Dynamic CFG Rescaling
65
+
66
+ ### What It Does
67
+
68
+ Dynamically adjusts the CFG scale based on prediction statistics to prevent over-saturation while maintaining prompt adherence.
69
+
70
+ ### The Problem
71
+
72
+ High CFG values (7-12) improve prompt following but can cause:
73
+ - Over-saturated colors
74
+ - Over-sharpened edges ("halo effect")
75
+ - Loss of fine details
76
+ - Unnatural, "CG-like" appearance
77
+
78
+ ### The Solution
79
+
80
+ Dynamic CFG rescaling analyzes the guidance vector (difference between conditional and unconditional predictions) and adjusts the CFG scale to keep it within an optimal range.
81
+
82
+ **Two Methods:**
83
+
84
+ #### Variance Method (Recommended)
85
+ ```python
86
+ guidance_std = (cond_pred - uncond_pred).std()
87
+ adjusted_cfg = cfg_scale * (target_scale / (1 + guidance_std))
88
+ ```
89
+
90
+ Best for: General use, prevents over-saturation
91
+
92
+ #### Range Method
93
+ ```python
94
+ guidance_range = torch.quantile(guidance, 0.95) - torch.quantile(guidance, 0.05)
95
+ adjusted_cfg = cfg_scale * (target_scale / guidance_range)
96
+ ```
97
+
98
+ Best for: Extreme cases, outlier filtering
99
+
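+ As a self-contained sketch (parameter names mirror the `sample1()` options below, but this is an illustration of the idea, not the library's implementation):
+
+ ```python
+ import torch
+
+ def rescale_cfg(cond_pred, uncond_pred, cfg_scale,
+                 method="variance", target_scale=1.0, percentile=95):
+     guidance = cond_pred - uncond_pred
+     if method == "variance":
+         # Large spread in the guidance vector -> dial the scale down.
+         return cfg_scale * (target_scale / (1.0 + guidance.std()))
+     # "range": inter-percentile spread, which ignores extreme outliers.
+     q = percentile / 100.0
+     spread = torch.quantile(guidance, q) - torch.quantile(guidance, 1.0 - q)
+     return cfg_scale * (target_scale / spread.clamp_min(1e-6))
+ ```
+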
100
+ ### Performance Impact
101
+
102
+ - **Speed**: Minimal overhead (~2-5%)
103
+ - **Quality**: Improved color balance, reduced artifacts
104
+ - **Prompt Adherence**: Maintained or improved
105
+
106
+ ### Usage
107
+
108
+ ```python
109
+ samples = sampling.sample1(
110
+ model=model,
111
+ # ... other params ...
112
+ dynamic_cfg_rescaling=True, # Enable dynamic rescaling
113
+ dynamic_cfg_method="variance", # Method: "variance" or "range"
114
+ dynamic_cfg_percentile=95, # Percentile for range method
115
+ dynamic_cfg_target_scale=1.0, # Target normalization scale
116
+ )
117
+ ```
118
+
119
+ ### When to Use
120
+
121
+ - High CFG values (>7.5)
122
+ - Detailed prompts that might cause over-saturation
123
+ - Photorealistic generations
124
+ - Portraits and faces
125
+
126
+ ### When to Avoid
127
+
128
+ - Very low CFG (<3.0) - minimal benefit
129
+ - Artistic/stylized generations where saturation is desired
130
+ - When using CFG-free sampling (already handles this differently)
131
+
132
+ ---
133
+
134
+ ## 3. Adaptive Noise Scheduling
135
+
136
+ ### What It Does
137
+
138
+ Dynamically adjusts the noise schedule based on content complexity during generation.
139
+
140
+ ### The Problem
141
+
142
+ Traditional fixed noise schedules apply the same denoising steps to all regions:
143
+ - Complex scenes (detailed textures) may need more steps in certain regions
144
+ - Simple scenes (smooth gradients) can use fewer steps
145
+ - This wastes computation or undersamples complexity
146
+
147
+ ### The Solution
148
+
149
+ Analyzes the complexity of intermediate predictions and adjusts subsequent noise levels accordingly.
150
+
151
+ **Two Methods:**
152
+
153
+ #### Complexity Method (Recommended)
154
+ ```python
155
+ complexity = denoised.var(dim=(-2, -1))  # spatial variance per sample
156
+ # High variance = complex details = maintain fine noise steps
157
+ # Low variance = simple areas = can skip intermediate steps
158
+ ```
159
+
160
+ Best for: General content-aware optimization
161
+
162
+ #### Attention Method
163
+ ```python
164
+ complexity = sum(g.abs().mean() for g in torch.gradient(denoised, dim=(-2, -1)))
165
+ # High gradients = edges/details = need more precision
166
+ # Low gradients = smooth areas = can denoise faster
167
+ ```
168
+
169
+ Best for: Edge-focused content (architecture, technical drawings)
170
+
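+ A compact sketch of both estimators (illustrative only; the actual scheduler folds these scores into sigma selection):
+
+ ```python
+ import torch
+
+ def content_complexity(denoised, method="complexity"):
+     if method == "complexity":
+         # Spatial variance per sample: high values mean detailed regions.
+         return denoised.var(dim=(-2, -1)).mean()
+     # "attention": mean gradient magnitude over the two spatial dims.
+     grads = torch.gradient(denoised, dim=(-2, -1))
+     return sum(g.abs().mean() for g in grads)
+ ```
+
+ A scheduler can then keep fine-grained sigmas while the score stays high and merge steps once it drops.
+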
171
+ ### Performance Impact
172
+
173
+ - **Speed**: 10-20% faster for simple scenes, same for complex
174
+ - **Quality**: Adaptive - maintains quality where needed
175
+ - **Prompt Adherence**: Unchanged
176
+
177
+ ### Usage
178
+
179
+ ```python
180
+ samples = sampling.sample1(
181
+ model=model,
182
+ # ... other params ...
183
+ adaptive_noise_enabled=True, # Enable adaptive scheduling
184
+ adaptive_noise_method="complexity", # Method: "complexity" or "attention"
185
+ )
186
+ ```
187
+
188
+ ### When to Use
189
+
190
+ - Mixed complexity scenes (e.g., detailed subject + simple background)
191
+ - Long sampling runs (50+ steps) - more opportunity to optimize
192
+ - Batch generation with varying prompt complexity
193
+
194
+ ### When to Avoid
195
+
196
+ - Very short sampling runs (<10 steps) - overhead > benefit
197
+ - Uniformly complex scenes - no simplification possible
198
+ - When exact step-by-step reproducibility is critical
199
+
200
+ ---
201
+
202
+ ## Combining Optimizations
203
+
204
+ All three optimizations can be used together:
205
+
206
+ ```python
207
+ samples = sampling.sample1(
208
+ model=model,
209
+ noise=noise,
210
+ steps=20,
211
+ cfg=7.5,
212
+ sampler_name="dpmpp_sde_cfgpp",
213
+ scheduler="ays",
214
+ positive=positive_cond,
215
+ negative=negative_cond,
216
+ latent_image=latent,
217
+ # All optimizations enabled
218
+ batched_cfg=True,
219
+ dynamic_cfg_rescaling=True,
220
+ dynamic_cfg_method="variance",
221
+ dynamic_cfg_target_scale=1.0,
222
+ adaptive_noise_enabled=True,
223
+ adaptive_noise_method="complexity",
224
+ )
225
+ ```
226
+
227
+ **Expected Results:**
228
+ - Better color balance and detail preservation
229
+ - Reduced over-saturation artifacts
230
+ - Maintained or improved prompt adherence
231
+
232
+ ## Troubleshooting
233
+
234
+ ### Batched CFG Issues
235
+
236
+ **Problem**: Memory errors with batched CFG
237
+ **Solution**: Your system may not have enough VRAM for joint cond/uncond batching. Disable it with `batched_cfg=False`, which keeps the conditioning path active but runs the two branches separately.
238
+
239
+ ### Dynamic CFG Issues
240
+
241
+ **Problem**: Images too flat/desaturated
242
+ **Solution**: Increase `dynamic_cfg_target_scale` (try 1.5 or 2.0)
243
+
244
+ **Problem**: Still over-saturated
245
+ **Solution**: Switch to `dynamic_cfg_method="range"` and lower `dynamic_cfg_percentile`
246
+
247
+ ### Adaptive Noise Issues
248
+
249
+ **Problem**: Inconsistent results
250
+ **Solution**: Adaptive scheduling makes slight changes based on content. Disable for exact reproducibility.
251
+
252
+ **Problem**: No speed improvement
253
+ **Solution**: Works best with simple scenes. Complex scenes won't see speedup (but won't be slower either).
254
+
255
+ ---
256
+
257
+ ## Credits
258
+
259
+ Implemented for LightDiffusion-Next by combining insights from:
260
+ - CFG++ dynamic rescaling techniques
261
+ - ComfyUI batched computation patterns
262
+ - Stable Diffusion WebUI adaptive scheduling
docs/api.md ADDED
@@ -0,0 +1,152 @@
1
+ # REST API & Automation (Quick Reference)
2
+
3
+ LightDiffusion-Next ships with a FastAPI service (`server.py`) that sits in front of the shared pipeline. It batches compatible requests, streams telemetry and exposes health probes so you can plug the system into automation workflows, bots or orchestrators.
4
+
5
+ ## Common endpoints
6
+
7
+ | Method | Path | Description |
8
+ | --- | --- | --- |
9
+ | `GET` | `/health` | Lightweight readiness probe. Returns `{ "status": "ok" }` when the server is reachable. |
10
+ | `GET` | `/api/telemetry` | Queue and VRAM telemetry: batching stats, pending requests, cache state, uptime. |
11
+ | `POST` | `/api/generate` | Submit a generation job. Requests are buffered, batched when signatures match and resolved asynchronously. |
12
+
13
+ The service listens on port `7861` by default. Launch it with:
14
+
15
+ ```fish
16
+ uvicorn server:app --host 0.0.0.0 --port 7861
17
+ ```
18
+
19
+ ## Payload schema (`/api/generate`)
20
+
21
+ ```json
22
+ {
23
+ "prompt": "string",
24
+ "negative_prompt": "string",
25
+ "width": 512,
26
+ "height": 512,
27
+ "num_images": 1,
28
+ "batch_size": 1,
29
+ "scheduler": "ays",
30
+ "sampler": "dpmpp_sde_cfgpp",
31
+ "steps": 20,
32
+ "hires_fix": false,
33
+ "adetailer": false,
34
+ "enhance_prompt": false,
35
+ "img2img_enabled": false,
36
+ "img2img_image": null,
37
+ "stable_fast": false,
38
+ "reuse_seed": false,
39
+ "flux_enabled": false,
40
+ "realistic_model": false,
41
+ "multiscale_enabled": true,
42
+ "multiscale_intermittent": true,
43
+ "multiscale_factor": 0.5,
44
+ "multiscale_fullres_start": 10,
45
+ "multiscale_fullres_end": 8,
46
+ "keep_models_loaded": true,
47
+ "enable_preview": false,
48
+ "preview_fidelity": "balanced",
49
+ "guidance_scale": null,
50
+ "seed": null
51
+ }
52
+ ```
53
+
54
+ Not all fields are required—only `prompt`, `width`, `height` and `num_images` are strictly necessary. Any unknown keys are ignored, making the endpoint forward-compatible with UI features.
55
+
56
+ ### Response format
57
+
58
+ Successful requests return either:
59
+
60
+ ```json
61
+ { "image": "<base64-png>" }
62
+ ```
63
+
64
+ or, if multiple images were requested:
65
+
66
+ ```json
67
+ { "images": ["<base64-png>", "<base64-png>"] }
68
+ ```
69
+
70
+ Base64 strings represent PNG files with embedded metadata identical to the Streamlit UI output. Decode them and write the bytes to disk, as in the sketch below.
71
+
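+ For example, a minimal Python client (assuming the third-party `requests` package; endpoint and fields as documented above) handles either response shape like this:
+
+ ```python
+ import base64
+ import requests
+
+ resp = requests.post(
+     "http://localhost:7861/api/generate",
+     json={"prompt": "painted nebula", "width": 512, "height": 512, "num_images": 1},
+     timeout=600,
+ )
+ resp.raise_for_status()
+ payload = resp.json()
+ images = payload.get("images") or [payload["image"]]
+ for i, b64 in enumerate(images):
+     with open(f"result_{i}.png", "wb") as f:
+         f.write(base64.b64decode(b64))
+ ```
+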
72
+ ### Img2Img uploads
73
+
74
+ When `img2img_enabled` is `true`, `img2img_image` may be provided as any of the following:
75
+
76
+ - A local file path (e.g., `"tests/test.png"`)
77
+ - A data URL (e.g., `"data:image/png;base64,<...>"`)
78
+ - A raw Base64-encoded PNG string
79
+
80
+ The server will decode data URLs and raw Base64 strings and save them to the system temporary directory before processing (default max upload size: 10 MB). Keep payloads under a few megabytes to avoid HTTP timeouts.
81
+
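+ As a sketch, building a data-URL payload from a local file looks like this (the path reuses the repository's example image; field names follow the payload schema above):
+
+ ```python
+ import base64
+
+ with open("tests/test.png", "rb") as f:
+     encoded = base64.b64encode(f.read()).decode("ascii")
+
+ payload = {
+     "prompt": "restyled portrait, soft lighting",
+     "width": 512,
+     "height": 512,
+     "num_images": 1,
+     "img2img_enabled": True,
+     "img2img_image": f"data:image/png;base64,{encoded}",
+ }
+ ```
+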
82
+ ## Telemetry shape (`/api/telemetry`)
83
+
84
+ The telemetry endpoint returns operational stats that help with autoscaling or queue dashboards. Example snippet:
85
+
86
+ ```json
87
+ {
88
+ "uptime_seconds": 1234.56,
89
+ "pending_count": 2,
90
+ "pending_by_signature": {
91
+ "(False, 512, 512, True, False, False, True, True, 0.5, 10, 8, False, True, False)": 2
92
+ },
93
+ "pending_preview": [
94
+ {"request_id": "a1b2c3d4", "waiting_s": 0.42, "prompt_preview": "a cinematic robot..."}
95
+ ],
96
+ "max_batch_size": 4,
97
+ "max_images_per_group": 256,
98
+ "batch_timeout": 0.5,
99
+ "batches_processed": 12,
100
+ "items_processed": 24,
101
+ "requests_processed": 12,
102
+ "avg_processed_wait_s": 0.31,
103
+ "pending_avg_wait_s": 0.12,
104
+ "memory_info": {
105
+ "vram_allocated_mb": 5623,
106
+ "vram_reserved_mb": 6144,
107
+ "system_ram_mb": 12345
108
+ },
109
+ "loaded_models_count": 2,
110
+ "loaded_models": ["SD15 UNet", "SD15 VAE"],
111
+ "pipeline_import_ok": true,
112
+ "pipeline_import_error": null
113
+ }
114
+ ```
115
+
116
+ Use this data to spot batching mismatches (different signatures cannot be coalesced), monitor VRAM usage or expose metrics to Prometheus/Grafana.
117
+
118
+ ## Queue tuning knobs
119
+
120
+ The queue accepts a few environment variables that influence behaviour:
121
+
122
+ | Variable | Default | Effect |
123
+ | --- | --- | --- |
124
+ | `LD_MAX_BATCH_SIZE` | `4` | Maximum items processed together when signatures match. |
125
+ | `LD_BATCH_TIMEOUT` | `0.5` | Seconds to wait before flushing a batch. |
126
+ | `LD_BATCH_WAIT_SINGLETONS` | `0` | If `1`, single jobs wait the timeout hoping for companions. Set to `0` to process singletons immediately. |
127
+ | `LD_MAX_IMAGES_PER_GROUP` | `256` | Maximum combined images processed in a single pipeline run when coalescing multiple requests. Groups larger than this are processed sequentially in smaller chunks to avoid memory and disk pressure. |
128
+ | `LD_MAX_IMAGES_PER_SAVE` | `16` | Maximum images allowed in a single `save_images` call. If exceeded, the save is aborted to avoid creating many tile files; raise this limit if you genuinely need larger saves. |
129
+ | `LD_SERVER_LOGLEVEL` | `DEBUG` | Logging verbosity for `logs/server.log`. |
130
+
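+ For example, to flush single jobs immediately while allowing larger matched batches, export the variables before launching the server (values are illustrative):
+
+ ```fish
+ set -x LD_MAX_BATCH_SIZE 8
+ set -x LD_BATCH_WAIT_SINGLETONS 0
+ uvicorn server:app --host 0.0.0.0 --port 7861
+ ```
+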
131
+ ## Deploying behind a reverse proxy
132
+
133
+ When hosting remotely:
134
+
135
+ - Front the FastAPI app with Nginx/Caddy and increase client body size if you accept Img2Img uploads.
136
+ - Expose `/health` for liveness checks and `/api/telemetry` for readiness/autoscaling gates.
137
+ - Mount `./include`, `./output` and `~/.cache/torch_extensions` as volumes so workers share models, outputs and compiled kernels.
138
+
139
+ ## Testing the service quickly
140
+
141
+ ```fish
142
+ # Send a simple generation job
143
+ curl -X POST http://localhost:7861/api/generate \
144
+ -H "Content-Type: application/json" \
145
+ -d '{"prompt": "painted nebula over distant mountains", "width": 512, "height": 512, "num_images": 1}' \
146
+ | jq -r '.image' | base64 -d > nebula.png
147
+
148
+ # Inspect queue state
149
+ curl http://localhost:7861/api/telemetry | jq
150
+ ```
151
+
152
+ That’s it! Check the [Troubleshooting guide](quirks.md) if the service reports missing models or the queue appears stalled.
docs/architecture.md ADDED
@@ -0,0 +1,73 @@
1
+ # Architecture
2
+
3
+ LightDiffusion-Next is split into three cooperating layers: UX surfaces, a FastAPI gateway and a modular inference core. Requests move through these layers, picking up metadata and transformations before image tensors ever touch the GPU. This page decomposes the system and highlights the extension points you are most likely to touch.
4
+
5
+ ## Layers in detail
6
+
7
+ ### UX layer (`streamlit_app.py`, `app.py`, `ui/*`)
8
+
9
+ - Streamlit exposes rich controls, preset management and history in `ui/settings.py` and `ui/history.py`.
10
+ - Gradio powers Spaces deployments (`app.py`). It streams previews via generators and mirrors the Streamlit control surface.
11
+ - Both UIs instantiate a shared `AppInstance` which holds the pipeline, preview queues and cached settings.
12
+
13
+ ### FastAPI gateway (`server.py`)
14
+
15
+ - Implements `/api/generate`, `/api/telemetry`, `/api/interrogate` and health probes.
16
+ - `GenerationBuffer` batches jobs with compatible shapes, models and LoRA overlays to maximize GPU utilization.
17
+ - Telemetry exposes queue lengths, average latency, VRAM usage and cached model fingerprints.
18
+ - Server-side logging includes per-request identifiers and request tracebacks in `logs/server.log`.
19
+
20
+ ### Pipeline core (`src/user/pipeline.py`)
21
+
22
+ This module orchestrates conditioning, diffusion, optional refinements and output serialization.
23
+
24
+ - **Model resolution** — `src/FileManaging/Loader` locates checkpoints, VAE, CLIP weights and LoRAs. Stable-Fast backends live in `src/StableFast` and can be toggled in settings.
25
+ - **Conditioning** — Prompts are tokenized through `src/cond/cond.py`. Negative prompts, style presets and textual inversion embeddings are applied here.
26
+ - **Sampling** — `src/sample/sampling.KSampler` coordinates samplers (`ddim`, `dpmpp`, `k-diffusion`, etc.) with CFG++ and Flux schedulers.
27
+ - **Enhancements** — Multi-scale diffusion (`multiscale_presets.py`), AutoDetailer (YOLO detection + inpainting), UltimateSDUpscale and AutoHDR run after the base diffusion loop.
28
+ - **Outputs** — `src/FileManaging/ImageSaver` writes PNGs, JSON metadata and optionally sends frames to the preview queues.
29
+
30
+ ### Device and cache (`src/Device/ModelCache.py`)
31
+
32
+ - Maintains reference-counted handles for UNet, VAE, CLIP and Flux components.
33
+ - Handles VRAM telemetry and eviction policies so the UI can show “keep loaded” toggles without manual restarts.
34
+ - Tracks whether Stable-Fast kernels, SageAttention or SD1.5 attention patches are initialized.
35
+
36
+ ### Asset management (`src/FileManaging/Downloader.py`)
37
+
38
+ - Validates required checkpoints, VAE files, LoRAs, embeddings, YOLO detectors and Flux components at startup.
39
+ - Supports mirrored download hosts and resumable transfers for large files.
40
+ - Exposes helper methods used by the UI to fetch missing assets on demand.
41
+
42
+ ### Preview subsystem (`src/user/app_instance.py`)
43
+
44
+ - Provides `get_latest_previews()` for UI clients, backed by a dedicated thread that consumes preview tensors straight from the pipeline.
45
+ - Supports interrupt handling by setting `app_instance.interrupt = True`, which causes the sampler to exit gracefully.
46
+
47
+ ## Request lifecycle
48
+
49
+ 1. **Submission** — A UI or REST client creates a job payload containing prompts, dimensions, sampler settings, seed and post-processing flags.
50
+ 2. **Queueing & batching** — Jobs are inserted into `GenerationBuffer`. Depending on `LD_BATCH_WAIT_SINGLETONS`, single jobs may wait briefly for compatible companions to maximize GPU throughput.
51
+ 3. **Model preparation** — The pipeline loads or reuses cached models, applies LoRA deltas, textual inversion embeddings and optional quantization adapters (via `src/Quantize`).
52
+ 4. **Diffusion** — The sampler executes the denoising loop. Flux mode uses `src/BlackForest/Flux.py` for decoder steps; Stable-Fast kernels speed up SD1.5/SDXL.
53
+ 5. **Refinement** — Optional stages (HiRes Fix, AutoDetailer, AutoHDR, UltimateSDUpscale) run sequentially per sample.
54
+ 6. **Persistence** — Final images and metadata are written to `output/<workflow>/`. Streamlit previews receive running frames; REST clients receive base64 PNG payloads plus telemetry.
55
+
56
+ ## Filesystem overview
57
+
58
+ - `include/checkpoints` — SD checkpoints (1.5, SDXL, Flux, etc.).
59
+ - `include/loras`, `include/embeddings` — LoRA adapters and textual inversion concepts.
60
+ - `include/clip` — Tokenizer and encoder configs.
61
+ - `include/yolos` — Object detectors for AutoDetailer.
62
+ - `include/ESRGAN` — Upscaler models for UltimateSDUpscale.
63
+ - `output/*` — Organized galleries (Classic, Flux, Img2Img, Upscale, etc.).
64
+ - `webui_settings.json` — Persisted Streamlit configuration.
65
+
66
+ ## Extending LightDiffusion-Next
67
+
68
+ - **New samplers** — Implement in `src/sample/samplers.py` and register with `KSampler`. Add UI and REST switches via `ui/settings.py` and `GenerateRequest`.
69
+ - **Additional post-processing** — Follow the pattern in `UltimateSDUpscale` or `AutoHDR` and register the stage near the end of `pipeline()`.
70
+ - **Custom model managers** — Plug alternative download logic into `FileManaging/Downloader` or mount volumes in Docker deployments.
71
+ - **Observability** — Add metrics/log statements in `GenerationBuffer` or extend `/api/telemetry` to fit orchestrator dashboards.
72
+
73
+ Armed with this bird’s-eye view, you can dive into the [usage guide](usage.md) for operator workflows or the upcoming [API reference](api.md) for automation hooks.
docs/ays-scheduler.md ADDED
@@ -0,0 +1,150 @@
1
+ ## 2. AYS (Align Your Steps) Scheduler
2
+
3
+ ### What It Does
4
+
5
+ Uses optimized timestep distributions that allow **fewer sampling steps** with **same or better quality** compared to uniform schedulers.
6
+
7
+ ### Key Insight
8
+
9
+ Not all timesteps contribute equally to image formation. AYS pre-computes optimal sigma schedules that focus more steps on critical noise levels.
10
+
11
+ ### Research Background
12
+
13
+ Based on "Align Your Steps: Optimizing Sampling Schedules in Diffusion Models" (2024)
14
+ - https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/
15
+ - Developed by NVIDIA researchers
16
+ - Validated across SD1.5, SDXL, and other models
17
+
18
+ ### Performance
19
+
20
+ | Model | Normal Scheduler | AYS Scheduler | Quality |
21
+ |-------|-----------------|---------------|---------|
22
+ | SD1.5 | 20 steps | **10 steps** | Same/Better |
23
+ | SDXL | 20 steps | **10 steps** | Same/Better |
24
+ | Flux | 15 steps | **8 steps** | Same |
25
+
26
+ ### Usage
27
+
28
+ #### Via UI (Streamlit)
29
+
30
+ 1. Open Settings → Sampling
31
+ 2. Select scheduler: "AYS (Align Your Steps)"
32
+ 3. Reduce steps to 10 (SD1.5/SDXL) or 8 (Flux)
33
+ 4. Generate - same quality, 2x faster!
34
+
35
+ #### Programmatically
36
+
37
+ ```python
38
+ from src.sample import ksampler_util
39
+
40
+ # Using AYS scheduler
41
+ sigmas = ksampler_util.calculate_sigmas(
42
+ model_sampling,
43
+ scheduler_name="ays", # or "ays_sd15", "ays_sdxl", "ays_flux"
44
+ steps=10
45
+ )
46
+ ```
47
+
48
+ ### Scheduler Variants
49
+
50
+ - `"ays"` or `"ays_sd15"` - SD1.5 optimized (default)
51
+ - `"ays_sdxl"` - SDXL optimized
52
+ - `"ays_flux"` - Flux optimized (experimental)
53
+
54
+ ### Optimal Step Counts
55
+
56
+ Pre-computed optimal schedules exist for:
57
+
58
+ **SD1.5**: 4, 6, 8, 10, 12, 15, 20, 25 steps
59
+ **SDXL**: 4, 6, 8, 10, 12, 15, 20 steps
60
+ **Flux**: 4, 8, 10, 15, 20 steps
61
+
62
+ Other step counts use interpolation (slightly less optimal but still better than uniform).
63
+
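+ For intuition, that interpolation can be done in log-sigma space; here is a sketch of the idea (not the exact logic in `src/sample/ays_scheduler.py`):
+
+ ```python
+ import numpy as np
+
+ def resample_schedule(base_sigmas, steps):
+     # Resample a pre-computed AYS schedule to a new step count by
+     # interpolating log-sigmas, then re-append the terminal 0.0.
+     base = np.asarray(base_sigmas[:-1], dtype=np.float64)  # drop final 0.0
+     src = np.linspace(0.0, 1.0, len(base))
+     dst = np.linspace(0.0, 1.0, steps)
+     resampled = np.exp(np.interp(dst, src, np.log(base)))
+     return np.append(resampled, 0.0)
+ ```
+
+ Applied to the `AYS_SD15_10` schedule shown under Technical Details, `resample_schedule(AYS_SD15_10, 14)` would yield a 14-step schedule that keeps the AYS density profile.
+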
64
+ ### Recommended Settings
65
+
66
+ #### SD1.5 Quick Generation
67
+ ```yaml
68
+ scheduler: "ays"
69
+ steps: 10 # instead of 20
70
+ sampler: "euler" or "dpmpp_2m_cfgpp"
71
+ cfg: 7.0
72
+ ```
73
+
74
+ #### SDXL High Quality
75
+ ```yaml
76
+ scheduler: "ays_sdxl"
77
+ steps: 12 # instead of 20-25
78
+ sampler: "dpmpp_2m_cfgpp"
79
+ cfg: 6.0
80
+ ```
81
+
82
+ #### Flux Fast Mode
83
+ ```yaml
84
+ scheduler: "ays_flux"
85
+ steps: 8 # instead of 15
86
+ sampler: "euler"
87
+ cfg: 3.5
88
+ ```
89
+
90
+ ### Comparison: Uniform vs AYS
91
+
92
+ **Uniform Distribution (normal scheduler)**:
93
+ ```
94
+ Steps: 0 4 8 12 16 20
95
+ Sigmas evenly spaced → wastes compute on low-impact timesteps
96
+ ```
97
+
98
+ **AYS Distribution**:
99
+ ```
100
+ Steps: 0 2 5 8 12 17 20
101
+ Sigmas concentrated on critical noise levels → better efficiency
102
+ ```
103
+
104
+ ### Technical Details
105
+
106
+ AYS schedules are pre-computed using optimization to minimize reconstruction error:
107
+
108
+ ```python
109
+ # Example SD1.5 10-step schedule (11 sigmas: 10 steps plus the terminal 0.0)
+ AYS_SD15_10 = [
+     14.6146,  # High noise (early steps - image structure)
+     10.4708,
+     7.3688,
+     4.9651,   # Mid noise (detail formation)
+     3.2924,
+     2.1391,
+     1.3633,   # Low noise (fine details)
+     0.8437,
+     0.4898,
+     0.2279,
+     0.0       # Final step
+ ]
123
+ ```
124
+
125
+ Compare to uniform schedule:
126
+ ```python
127
+ # Normal scheduler @ 10 steps
128
+ NORMAL_10 = [14.6146, 11.3, 8.7, 6.7, 5.1, 3.9, 3.0, 2.3, 1.7, 1.2, 0.0]
129
+ # More evenly spaced → less efficient
130
+ ```
131
+
132
+ ### Troubleshooting
133
+
134
+ **Q: Images look different with AYS?**
135
+ A: Yes, they will differ slightly (different paths through noise space). Quality should be same or better. Adjust CFG if needed.
136
+
137
+ **Q: AYS + multiscale?**
138
+ A: Works great together! AYS optimizes step distribution, multiscale optimizes spatial resolution.
139
+
140
+ **Q: Can I use AYS with euler_ancestral?**
141
+ A: Yes! Works with all samplers (euler, euler_ancestral, dpmpp_2m_cfgpp, dpmpp_sde_cfgpp, etc.)
142
+
143
+ **Q: How to verify it's active?**
144
+ A: Check logs for "Using AYS optimal schedule" message.
145
+
146
+ ### References
147
+
148
+ - Original paper: https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/
149
+ - Implementation: `src/sample/ays_scheduler.py`
150
+ - Integration: `src/sample/ksampler_util.py`
docs/cfg-free-sampling.md ADDED
@@ -0,0 +1,269 @@
1
+ # CFG-Free Sampling
2
+
3
+ ## Overview
4
+
5
+ CFG-Free Sampling is a **quality optimization technique** that gradually reduces Classifier-Free Guidance (CFG) to zero during the final stages of image generation. This approach leverages the observation that high CFG strength is most beneficial early in the denoising process, while later steps benefit from reduced guidance for more natural, detailed outputs.
6
+
7
+ By intelligently transitioning from high-guidance to low-guidance sampling, CFG-Free achieves:
8
+
9
+ - **Improved fine detail** and texture quality
10
+ - **More natural color saturation** and tonal balance
11
+ - **Reduced artifacts** from over-guidance (halos, oversaturation, unnatural sharpness)
12
+ - **Better prompt adherence** while maintaining photorealism
13
+
14
+ This is a **training-free** technique that works with any sampler and can be combined with other optimizations.
15
+
16
+ ## How It Works
17
+
18
+ ### The CFG Problem
19
+
20
+ Classifier-Free Guidance strengthens prompt adherence by amplifying the difference between conditional and unconditional predictions:
21
+
22
+ $$
23
+ \text{output} = \text{uncond\_pred} + \text{cfg\_scale} \times (\text{cond\_pred} - \text{uncond\_pred})
24
+ $$
25
+
26
+ **Benefits of high CFG (7-12):**
27
+ - Strong prompt following
28
+ - Clear compositional structure
29
+ - Distinct subjects and backgrounds
30
+
31
+ **Drawbacks of high CFG throughout generation:**
32
+ - Over-sharpened edges ("halo effect")
33
+ - Oversaturated colors
34
+ - Loss of fine detail and texture
35
+ - Unnatural, "CG-like" appearance
36
+ - Potential anatomical distortions
37
+
38
+ ### The CFG-Free Solution
39
+
40
+ Research shows that CFG importance varies by denoising stage:
41
+
42
+ ```
43
+ ┌─────────────────────────────────────────────────────────┐
44
+ │ Early Steps (0-70%) │
45
+ │ High CFG is crucial: │
46
+ │ • Establishes composition │
47
+ │ • Defines subject placement │
48
+ │ • Interprets prompt semantics │
49
+ │ │
50
+ │ CFG = 7.0 (user-configured) │
51
+ └─────────────────────────────────────────────────────────┘
52
+
53
+ ┌─────────────────────────────────────────────────────────┐
54
+ │ Late Steps (70-100%) │
55
+ │ High CFG becomes detrimental: │
56
+ │ • Composition already locked in │
57
+ │ • Fine details being refined │
58
+ │ • Oversaturation and artifacts emerge │
59
+ │ │
60
+ │ CFG = 7.0 → 0.0 (linear reduction) │
61
+ └─────────────────────────────────────────────────────────┘
62
+ ```
63
+
64
+ CFG-Free gradually reduces guidance from your configured value (e.g., 7.0) to 0.0 over the final portion of generation. This preserves strong prompt adherence while allowing the model to naturally refine details without over-guidance.
65
+
66
+ ## Configuration
67
+
68
+ ### Parameters
69
+
70
+ | Parameter | Type | Default | Range | Description |
71
+ |-----------|------|---------|-------|-------------|
72
+ | `cfg_free_enabled` | bool | `False` | - | Enable CFG-Free sampling |
73
+ | `cfg_free_start_percent` | float | `70.0` | 0-100 | Percentage of steps at which to start reducing CFG |
74
+
75
+ ### How to Choose `cfg_free_start_percent`
76
+
77
+ The optimal starting point depends on your aesthetic goals:
78
+
79
+ | Start % | Behavior | Best For |
80
+ |---------|----------|----------|
81
+ | **60-65%** | Aggressive reduction, maximum detail preservation | Photorealistic portraits, product photography, architectural renders |
82
+ | **70-75%** | Balanced approach (recommended) | General purpose, landscapes, character art, concept art |
83
+ | **80-85%** | Conservative reduction, maintains stronger guidance | Abstract art, heavily stylized content, complex compositions |
84
+ | **90%+** | Minimal effect, mostly for testing | Debugging, comparing with full-CFG baseline |
85
+
86
+ **Rule of thumb:** Start with 70% for most use cases. If images appear oversaturated or have unnatural sharpness, lower it to 65%. If prompt adherence weakens, raise it to 75-80%.
87
+
88
+ ## Usage
89
+
90
+ ### Streamlit UI
91
+
92
+ Enable in the **🎨 CFG-Free Sampling** expander:
93
+
94
+ 1. Check **Enable CFG-Free Sampling**
95
+ 2. Adjust the **Start Percentage** slider (0-100%, default: 70%)
96
+ 3. The info panel shows exactly when CFG reduction begins
97
+ 4. Generate images — you'll see console logging confirming activation
98
+
99
+ **Visual feedback:**
100
+ ```
101
+ ✓ CFG-Free sampling ACTIVE: CFG will gradually reduce to 0 starting at 70% of steps
102
+ ```
103
+
104
+ ### REST API
105
+
106
+ Include in your generation request:
107
+
108
+ ```bash
109
+ curl -X POST http://localhost:7861/api/generate \
110
+ -H "Content-Type: application/json" \
111
+ -d '{
112
+ "prompt": "a portrait of a woman with flowing hair, soft lighting",
113
+ "negative_prompt": "blurry, low quality",
114
+ "width": 768,
115
+ "height": 1024,
116
+ "steps": 25,
117
+ "cfg_scale": 7.5,
118
+ "cfg_free_enabled": true,
119
+ "cfg_free_start_percent": 70.0
120
+ }'
121
+ ```
122
+
123
+ ### Python API
124
+
125
+ ```python
126
+ from src.user.pipeline import pipeline
127
+
128
+ pipeline(
+     prompt="a serene mountain landscape at sunset",
+     negative_prompt="blurry, distorted",
+     w=1024,
+     h=768,
+     steps=30,
+     sampler="dpmpp_sde_cfgpp",
+     scheduler="ays",
+     cfg_free_enabled=True,
+     cfg_free_start_percent=70.0,
+     number=1
+ )
140
+ ```
141
+
142
+ ## Quality Impact Analysis
143
+
144
+ ### Visual Improvements
145
+
146
+ CFG-Free sampling produces subtle but meaningful quality improvements:
147
+
148
+ **Before (Standard CFG=7.5):**
149
+ - Sharper edges, sometimes with halos
150
+ - More saturated colors (can appear "painted")
151
+ - Higher contrast, more dramatic lighting
152
+ - Occasionally oversimplified textures
153
+
154
+ **After (CFG-Free from 70%):**
155
+ - Softer, more natural edge transitions
156
+ - Improved color accuracy and tonal range
157
+ - Better fine detail in hair, fabric, skin textures
158
+ - More photorealistic lighting and shadow falloff
159
+ - Reduced artifacts around high-contrast boundaries
160
+
161
+ **Key insight:** Prompt adherence is determined in the first 60-70% of steps. Reducing CFG afterward doesn't weaken composition; it enhances natural detail refinement.
162
+
163
+ ## Troubleshooting
164
+
165
+ ### "Images look washed out or less vibrant"
166
+
167
+ **Cause:** CFG-Free starting too early (e.g., 50-60%) can over-reduce guidance.
168
+
169
+ **Solutions:**
170
+ - Increase `cfg_free_start_percent` to 70-75%
171
+ - Slightly increase base `cfg_scale` to 8.0-8.5
172
+ - Use a different sampler (try `dpmpp_sde_cfgpp` or `dpmpp_2m_cfgpp`)
173
+
174
+ ### "No visible difference from standard CFG"
175
+
176
+ **Cause:** Differences are subtle and may be masked by:
177
+ - Very simple prompts (single subject, plain background)
178
+ - Low resolution (<512px in any dimension)
179
+ - Aggressive other optimizations obscuring quality gains
180
+
181
+ **Solutions:**
182
+ - Test with complex prompts (portraits, detailed scenes)
183
+ - Use higher resolutions (768px+ recommended)
184
+ - Generate comparison images side-by-side with CFG-Free on/off
185
+ - Try lower `cfg_free_start_percent` (60-65%) for more noticeable effect
186
+
187
+ ### "Prompt adherence weakened"
188
+
189
+ **Cause:** CFG-Free starting too early for your particular prompt complexity.
190
+
191
+ **Solutions:**
192
+ - Increase `cfg_free_start_percent` to 75-80%
193
+ - Use stronger base `cfg_scale` (8.0-9.0)
194
+ - Increase step count to 30-35 for better convergence
195
+
196
+ ## Technical Details
197
+
198
+ ### Implementation
199
+
200
+ CFG-Free is implemented in the `CFGGuider` class (`src/sample/CFG.py`):
201
+
202
+ ```python
203
+ def _update_cfg_for_sigma(self, sigma):
+     """Update CFG value based on current sigma and CFG-free parameters."""
+     if not self.cfg_free_enabled:
+         return
+
+     # Find current step position in schedule
+     current_step = find_closest_sigma_index(sigma, self.sigmas)
+     total_steps = len(self.sigmas) - 1
+     progress_percent = (current_step / total_steps) * 100.0
+
+     if progress_percent >= self.cfg_free_start_percent:
+         # Linear interpolation from original CFG to 0
+         cfg_free_progress = (
+             (progress_percent - self.cfg_free_start_percent)
+             / (100.0 - self.cfg_free_start_percent)
+         )
+         self.cfg = self.original_cfg * (1.0 - cfg_free_progress)
+         self.cfg = max(0.0, self.cfg)  # Clamp at zero; progress in [0, 1] already bounds the top
221
+ ```
222
+
223
+ **Schedule visualization:**
224
+
225
+ ```
226
+ CFG Scaling Over Time (start_percent=70%, original_cfg=7.5)
227
+
228
+ Step: 0 5 10 15 20 25 30 35 40 45 50
229
+ ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
230
+ CFG: 7.5 7.5 7.5 7.5 7.5 7.5 7.5 5.6 3.8 1.9 0.0
231
+ │■■■■■■■■■■■■■■■■■■■■■■■■│▓▓▓▓▓▓▓▓▓▓│░░░░│ │ │
232
+ └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
233
+ ←──── Full CFG ────────→←─── Gradual Reduction ───→
234
+ ```
235
+
236
+ ### Mathematical Formulation
237
+
238
+ Standard CFG at every step:
239
+
240
+ $$
241
+ \hat{\boldsymbol{\epsilon}}_t = \boldsymbol{\epsilon}_{\text{uncond}} + \text{cfg\_scale} \times (\boldsymbol{\epsilon}_{\text{cond}} - \boldsymbol{\epsilon}_{\text{uncond}})
242
+ $$
243
+
244
+ CFG-Free with schedule:
245
+
246
+ $$
247
+ \hat{\boldsymbol{\epsilon}}_t = \boldsymbol{\epsilon}_{\text{uncond}} + \text{cfg}(t) \times (\boldsymbol{\epsilon}_{\text{cond}} - \boldsymbol{\epsilon}_{\text{uncond}})
248
+ $$
249
+
250
+ Where:
251
+
252
+ $$
253
+ \text{cfg}(t) = \begin{cases}
254
+ \text{cfg\_scale} & \text{if } t < t_{\text{start}} \\
255
+ \text{cfg\_scale} \times \left(1 - \frac{t - t_{\text{start}}}{t_{\text{total}} - t_{\text{start}}}\right) & \text{if } t \geq t_{\text{start}}
256
+ \end{cases}
257
+ $$
258
+
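+ The piecewise definition translates directly into a few lines of Python. A minimal sketch of the schedule in isolation (the function name is illustrative; the real logic lives in `CFGGuider._update_cfg_for_sigma` shown above):
+
+ ```python
+ def cfg_at_progress(progress_percent: float,
+                     cfg_scale: float = 7.5,
+                     start_percent: float = 70.0) -> float:
+     """Effective CFG at a given point in sampling (progress in 0-100%)."""
+     if progress_percent < start_percent:
+         return cfg_scale
+     # Linear taper from cfg_scale down to 0.0 over the remaining steps
+     taper = (progress_percent - start_percent) / (100.0 - start_percent)
+     return max(0.0, cfg_scale * (1.0 - taper))
+
+ # Reproduces the visualization above: full CFG until 70%, then a ramp to 0
+ for pct in (0, 35, 70, 80, 90, 100):
+     print(pct, round(cfg_at_progress(pct), 2))
+ ```
+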
259
+ ## Related Optimizations
260
+
261
+ - **[CFG++ Samplers](optimizations.md#cfg-samplers)**: Advanced CFG implementation with momentum and multi-scale — CFG-Free complements these
262
+ - **[Multi-Scale Diffusion](optimizations.md#multi-scale)**: Resolution-based optimization — works independently of CFG-Free
263
+ - **[DeepCache](wavespeed.md#deepcache)**: Feature caching for speedup — no quality interaction with CFG-Free
264
+
265
+ ## References & Further Reading
266
+
267
+ - Original research: CFG-Free sampling builds on insights from [Classifier-Free Guidance](https://arxiv.org/abs/2207.12598) (Ho & Salimans, 2022)
268
+ - Implementation inspired by community experiments with dynamic CFG schedules
269
+ - Mathematical framework adapted from diffusion model literature
docs/contributing.md ADDED
@@ -0,0 +1,94 @@
1
+ # Contributing
2
+
3
+ Thanks for helping push LightDiffusion-Next forward! This project blends a Streamlit UI, a Gradio deployment surface, a FastAPI queue and a sizeable inference stack. The guidelines below should get you productive quickly.
4
+
5
+ ## Getting your environment ready
6
+
7
+ ### Prerequisites
8
+
9
+ - Python 3.10 (the bundled wheels and Stable-Fast extension are built against 3.10)
10
+ - NVIDIA GPU with CUDA 12.1+ drivers (for GPU development)
11
+ - Git (with LFS if you plan to version large model weights)
12
+ - `uv` or `pip` for dependency management
13
+ - Optional: Docker + NVIDIA Container Toolkit for containerized testing
14
+
15
+ ### Clone & install
16
+
17
+ ```fish
18
+ git clone https://github.com/Aatricks/LightDiffusion-Next.git
19
+ cd LightDiffusion-Next
20
+
21
+ # Recommended: isolate dependencies
22
+ python -m venv .venv
23
+ source .venv/bin/activate.fish
24
+
25
+ # Install runtime dependencies
26
+ uv pip install -r requirements.txt
27
+
28
+ # (Optional) Extras for docs and linting
29
+ uv pip install mkdocs mkdocs-material mkdocstrings-python ruff black
30
+ ```
31
+
32
+ Populate `include/` with the checkpoints you need (SD1.5, Flux, LoRAs, embeddings). The UI will prompt you for missing assets if you skip this step.
33
+
34
+ ## Running the apps locally
35
+
36
+ - **Streamlit UI**: `streamlit run streamlit_app.py`
37
+ - **Gradio UI**: `python app.py`
38
+ - **FastAPI backend**: `uvicorn server:app --host 0.0.0.0 --port 7861`
39
+
40
+ All services read the same configuration and model directories. When working on the pipeline, it’s handy to keep FastAPI running for quick REST smoke tests while you iterate on the UI in a separate terminal.
41
+
42
+ ## Workflow expectations
43
+
44
+ 1. Create a branch per piece of work: `git checkout -b feature/short-summary`.
45
+ 2. Keep pull requests focused—avoid bundling unrelated refactors with feature work.
46
+ 3. Reference issues in your commit messages and PR description when applicable.
47
+ 4. Update documentation (`docs/`, `README.md`) whenever behavior, defaults or environments change.
48
+
49
+ ## Coding standards
50
+
51
+ - Follow PEP 8 for Python. If you have `ruff` or `black` installed, run them before committing (`ruff check src ui` and `black src ui`).
52
+ - Prefer type hints for new modules; FastAPI schemas and pipeline helpers already use Pydantic models you can extend.
53
+ - Favor dependency injection over global state—pass configuration into functions where feasible so the FastAPI worker and Streamlit UI stay in sync.
54
+ - When touching CUDA or kernel build logic, document the change in `docs/quirks.md` or `docs/installation.md` so operators know about new requirements.
55
+
56
+ ## Verification checklist
57
+
58
+ Before opening a pull request:
59
+
60
+ - [ ] `streamlit run streamlit_app.py` starts without stack traces.
61
+ - [ ] `uvicorn server:app --host 0.0.0.0 --port 7861` accepts at least one `/api/generate` call (you can use the example payload in [API docs](api.md)).
62
+ - [ ] `python app.py` (Gradio) loads when relevant to your change.
63
+ - [ ] `mkdocs build` succeeds (documentation stays green).
64
+ - [ ] GPU-specific changes are tested on at least one real GPU and noted in the PR description.
65
+ - [ ] No large binaries or secrets are committed; keep models inside `include/`, which is gitignored so they stay local.
66
+
67
+ If you add scripts or automation, include instructions in `docs/examples.md` or a new page and wire it into `mkdocs.yml`.
68
+
69
+ ## Submitting your PR
70
+
71
+ - Fill out a concise description covering **what changed**, **why**, and any ops impact (new env vars, caches, etc.).
72
+ - Attach screenshots or sample renders when altering the UI or pipeline defaults.
73
+ - Expect friendly but thorough reviews—batching, caching and GPU tweaks affect many users, so be ready to iterate.
74
+ - Squash-merge is fine, but avoid force-pushing after reviews unless you coordinate with the maintainer.
75
+
76
+ ## Bug reports & feature requests
77
+
78
+ When reporting an issue, please include:
79
+
80
+ - Operating system, driver versions (`nvidia-smi` output), GPU model
81
+ - How you launched LightDiffusion-Next (Streamlit, Docker, FastAPI)
82
+ - Relevant logs (`logs/server.log`, Streamlit terminal output, `/api/telemetry` response)
83
+ - Steps to reproduce and whether the problem is reproducible on a fresh checkout
84
+
85
+ Feature ideas are welcome—outline the use case, expected UX and any new dependencies (models, GPU requirements). Discussions and prototypes in separate branches make reviews easier.
86
+
87
+ ## Documentation contributions
88
+
89
+ - Run `mkdocs serve` while editing to preview changes at http://127.0.0.1:8000 (the mkdocs default, unless `dev_addr` is overridden in `mkdocs.yml`).
90
+ - Add new pages under `docs/` and update `mkdocs.yml` navigation.
91
+ - Screenshots should be optimized PNGs or WebPs stored under `docs/images/`.
92
+ - Keep `README.md` focused on quick start—you can link to richer docs pages for details.
93
+
94
+ Thanks again for contributing! 🚀
docs/examples.md ADDED
@@ -0,0 +1,143 @@
1
+ # Recipes & Workflows
2
+
3
+ This page collects practical “recipes” for common LightDiffusion-Next scenarios. Each section lists the UI path, optional CLI equivalents and tips for squeezing the best quality or performance out of the pipeline.
4
+
5
+ ## 1. Classic text-to-image (SD1.5)
6
+
7
+ Steps in the Streamlit UI:
8
+
9
+ 1. Enter a prompt such as `a cozy reading nook lit by neon signs, cinematic lighting, ultra detailed`.
10
+ 2. Leave negative prompt empty to use the curated default (includes `EasyNegative` and `badhandv4`).
11
+ 3. Set width and height to `768 × 512` and request `4` images with a batch size of `2`.
12
+ 4. Enable **Keep models in VRAM** for faster iteration while exploring.
13
+ 5. (Optional) Toggle **Enhance prompt** if you have Ollama running.
14
+ 6. Click **Generate** — watch the TAESD previews update in real time.
15
+
16
+ CLI equivalent:
17
+
18
+ ```bash
19
+ python -m src.user.pipeline "a cozy reading nook lit by neon signs" 768 512 4 2 --stable-fast --reuse-seed
20
+ ```
21
+
22
+ Tips:
23
+
24
+ - For softer lighting turn on **AutoHDR** (enabled by default) and lower CFG to 6.5 using the advanced settings drawer.
25
+ - Combine with **LoRA** adapters by placing `.safetensors` files in `include/loras/` and selecting them in the UI dropdown.
26
+
27
+ ## 2. Flux workflow
28
+
29
+ Flux requires the quantized GGUF UNet, CLIP and T5 weights plus the Flux schnell VAE (`include/vae/ae.safetensors`). The first run downloads them automatically.
30
+
31
+ 1. Toggle **Flux mode**.
32
+ 2. Switch CFG to `1.0` (Flux expects low CFG) and set steps to around 20.
33
+ 3. Provide a natural language prompt such as `a charcoal sketch of a train arriving at midnight, expressive strokes`.
34
+ 4. Generate 2 images with batch size 1.
35
+
36
+ REST API example:
37
+
38
+ ```bash
39
+ curl -X POST http://localhost:7861/api/generate \
40
+ -H "Content-Type: application/json" \
41
+ -d '{
42
+ "prompt": "a charcoal sketch of a train arriving at midnight, expressive strokes",
43
+ "width": 832,
44
+ "height": 1216,
45
+ "num_images": 2,
46
+ "flux_enabled": true,
47
+ "keep_models_loaded": true
48
+ }' | jq -r '.images[0]' | base64 -d > flux.png
49
+ ```
50
+
51
+ Tips:
52
+
53
+ - Flux ignores negative prompts and uses natural language weighting. Seed reuse works the same way as SD1.5.
54
+ - Monitor GPU memory in the **Model Cache Management** accordion — Flux models are larger.
55
+
56
+ ## 3. HiRes Fix + ADetailer portrait
57
+
58
+ 1. Choose a prompt such as `portrait of a cyberpunk detective, glowing tattoos, rain-soaked alley`.
59
+ 2. Set `width = 640`, `height = 896`, **num images = 1**.
60
+ 3. Enable **HiRes Fix**, **ADetailer** and **Stable-Fast**.
61
+ 4. In the advanced section set **HiRes denoise** to ~0.45 by editing `config.toml` (or accept the default and adjust later).
62
+ 5. Generate — the pipeline saves the base render, body detail pass and head detail pass separately.
63
+
64
+ Where to find outputs:
65
+
66
+ - Base image: `output/HiresFix/`.
67
+ - Body/head detail passes: `output/Adetailer/`.
68
+
69
+ Tips:
70
+
71
+ - Provide a short negative prompt that removes “extra limbs” to guide the detector.
72
+ - Use the **History** tab to compare detailer versus base results quickly.
73
+
74
+ ## 4. Img2Img upscaling with Ultimate SD Upscale
75
+
76
+ 1. Enable **Img2Img mode** and upload your reference image.
77
+ 2. Set denoise strength via the slider in the Img2Img accordion (`0.3` is a good starting point).
78
+ 3. Toggle **Stable-Fast** for faster tile processing and keep CFG around 6.
79
+ 4. Generate. UltimateSDUpscale will split the image into tiles, run targeted refinement and apply RealESRGAN (`include/ESRGAN/RealESRGAN_x4plus.pth`).
80
+
81
+ Tips:
82
+
83
+ - For stylized upscales change the prompt between passes — the pipeline will regenerate details without overwriting the original.
84
+ - Outputs land in `output/Img2Img/` with metadata including seam-fixing parameters.
85
+
86
+ ## 5. Automated batch via REST API
87
+
88
+ Use the FastAPI backend when you need to process multiple prompts from scripts or a Discord bot.
89
+
90
+ ```python
91
+ import base64
92
+ import json
93
+ import requests
94
+
95
+ payload = {
96
+ "prompt": "sunrise over a foggy fjord, volumetric light, ethereal",
97
+ "negative_prompt": "low quality, blurry",
98
+ "width": 832,
99
+ "height": 512,
100
+ "num_images": 3,
101
+ "batch_size": 3,
102
+ "stable_fast": True,
103
+ "reuse_seed": False,
104
+ "enable_preview": False
105
+ }
106
+
107
+ resp = requests.post("http://localhost:7861/api/generate", json=payload)
108
+ resp.raise_for_status()
109
+ images = resp.json().get("images", [])
110
+ for idx, b64_img in enumerate(images):
111
+ with open(f"fjord_{idx+1}.png", "wb") as f:
112
+ f.write(base64.b64decode(b64_img))
113
+ ```
114
+
115
+ The queue automatically coalesces compatible requests to maximize GPU utilization. Check `/api/telemetry` for batching statistics and memory usage.
116
+
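+ A minimal sketch of polling that endpoint from the same script; the exact response fields depend on the server version, so the code simply prints whatever is returned:
+
+ ```python
+ import requests
+
+ resp = requests.get("http://localhost:7861/api/telemetry", timeout=10)
+ resp.raise_for_status()
+
+ # Field names vary by version; dump everything the server reports
+ for key, value in resp.json().items():
+     print(f"{key}: {value}")
+ ```
+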
117
+ ## 6. Discord bot bridge
118
+
119
+ Combine LightDiffusion-Next with the [Boubou](https://github.com/Aatrick/Boubou) Discord bot:
120
+
121
+ 1. Follow the bot’s README to set your Discord token and install `py-cord` inside the LightDiffusion environment.
122
+ 2. Point the bot’s configuration at the FastAPI endpoint (`http://localhost:7861`).
123
+ 3. Give the bot `Send Messages` and `Attach Files` permissions.
124
+ 4. Use commands such as `/ld prompt:"a watercolor koi pond"` from your server and watch images stream back into the channel.
125
+
126
+ ## 7. Prompt enhancer playground
127
+
128
+ 1. Install [Ollama](https://ollama.com/) and run `ollama serve` in another terminal.
129
+ 2. Pull the suggested model:
130
+
131
+ ```bash
132
+ ollama pull qwen3:0.6b
133
+ ```
134
+
135
+ 3. Export the model name before launching the UI:
136
+
137
+ ```bash
138
+ export PROMPT_ENHANCER_MODEL=qwen3:0.6b
139
+ ```
140
+
141
+ 4. Enable **Enhance prompt** in Streamlit and inspect the rewritten prompt under the preview section. The original text is still stored as `original_prompt` inside PNG metadata.
142
+
143
+ Continue exploring by reading the [performance & tuning](quirks.md) guide or the [REST documentation](api.md) for full endpoint details.
docs/faq.md ADDED
@@ -0,0 +1,68 @@
1
+ # FAQ
2
+
3
+ Q: Where do I put my checkpoints?
4
+
5
+ A: Put them in `include/checkpoints` (create the folder if missing). The UI and `src/FileManaging/Loader` will detect and list them.
6
+
7
+ Q: Why is GPU memory insufficient?
8
+
9
+ A: Try reducing `width`/`height`, turning off `keep models loaded`, or enable quantized Flux/GGUF models. See [Performance & Troubleshooting](quirks.md).
10
+
11
+ Q: Can I run headless on a server?
12
+
13
+ A: Yes — use the FastAPI backend with `docker-compose` or run `server.py` directly. Disable Streamlit if you don’t need the web UI.
14
+
15
+ Q: How do I contribute models or LoRAs?
16
+
17
+ A: Place LoRA files in `include/loras` and embeddings in `include/embeddings`. See [Contributing](contributing.md) for guidelines.
18
+
19
+ /// details | Which diffusion models are supported out of the box?
20
+ LightDiffusion-Next ships with Stable Diffusion 1.5-friendly defaults and includes helpers for SDXL-inspired checkpoints, Flux (via the `include/Flux` assets) and quantized Stable-Fast backends. Drop your `.safetensors` or `.ckpt` files into `include/checkpoints`, LoRAs into `include/loras`, embeddings into `include/embeddings`, and Flux weights into `include/Flux`. The loader auto-detects formats and will prompt for missing companions (VAE, CLIP) at startup.
21
+ ///
22
+
23
+ /// details | What GPU and driver versions do I need?
24
+ NVIDIA GPUs with CUDA 12.1+ drivers are recommended. Availability of Stable-Fast, SageAttention and SpargeAttn depends on your installed kernels, drivers and GPU compute capability — the runtime detects and enables compatible backends automatically. For Docker, install the NVIDIA Container Toolkit and verify `nvidia-smi` works inside the container.
25
+ ///
26
+
27
+ /// details | Can I run LightDiffusion-Next without a GPU?
28
+ Yes, but performance will be limited. Install CPU wheels of PyTorch or rely on the bundled Intel oneAPI runtime (Linux only). Disable Stable-Fast/SageAttention in settings, reduce resolution (≤384×384), lower steps (<20) and turn off AutoDetailer/HiResFix to avoid minute-long renders.
29
+ ///
30
+
31
+ /// details | Where do generated images and metadata live?
32
+ Outputs are grouped by workflow under `output/`. For example, standard Txt2Img lands in `output/classic`, HiresFix into `output/HiresFix`, Flux into `output/Flux`, Img2Img upscales into `output/Img2Img`, etc. Each PNG embeds prompt metadata; accompanying JSON manifests are saved when enabled in settings.
33
+ ///
34
+
35
+ /// details | How do I switch between Streamlit, Gradio and the API?
36
+ Use the launch scripts:
37
+
38
+ - `streamlit run streamlit_app.py` (default UI)
39
+ - `python app.py` (Gradio app for Spaces/remote hosting)
40
+ - `uvicorn server:app --host 0.0.0.0 --port 7861` (FastAPI)
41
+
42
+ All three share the same pipeline and config. Streamlit/Gradio speak directly to the pipeline, while the API feeds the batching queue in `server.py`.
43
+ ///
44
+
45
+ /// details | How do I enable Stable-Fast or SageAttention?
46
+
47
+ In Streamlit, toggle **Stable-Fast** under *Performance*. The app will compile kernels the first time and reuse them afterwards (cache in `~/.cache/torch_extensions`). SageAttention is enabled automatically on supported GPUs; you can force-disable it by setting `LD_DISABLE_SAGE_ATTENTION=1` before launching. Docker images already ship with the patched kernels compiled.
48
+ ///
49
+
50
+ /// details | What if the app says a model is missing?
51
+
52
+ The downloader checks `include/` on startup and whenever a feature needs a new asset (YOLO, Flux, TAESD). Provide URLs or Hugging Face tokens when prompted, or pre-populate the folders manually. For offline environments, copy the files into the correct directories and ensure filenames match the expected suffixes (e.g., `anything-v4.5-pruned.safetensors`).
53
+ ///
54
+
55
+ /// details | Can I enhance prompts automatically with Ollama?
56
+
57
+ Yes. Install Ollama locally, download a language model (`ollama run mistral`), then enable **Prompt Enhancer** in the UI or set `enhance_prompt=true` in the REST payload. Set `OLLAMA_BASE_URL` if Ollama is not on `http://localhost:11434`.
58
+ ///
59
+
60
+ /// details | How do I reset persistent settings or history?
61
+
62
+ Delete `webui_settings.json` in the project root to reset saved toggles and defaults. Remove individual history directories under `ui/history/` to clear the UI gallery without touching generated images.
63
+ ///
64
+
65
+ /// details | Need more help?
66
+
67
+ Check the [Troubleshooting guide](quirks.md) or [open an issue](https://github.com/Aatricks/LightDiffusion-Next/issues) with logs, hardware specs and steps to reproduce.
68
+ ///
docs/implemented-optimizations-report.md ADDED
@@ -0,0 +1,484 @@
1
+ # Implemented Optimizations Report
2
+
3
+ This document presents a source-based engineering report on the optimization stack used across generation, model loading, and serving in LightDiffusion-Next.
4
+
5
+ Unlike the overview pages:
6
+
7
+ - The source tree is treated as the primary reference point.
8
+ - Each optimization is described in terms of purpose, implementation, integration, and trade-offs.
9
+ - Supporting infrastructure and codebase groundwork are included when they materially contribute to the performance profile of the project.
10
+
11
+ ## Report Scope
12
+
13
+ ### Usage Profile Definitions
14
+
15
+ - `default`: selected in the standard execution path
16
+ - `integrated`: part of the current generation or serving flow
17
+ - `optional`: integrated, but enabled through request settings, configuration, or model capabilities
18
+ - `conditional`: available when hardware, dependencies, or runtime capabilities allow it
19
+ - `implementation-specific`: implemented and used, but its effective behavior is shaped by a narrower internal path than the request surface alone suggests
20
+ - `infrastructure-level`: supports the fast path indirectly through loading, transfer, caching, or serving behavior
21
+ - `codebase groundwork`: implemented in the codebase as part of the optimization stack, but not yet surfaced as a broad standard pipeline option
22
+
23
+ ### What This Report Covers
24
+
25
+ This report covers both model-level and system-level optimizations:
26
+
27
+ - inference and sampling speedups
28
+ - precision and memory reductions
29
+ - request batching and pipeline throughput improvements
30
+ - preview and output-path latency reductions
31
+
32
+ It does not catalog ordinary features unless they clearly reduce compute, memory, or end-to-end latency.
33
+
34
+ ## Quick Inventory
35
+
36
+ | Optimization | Usage Profile | Main Goal | Primary Evidence |
37
+ |---|---|---|---|
38
+ | CUDA runtime tuning (TF32, cuDNN benchmark, SDPA enablement) | integrated, conditional | faster kernels and better backend selection | `src/Device/Device.py` |
39
+ | Attention backend cascade (SpargeAttn/SageAttention/xformers/SDPA) | integrated, conditional | faster attention kernels with fallback | `src/Attention/Attention.py`, `src/Attention/AttentionMethods.py` |
40
+ | Flux2 SDPA backend priority | integrated, conditional | prefer cuDNN/Flash SDPA for Flux2 attention | `src/NeuralNetwork/flux2/layers.py`, `src/Device/Device.py` |
41
+ | Cross-attention K/V projection cache | integrated | skip repeated key/value projection work for static context | `src/Attention/Attention.py` |
42
+ | Prompt embedding cache | integrated | avoid re-encoding repeated prompts | `src/Utilities/prompt_cache.py`, `src/clip/Clip.py` |
43
+ | Conditioning batch packing and memory-aware concatenation | integrated | reduce forward passes and pack compatible condition chunks | `src/cond/cond.py` |
44
+ | CFG=1 unconditional-skip fast path | integrated | skip unnecessary unconditional branch at CFG 1.0 | `src/sample/CFG.py`, `src/sample/BaseSampler.py` |
45
+ | AYS scheduler | default | reach similar quality in fewer steps | `src/sample/ays_scheduler.py`, `src/sample/ksampler_util.py` |
46
+ | CFG++ samplers | integrated | improve denoising behavior with momentum-style correction | `src/sample/BaseSampler.py` |
47
+ | CFG-Free sampling | integrated, optional | taper CFG late in sampling for better detail/naturalness | `src/sample/CFG.py` |
48
+ | Dynamic CFG rescaling | integrated, optional | reduce overshoot and saturation from strong CFG | `src/sample/CFG.py` |
49
+ | Adaptive noise scheduling | integrated, optional | adjust schedule based on observed complexity | `src/sample/CFG.py` |
50
+ | `batched_cfg` request surface | implementation-specific | request-facing control around the deeper conditioning batching path | `src/sample/sampling.py`, `src/cond/cond.py` |
51
+ | Multi-scale latent switching | integrated, optional | do some denoising at reduced spatial resolution | `src/sample/BaseSampler.py` |
52
+ | HiDiffusion MSW-MSA patching | integrated, optional | patch UNet attention for high-resolution multiscale workflows | `src/Core/Pipeline.py`, `src/hidiffusion/msw_msa_attention.py` |
53
+ | Stable-Fast | integrated, conditional | trace/compile UNet forward path | `src/StableFast/StableFast.py`, `src/Core/Pipeline.py` |
54
+ | `torch.compile` | integrated, optional | compiler-based model speedup without Stable-Fast | `src/Device/Device.py`, `src/Core/AbstractModel.py` |
55
+ | VAE compile, tiled path, and transfer tuning | integrated | speed up decode/encode and avoid OOM | `src/AutoEncoders/VariationalAE.py` |
56
+ | BF16/FP16 automatic dtype selection | integrated, conditional | reduce memory and improve throughput on supported hardware | `src/Device/Device.py` |
57
+ | FP8 weight quantization | integrated, conditional | reduce weight memory and enable Flux2-friendly inference paths | `src/Core/AbstractModel.py`, `src/Model/ModelPatcher.py` |
58
+ | NVFP4 weight quantization | integrated, optional | stronger memory reduction than FP8 | `src/Core/AbstractModel.py`, `src/Model/ModelPatcher.py`, `src/Utilities/Quantization.py` |
59
+ | Flux2 load-time weight-only quantization | integrated, conditional | keep large Flux2/Klein components workable on smaller VRAM budgets | `src/Core/Models/Flux2KleinModel.py` |
60
+ | ToMe | integrated, optional | reduce attention cost by token merging on UNet models | `src/Model/ModelPatcher.py`, `src/Core/Pipeline.py` |
61
+ | DeepCache | integrated, optional, implementation-specific | reuse prior denoiser output between update steps | `src/WaveSpeed/deepcache_nodes.py`, `src/Core/Pipeline.py` |
62
+ | First Block Cache for Flux | codebase groundwork | cache transformer work for Flux-like models | `src/WaveSpeed/first_block_cache.py` |
63
+ | Low-VRAM partial loading and offload policy | integrated | load only what fits and offload the rest | `src/cond/cond_util.py`, `src/Device/Device.py`, `src/Model/ModelPatcher.py` |
64
+ | Async transfer helpers and pinned checkpoint tensors | integrated, infrastructure-level | reduce host/device transfer overhead | `src/Device/Device.py`, `src/Utilities/util.py` |
65
+ | Request coalescing and queue batching | integrated | increase throughput across compatible API requests | `server.py` |
66
+ | Large-group chunking and image-save guardrails | integrated | keep large coalesced runs from blowing up save/decode paths | `server.py`, `src/FileManaging/ImageSaver.py` |
67
+ | Next-model prefetch | integrated | hide future checkpoint load latency | `server.py`, `src/Device/ModelCache.py`, `src/Utilities/util.py` |
68
+ | Keep-models-loaded cache | integrated | reuse loaded checkpoints and reduce warm starts | `src/Device/ModelCache.py`, `server.py` |
69
+ | In-memory PNG byte buffer | integrated | avoid disk round-trip for API responses | `src/FileManaging/ImageSaver.py`, `server.py` |
70
+ | TAESD preview pacing and preview fidelity control | integrated, conditional | reduce preview overhead while keeping live feedback usable | `src/sample/BaseSampler.py`, `src/AutoEncoders/taesd.py`, `server.py` |
71
+
72
+ ## Executive Summary
73
+
74
+ The optimization strategy in LightDiffusion-Next is layered and cumulative rather than dependent on a single acceleration mechanism.
75
+
76
+ 1. The core generation path combines runtime kernel selection, conditioning batching, lower-precision execution, and schedule optimization.
77
+ 2. Several optimizations are part of the standard execution path, most notably AYS scheduling, prompt caching, attention backend selection, low-VRAM loading policy, and server-side request grouping.
78
+ 3. A second layer of optional mechanisms provides workload-specific extensions, including Stable-Fast, `torch.compile`, ToMe, multiscale sampling, quantization, and guidance refinements such as CFG-Free and dynamic rescaling.
79
+ 4. The serving layer contributes materially to end-to-end throughput and latency through request coalescing, chunking, model prefetching, keep-loaded caching, and in-memory response handling.
80
+ 5. The codebase also contains foundational work for additional caching paths, particularly around Flux-oriented first-block caching, alongside the currently integrated DeepCache path.
81
+
82
+ ## Runtime And Attention Optimizations
83
+
84
+ ### CUDA runtime tuning
85
+
86
+ - Status: `integrated, conditional`
87
+ - Purpose: use faster math modes and let the backend choose more aggressive convolution and attention kernels.
88
+ - Implementation in LightDiffusion-Next: `src/Device/Device.py` enables TF32 (`torch.backends.cuda.matmul.allow_tf32`, `torch.backends.cudnn.allow_tf32`), enables cuDNN benchmarking, and turns on PyTorch math/flash/memory-efficient SDPA when available.
89
+ - Project integration: these are process-wide defaults. They do not require per-request toggles, so supported CUDA deployments get them automatically.
90
+ - Effect: reduces matmul/convolution cost and opens better SDPA backends with no extra application-layer work.
91
+ - Benefits: automatic, broad coverage, low complexity.
92
+ - Trade-offs: hardware-conditional; benefits depend on GPU generation and PyTorch build.
93
+ - Evidence: `src/Device/Device.py`.
94
+
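+ A minimal sketch of the kind of process-wide toggles involved; these are standard PyTorch flags, though the exact set and ordering in `src/Device/Device.py` may differ:
+
+ ```python
+ import torch
+
+ if torch.cuda.is_available():
+     # Allow TF32 tensor-core math for fp32 matmuls and convolutions
+     torch.backends.cuda.matmul.allow_tf32 = True
+     torch.backends.cudnn.allow_tf32 = True
+     # Let cuDNN benchmark and cache the fastest conv algorithms
+     torch.backends.cudnn.benchmark = True
+     # Enable the scaled-dot-product-attention backends
+     torch.backends.cuda.enable_flash_sdp(True)
+     torch.backends.cuda.enable_mem_efficient_sdp(True)
+     torch.backends.cuda.enable_math_sdp(True)
+ ```
+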
95
+ ### Attention backend cascade: SpargeAttn, SageAttention, xformers, PyTorch SDPA
96
+
97
+ - Status: `integrated, conditional`
98
+ - Purpose: use the fastest available attention kernel and fall back safely when unsupported.
99
+ - Implementation in LightDiffusion-Next: UNet/VAE attention chooses `SpargeAttn > SageAttention > xformers > PyTorch` in `src/Attention/Attention.py`; the concrete kernels and fallback behavior live in `src/Attention/AttentionMethods.py`.
100
+ - Project integration: the selection happens once when the attention module is imported/constructed. Sage/Sparge paths reshape inputs to HND layouts and pad unsupported head sizes to supported dimensions where possible; larger unsupported head sizes fall back.
101
+ - Effect: faster attention on supported CUDA systems without changing calling code.
102
+ - Benefits: automatic fallback chain, works across UNet cross-attention and VAE attention blocks, handles padding for awkward head sizes.
103
+ - Trade-offs: dependency- and GPU-dependent; not all head sizes stay on the fast path; behavior differs between generic UNet/VAE attention and Flux2 attention.
104
+ - Evidence: `src/Attention/Attention.py`, `src/Attention/AttentionMethods.py`.
105
+
106
+ ### Flux2 SDPA backend priority
107
+
108
+ - Status: `integrated, conditional`
109
+ - Purpose: prefer the best PyTorch SDPA backend for Flux2 transformer attention.
110
+ - Implementation in LightDiffusion-Next: `src/Device/Device.py` builds an SDPA priority context preferring cuDNN attention, then Flash, then efficient, then math; `src/NeuralNetwork/flux2/layers.py` uses `Device.get_sdpa_context()` around `scaled_dot_product_attention`.
111
+ - Project integration: Flux2 uses a separate attention implementation from the generic UNet attention path. It first tries prioritized SDPA, then xformers, then plain SDPA.
112
+ - Effect: prioritized fast attention for Flux2 with robust fallback behavior.
113
+ - Benefits: keeps Flux2 on the most optimized native backend available; does not require custom kernels.
114
+ - Trade-offs: benefits depend heavily on PyTorch version, backend support, and GPU runtime.
115
+ - Evidence: `src/Device/Device.py`, `src/NeuralNetwork/flux2/layers.py`.
116
+
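+ A minimal sketch of a restricted SDPA context using PyTorch's public API (PyTorch ≥ 2.3). `Device.get_sdpa_context()` may build this differently, and the cuDNN backend enum only exists on newer releases, so it is omitted here:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from torch.nn.attention import SDPBackend, sdpa_kernel
+
+ def fast_sdpa(q, k, v):
+     # Restrict SDPA to the listed backends; PyTorch selects among
+     # them based on which supports the input shapes and dtypes.
+     with sdpa_kernel([SDPBackend.FLASH_ATTENTION,
+                       SDPBackend.EFFICIENT_ATTENTION,
+                       SDPBackend.MATH]):
+         return F.scaled_dot_product_attention(q, k, v)
+
+ q = k = v = torch.randn(1, 8, 128, 64)
+ out = fast_sdpa(q, k, v)
+ ```
+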
117
+ ### Cross-attention static K/V projection cache
118
+
119
+ - Status: `integrated`
120
+ - Purpose: when the context tensor is unchanged across denoising steps, avoid recomputing K/V projections every step.
121
+ - Implementation in LightDiffusion-Next: `CrossAttention` in `src/Attention/Attention.py` keeps a small `_context_cache` keyed by `id(context)` and caches projected `k` and `v`.
122
+ - Project integration: this primarily targets prompt-conditioning cases where context is static while the latent evolves. The cache is tiny and self-pruning.
123
+ - Effect: shaves repeated linear-projection work from cross-attention-heavy denoising loops.
124
+ - Benefits: simple, training-free, no user configuration.
125
+ - Trade-offs: keyed by object identity, so it only helps when the exact context object is reused; small cache size limits reuse breadth.
126
+ - Evidence: `src/Attention/Attention.py`.
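+ A minimal sketch of the idea behind the identity-keyed cache (an illustrative module, not the project's `CrossAttention`):
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class KVCachedCrossAttention(nn.Module):
+     def __init__(self, dim: int, context_dim: int):
+         super().__init__()
+         self.to_k = nn.Linear(context_dim, dim, bias=False)
+         self.to_v = nn.Linear(context_dim, dim, bias=False)
+         self._context_cache = {}  # id(context) -> (k, v)
+
+     def project_kv(self, context: torch.Tensor):
+         key = id(context)
+         if key not in self._context_cache:
+             if len(self._context_cache) >= 4:  # keep the cache tiny
+                 self._context_cache.pop(next(iter(self._context_cache)))
+             self._context_cache[key] = (self.to_k(context), self.to_v(context))
+         return self._context_cache[key]
+ ```
+
+ Because the key is the context object's identity, the cache only pays off when the exact same tensor is passed on every denoising step, which is the common case for static prompt conditioning.
+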
127
+
128
+ ### Prompt embedding cache
129
+
130
+ - Status: `integrated`
131
+ - Purpose: cache text encoder outputs for repeated prompts instead of re-encoding them each time.
132
+ - Implementation in LightDiffusion-Next: `src/Utilities/prompt_cache.py` stores `(cond, pooled)` entries keyed by prompt hash and CLIP identity; `src/clip/Clip.py` checks the cache before tokenization/encoding and writes back after encode.
133
+ - Project integration: prompt caching is globally enabled by default, applies to single prompts and prompt lists, and prunes old entries once the cache exceeds its configured maximum.
134
+ - Effect: reduces prompt-side overhead in repeated-prompt workflows, especially seed sweeps and incremental prompt refinement.
135
+ - Benefits: low complexity, wired into the actual CLIP encode path, no quality trade-off.
136
+ - Trade-offs: cache size is estimate-based and global, not per-model-session aware.
137
+ - Evidence: `src/Utilities/prompt_cache.py`, `src/clip/Clip.py`, cache clear hook in `src/Core/Pipeline.py`.
138
+
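+ A minimal sketch of hash-keyed embedding reuse (an illustrative helper, not the `prompt_cache.py` API):
+
+ ```python
+ import hashlib
+
+ _prompt_cache: dict = {}
+ MAX_ENTRIES = 256
+
+ def encode_cached(prompt: str, clip_id: str, encode_fn):
+     """Return (cond, pooled), encoding only on a cache miss."""
+     key = hashlib.sha256(f"{clip_id}:{prompt}".encode()).hexdigest()
+     if key not in _prompt_cache:
+         if len(_prompt_cache) >= MAX_ENTRIES:   # simple FIFO pruning
+             _prompt_cache.pop(next(iter(_prompt_cache)))
+         _prompt_cache[key] = encode_fn(prompt)  # -> (cond, pooled)
+     return _prompt_cache[key]
+ ```
+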
139
+ ### Conditioning batch packing and CFG=1 fast path
140
+
141
+ - Status: `integrated`
142
+ - Purpose: concatenate compatible conditioning work into fewer forward calls, and skip unconditional work entirely when CFG is effectively disabled.
143
+ - Implementation in LightDiffusion-Next: `src/cond/cond.py::calc_cond_batch()` groups compatible condition chunks by shape and memory budget, concatenates them, and falls back per chunk when transformer options mismatch. `src/sample/CFG.py` sets `uncond_ = None` when `cond_scale == 1.0` and the optimization is not disabled.
144
+ - Project integration: this path is central to the standard sampling flow. The batching logic also validates Flux-style transformer image sizes and falls back when they do not match token grids.
145
+ - Effect: fewer model invocations, better GPU utilization, and a lower-cost path for CFG=1 workloads.
146
+ - Benefits: real throughput win, memory-aware, includes safety fallback for positional/shape mismatches.
147
+ - Trade-offs: batching heuristics are shape- and memory-sensitive; fallback behavior can reduce speed when conditions diverge.
148
+ - Evidence: `src/cond/cond.py`, `src/sample/CFG.py`, `src/sample/BaseSampler.py`, `tests/unit/test_calc_cond_batch_fallback.py`.
149
+
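+ A minimal sketch of the CFG=1 fast path (simplified; the real code in `src/sample/CFG.py` also honors a disable flag and runs through the batched conditioning path):
+
+ ```python
+ def guided_prediction(model, x, sigma, cond, uncond, cond_scale):
+     if cond_scale == 1.0:
+         # At scale 1.0 the unconditional branch cancels out:
+         # uncond + 1.0 * (cond - uncond) == cond, so skip it entirely.
+         return model(x, sigma, cond)
+     eps_cond = model(x, sigma, cond)
+     eps_uncond = model(x, sigma, uncond)
+     return eps_uncond + cond_scale * (eps_cond - eps_uncond)
+ ```
+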
150
+ ## Sampling And Guidance Optimizations
151
+
152
+ ### AYS scheduler
153
+
154
+ - Status: `default`
155
+ - Purpose: use precomputed sigma schedules that spend steps where they matter most, so fewer steps can reach comparable quality.
156
+ - Implementation in LightDiffusion-Next: schedules are encoded in `src/sample/ays_scheduler.py`; `src/sample/ksampler_util.py` routes `ays`, `ays_sd15`, and `ays_sdxl` to the scheduler and auto-detects model type when possible.
157
+ - Project integration: both `server.py` and `src/user/pipeline.py` default the scheduler to `ays`. Exact schedules are used when present; otherwise the code resamples or interpolates schedules.
158
+ - Effect: fewer denoising steps for similar output quality, especially on SD1.5 and SDXL.
159
+ - Benefits: training-free, defaulted into the request path, compatible with the sampler stack.
160
+ - Trade-offs: produces different trajectories than classic schedulers; unsupported step counts use interpolation rather than paper-derived schedules.
161
+ - Evidence: `src/sample/ays_scheduler.py`, `src/sample/ksampler_util.py`, defaults in `server.py` and `src/user/pipeline.py`, benchmark usage in `tests/benchmark_performance.py`.
162
+
163
+ ### CFG++ samplers
164
+
165
+ - Status: `integrated`
166
+ - Purpose: apply CFG++-style momentum behavior in sampler variants to improve denoising stability and quality.
167
+ - Implementation in LightDiffusion-Next: sampler registry maps `_cfgpp` sampler names to the same sampler classes, and `get_sampler()` enables `use_momentum` whenever the sampler name contains `_cfgpp`.
168
+ - Project integration: the sampler loop stores prior denoised state and applies momentum-style correction through `BaseSampler.apply_cfg()`. The server default sampler is `dpmpp_sde_cfgpp`.
169
+ - Effect: better denoising behavior than plain sampler variants without a separate post-process stage.
170
+ - Benefits: integrated directly into the sampler registry; default sampler already uses it.
171
+ - Trade-offs: only applies on `_cfgpp` variants; behavior is coupled to sampler implementation details rather than being a universal guidance layer.
172
+ - Evidence: `src/sample/BaseSampler.py`, default sampler in `server.py`.
173
+
174
+ ### CFG-Free sampling
175
+
176
+ - Status: `integrated, optional`
177
+ - Purpose: reduce CFG late in the denoising process so the model can finish with less over-guidance.
178
+ - Implementation in LightDiffusion-Next: `CFGGuider` stores `cfg_free_enabled` and `cfg_free_start_percent`, tracks current sigma position, and progressively reduces `self.cfg` once the configured progress threshold is crossed.
179
+ - Project integration: the flag is part of the request/context surface and is forwarded by SD1.5, SDXL, Flux2, HiResFix, and Img2Img code paths.
180
+ - Effect: potentially better detail recovery and more natural late-stage refinement.
181
+ - Benefits: integrated and actually wired through multiple pipelines; easy to combine with the rest of the sampler stack.
182
+ - Trade-offs: quality optimization rather than pure speedup; exact effect is prompt- and sampler-dependent.
183
+ - Evidence: `src/sample/CFG.py`, `src/Core/Models/SD15Model.py`, `src/Core/Models/SDXLModel.py`, `src/Core/Models/Flux2KleinModel.py`, `src/Processors/HiresFix.py`, `src/Processors/Img2Img.py`.
184
+
185
+ ### Dynamic CFG rescaling
186
+
187
+ - Status: `integrated, optional`
188
+ - Purpose: reduce effective CFG when the guidance delta becomes too strong.
189
+ - Implementation in LightDiffusion-Next: `CFGGuider._apply_dynamic_cfg_rescaling()` computes either a variance-based or range-based adjustment and clamps the result.
190
+ - Project integration: it runs inside `cfg_function()` before CFG mixing is finalized, so it affects the real denoising path rather than acting as a post-hoc metric.
191
+ - Effect: reduces oversaturation and over-guided outputs for high-CFG workloads.
192
+ - Benefits: low incremental overhead and direct integration into CFG computation.
193
+ - Trade-offs: not a pure speed optimization; the chosen formulas are heuristic and can flatten outputs if pushed too hard.
194
+ - Evidence: `src/sample/CFG.py`.
195
+
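+ A minimal sketch of a variance-style rescale. The actual formulas in `_apply_dynamic_cfg_rescaling()` differ; this version follows the common std-matching heuristic from the rescaled-CFG literature:
+
+ ```python
+ import torch
+
+ def rescale_cfg(eps_cond, eps_uncond, cond_scale, rescale=0.7):
+     guided = eps_uncond + cond_scale * (eps_cond - eps_uncond)
+     # Match the guided prediction's per-sample std back to the
+     # conditional prediction's std, then blend by `rescale`.
+     dims = list(range(1, guided.dim()))
+     std_cond = eps_cond.std(dim=dims, keepdim=True)
+     std_guided = guided.std(dim=dims, keepdim=True)
+     renormed = guided * (std_cond / (std_guided + 1e-8))
+     return rescale * renormed + (1.0 - rescale) * guided
+ ```
+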
196
+ ### Adaptive noise scheduling
197
+
198
+ - Status: `integrated, optional`
199
+ - Purpose: use observed prediction complexity to perturb the sigma schedule during sampling.
200
+ - Implementation in LightDiffusion-Next: `CFGGuider` records complexity history during prediction and scales `sigmas` inside `inner_sample()` if adaptive mode is enabled.
201
+ - Project integration: complexity can be estimated with a spatial-difference metric or variance-like behavior, depending on the selected method.
202
+ - Effect: attempts to spend effort where the current prediction appears more complex.
203
+ - Benefits: implemented end-to-end in the guider.
204
+ - Trade-offs: heuristic, can alter reproducibility, and its benefit is much less established in this repo than AYS or request coalescing.
205
+ - Evidence: `src/sample/CFG.py`.
206
+
207
+ ### `batched_cfg` request surface
208
+
209
+ - Status: `implementation-specific`
210
+ - Purpose: expose control over conditional/unconditional batching.
211
+ - Implementation in LightDiffusion-Next: the field exists in the request and context models and is passed into sampling, where it is stored in `model_options["batched_cfg"]`.
212
+ - Project integration: the main batching behavior is centered in `calc_cond_batch()`, while `batched_cfg` is carried through `model_options` as part of the request-side control surface around that path.
213
+ - Effect: provides a request-facing handle for a batching path whose heavy lifting is performed centrally in conditioning packing.
214
+ - Benefits: fits cleanly into the existing request and sampling pipeline.
215
+ - Trade-offs: its effect is indirect because the main concatenation behavior is implemented deeper in the conditioning layer.
216
+ - Evidence: `src/sample/sampling.py`, `src/Core/Context.py`, `src/cond/cond.py`.
217
+
218
+ ## Multiscale And Architecture-Specific Optimizations
219
+
220
+ ### Multi-scale latent switching
221
+
222
+ - Status: `integrated, optional`
223
+ - Purpose: run some denoising steps at a downscaled latent resolution and return to full resolution for selected steps.
224
+ - Implementation in LightDiffusion-Next: `MultiscaleManager` in `src/sample/BaseSampler.py` computes a per-step full-resolution schedule and uses bilinear downscale/upscale around sampler model calls.
225
+ - Project integration: the samplers consult `ms.use_fullres(i)` each step. Flux and Flux2 are explicitly excluded because the code treats multiscale as incompatible with DiT-style architectures.
226
+ - Effect: lower compute on some denoising steps for compatible samplers and architectures.
227
+ - Benefits: actually participates in the sampler loop; configurable by factor and schedule.
228
+ - Trade-offs: it necessarily changes the denoising path and can trade detail for speed; not available for Flux/Flux2.
229
+ - Evidence: `src/sample/BaseSampler.py`, `src/sample/sampling.py`, `src/Core/Models/Flux2KleinModel.py`.
230
+
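+ A minimal sketch of the downscale/upscale wrapper around a single sampler model call (illustrative; `MultiscaleManager` computes a per-step full-resolution schedule rather than taking a flag):
+
+ ```python
+ import torch.nn.functional as F
+
+ def multiscale_model_call(model_fn, x, sigma, use_fullres, factor=0.5):
+     if use_fullres:
+         return model_fn(x, sigma)
+     h, w = x.shape[-2:]
+     # Denoise this step at a reduced latent resolution
+     small = F.interpolate(x, scale_factor=factor, mode="bilinear",
+                           align_corners=False)
+     out = model_fn(small, sigma)
+     # Return to full resolution for the sampler update
+     return F.interpolate(out, size=(h, w), mode="bilinear",
+                          align_corners=False)
+ ```
+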
231
+ ### HiDiffusion MSW-MSA patching
232
+
233
+ - Status: `integrated, optional`
234
+ - Purpose: patch UNet attention for high-resolution workflows using HiDiffusion-style MSW-MSA attention changes.
235
+ - Implementation in LightDiffusion-Next: the pipeline clones the inner model and applies `ApplyMSWMSAAttentionSimple` when multiscale is enabled on UNet architectures.
236
+ - Project integration: the patch is explicitly blocked for Flux/Flux2 and disabled in some sub-pipelines like refiner or certain detail passes where the project wants to avoid artifact risk.
237
+ - Effect: makes the multiscale/high-resolution path more efficient or more stable on SD1.5/SDXL-style UNets.
238
+ - Benefits: architecture-aware and guarded against obvious misuse.
239
+ - Trade-offs: not universal; adds another patching layer and can be brittle if architecture assumptions drift.
240
+ - Evidence: `src/Core/Pipeline.py`, `src/hidiffusion/msw_msa_attention.py`, `src/Core/AbstractModel.py`, `src/Core/Models/SD15Model.py`, `src/Core/Models/SDXLModel.py`.
241
+
242
+ ## Model Compilation, Precision, And Memory Optimizations
243
+
244
+ ### Stable-Fast
245
+
246
+ - Status: `integrated, conditional`
247
+ - Purpose: trace and wrap UNet execution to reduce Python overhead and optionally use CUDA graph behavior.
248
+ - Implementation in LightDiffusion-Next: `src/StableFast/StableFast.py` builds a lazy trace module around the model function and stores compiled modules in a cache keyed by converted kwargs; `Pipeline._apply_optimizations()` applies it when `stable_fast` is enabled.
249
+ - Project integration: only model types that advertise `supports_stable_fast=True` can use it. Flux2 explicitly opts out at the capability layer.
250
+ - Effect: faster repeated UNet execution when the optional `sfast` dependency is present and shapes stay compatible enough for compilation reuse.
251
+ - Benefits: capability-gated, optional dependency handled defensively, integrated into the core optimization application phase.
252
+ - Trade-offs: dependency-sensitive, compilation overhead can dominate short runs, CUDA graph behavior is less flexible.
253
+ - Evidence: `src/StableFast/StableFast.py`, `src/Core/Pipeline.py`, `src/Core/Models/SD15Model.py`, `src/Core/Models/SDXLModel.py`, `src/Core/Models/Flux2KleinModel.py`.
254
+
255
+ ### `torch.compile`
256
+
257
+ - Status: `integrated, optional`
258
+ - Purpose: rely on PyTorch compiler paths instead of Stable-Fast.
259
+ - Implementation in LightDiffusion-Next: `src/Device/Device.py::compile_model()` defaults to `max-autotune-no-cudagraphs`; `src/Core/AbstractModel.py::apply_torch_compile()` applies it to the top-level module or diffusion submodule when possible.
260
+ - Project integration: the optimization is mutually exclusive with Stable-Fast in the main pipeline.
261
+ - Effect: compiler-based speedups with a safer default mode than more fragile CUDA-graph-heavy settings.
262
+ - Benefits: built on standard PyTorch, tested for safe default mode.
263
+ - Trade-offs: compiler behavior is environment-dependent; still vulnerable to dynamic-shape and dynamic-state limitations.
264
+ - Evidence: `src/Device/Device.py`, `src/Core/AbstractModel.py`, `src/Core/Pipeline.py`, `tests/unit/test_fp8_compile.py`.
265
+
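+ A minimal sketch of the safer compile default; this uses the standard `torch.compile` API, and the helper mirrors but is not the project's `compile_model()`:
+
+ ```python
+ import torch
+
+ def compile_model(module: torch.nn.Module) -> torch.nn.Module:
+     # "max-autotune-no-cudagraphs" keeps autotuned kernels but avoids
+     # CUDA graphs, which are fragile under dynamic shapes and state.
+     return torch.compile(module, mode="max-autotune-no-cudagraphs")
+ ```
+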
266
+ ### VAE compile, tiled path, and transfer tuning
267
+
268
+ - Status: `integrated`
269
+ - Purpose: speed up VAE encode/decode, reduce overhead, and avoid OOM by choosing tiled or batched paths.
270
+ - Implementation in LightDiffusion-Next: `VariationalAE.VAE` compiles the decoder on first use, runs decode/encode under `torch.inference_mode()`, uses channels-last where useful, chooses tiled fallback when memory is tight, and uses non-blocking transfers.
271
+ - Project integration: this is automatic. Callers do not opt in.
272
+ - Effect: faster VAE stages, less repeated Python/autograd overhead, and better robustness under constrained memory.
273
+ - Benefits: always enabled and directly applied in the decode and encode hot path.
274
+ - Trade-offs: decoder compile still depends on `torch.compile` availability; tiling adds complexity and can affect throughput at small sizes.
275
+ - Evidence: `src/AutoEncoders/VariationalAE.py`.
276
+
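+ A minimal sketch of the OOM-triggered tiled fallback pattern (illustrative; the real `VariationalAE.VAE` also handles tile overlap blending and channels-last layout):
+
+ ```python
+ import torch
+
+ def decode_with_fallback(decode_fn, tiled_decode_fn, latents):
+     try:
+         with torch.inference_mode():
+             return decode_fn(latents)
+     except torch.cuda.OutOfMemoryError:
+         # Full-frame decode did not fit; free VRAM and retry in tiles
+         torch.cuda.empty_cache()
+         with torch.inference_mode():
+             return tiled_decode_fn(latents)
+ ```
+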
277
+ ### BF16/FP16 automatic dtype selection
278
+
279
+ - Status: `integrated, conditional`
280
+ - Purpose: pick a lower-precision working dtype that matches the hardware and model constraints.
281
+ - Implementation in LightDiffusion-Next: `src/Device/Device.py` contains the dtype selection logic for UNet, text encoder, and VAE devices/dtypes, including bf16 support checks and fallback rules.
282
+ - Project integration: loaders and patchers consult these helpers when deciding how to instantiate and place components.
283
+ - Effect: reduced memory footprint and better arithmetic throughput on modern hardware.
284
+ - Benefits: broad, centralized policy.
285
+ - Trade-offs: heuristic; wrong hardware assumptions can reduce numerical stability or disable a faster path.
286
+ - Evidence: `src/Device/Device.py`, `src/Model/ModelPatcher.py`, `src/FileManaging/Loader.py`.
287
+
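+ The policy reduces to something like this sketch (the real rules also cover text encoder and VAE placement):
+
+ ```python
+ import torch
+
+ def pick_unet_dtype(device: torch.device) -> torch.dtype:
+     """Prefer BF16 where supported, then FP16 on CUDA, else FP32."""
+     if device.type == "cuda":
+         if torch.cuda.is_bf16_supported():
+             return torch.bfloat16
+         return torch.float16
+     return torch.float32
+ ```
+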
288
+ ### FP8 weight quantization
289
+
290
+ - Status: `integrated, conditional`
291
+ - Purpose: store weights in FP8 while casting them back to the input dtype during execution.
292
+ - Implementation in LightDiffusion-Next: `AbstractModel.apply_fp8()` hardware-gates support using `Device.is_fp8_supported()`, rewrites eligible weights to FP8, and enables runtime cast behavior on `CastWeightBiasOp` modules. The lower-level `ModelPatcher.weight_only_quantize()` also supports FP8-style quantization. See the sketch after this list.
293
+ - Project integration: it is available through generation settings and also used in Flux2 load paths when appropriate.
294
+ - Effect: lower model weight memory with an execution path that avoids dtype-mismatch crashes.
295
+ - Benefits: tested explicitly, integrates with cast-aware modules, useful for large models.
296
+ - Trade-offs: hardware-gated; quality/performance trade-offs depend on model and layer mix.
297
+ - Evidence: `src/Core/AbstractModel.py`, `src/Device/Device.py`, `src/Model/ModelPatcher.py`, `tests/unit/test_fp8_compile.py`.
298
+
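+ A simplified stand-in for the cast-on-use idea (assumes a PyTorch build with `torch.float8_e4m3fn`; the real `CastWeightBiasOp` machinery is more general):
+
+ ```python
+ import torch
+
+ class FP8Linear(torch.nn.Module):
+     """Store weights in FP8; cast back to the input dtype at execution time."""
+
+     def __init__(self, linear: torch.nn.Linear):
+         super().__init__()
+         self.weight = torch.nn.Parameter(
+             linear.weight.to(torch.float8_e4m3fn), requires_grad=False)
+         self.bias = linear.bias
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         # runtime cast avoids dtype-mismatch crashes in downstream kernels
+         return torch.nn.functional.linear(x, self.weight.to(x.dtype), self.bias)
+ ```
+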
299
+ ### NVFP4 weight quantization
300
+
301
+ - Status: `integrated, optional`
302
+ - Purpose: use a more aggressive 4-bit weight-only format to reduce memory further than FP8.
303
+ - Implementation in LightDiffusion-Next: both `AbstractModel.apply_nvfp4()` and `ModelPatcher.weight_only_quantize("nvfp4")` quantize supported weights, store scale buffers, and enable runtime casting/dequantization. See the sketch after this list.
304
+ - Project integration: the quantization path is used most clearly in Flux2/Klein loading, but the abstract model path also exists for supported models.
305
+ - Effect: significant memory reduction at the cost of more aggressive approximation.
306
+ - Benefits: strongest memory reduction path in the repo.
307
+ - Trade-offs: more invasive than FP8, more likely to affect quality, and only applies to some weight shapes.
308
+ - Evidence: `src/Core/AbstractModel.py`, `src/Model/ModelPatcher.py`, `src/Utilities/Quantization.py`, `tests/test_nvfp4.py`, `tests/test_nvfp4_integration.py`.
309
+
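+ A deliberately simplified sketch of the format's mechanics: per-block scales plus a 16-level signed E2M1 grid. Codes are left unpacked (one per int8) for readability; the real path packs two 4-bit codes per byte and differs in detail.
+
+ ```python
+ import torch
+
+ FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes
+
+ def quantize_nvfp4(w: torch.Tensor, block: int = 16):
+     grid = FP4_GRID.to(w.device)
+     flat = w.reshape(-1, block)  # assumes numel divisible by the block size
+     scale = (flat.abs().amax(dim=1, keepdim=True) / 6.0).clamp(min=1e-8)
+     # snap each scaled magnitude to the nearest grid point
+     idx = ((flat / scale).abs().unsqueeze(-1) - grid).abs().argmin(dim=-1)
+     return idx.to(torch.int8), flat.sign().to(torch.int8), scale
+
+ def dequantize_nvfp4(idx, sign, scale, shape):
+     grid = FP4_GRID.to(idx.device)
+     return (grid[idx.long()] * sign * scale).reshape(shape)
+ ```
+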
310
+ ### Flux2 load-time weight-only quantization
311
+
312
+ - Status: `integrated, conditional`
313
+ - Purpose: automatically quantize large Flux2 diffusion and Klein text encoder weights during loading when the configuration or hardware path calls for it.
314
+ - Implementation in LightDiffusion-Next: `Flux2KleinModel.load()` selects a quantization format and applies weight-only quantization to the diffusion model; `_load_klein_text_encoder()` applies the same idea to the text encoder before offloading it back to CPU.
315
+ - Project integration: Flux2 is the clearest example in the codebase where quantization is implemented as a first-class loading strategy rather than as a generic capability alone.
316
+ - Effect: keeps a large Flux2/Klein stack usable on lower-VRAM systems than an uncompressed load would allow.
317
+ - Benefits: integrated, architecture-specific, and directly aligned with large-model VRAM constraints.
318
+ - Trade-offs: tightly coupled to Flux2/Klein assumptions; not equivalent to a universally available quantized-mode toggle.
319
+ - Evidence: `src/Core/Models/Flux2KleinModel.py`.
320
+
321
+ ### ToMe
322
+
323
+ - Status: `integrated, optional`
324
+ - Purpose: merge similar tokens to reduce attention workload in UNet-based models.
325
+ - Implementation in LightDiffusion-Next: `ModelPatcher.apply_tome()` applies and removes `tomesd` patches; `Pipeline._apply_optimizations()` applies it only when the model capabilities allow it. See the sketch after this list.
326
+ - Project integration: SD1.5 and SDXL advertise `supports_tome=True`; Flux2 advertises `False`.
327
+ - Effect: lower attention cost on supported UNet models, particularly at higher token counts.
328
+ - Benefits: explicitly capability-gated, integrated into the core optimization phase.
329
+ - Trade-offs: optional dependency, UNet-only in current practice, and quality can soften if pushed too aggressively.
330
+ - Evidence: `src/Model/ModelPatcher.py`, `src/Core/Pipeline.py`, capability declarations in `src/Core/Models/*`, `tests/unit/test_tome_fix.py`.
331
+
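+ Usage reduces to the optional-dependency pattern below (the ratio is an example value, not the project default):
+
+ ```python
+ try:
+     import tomesd
+ except ImportError:
+     tomesd = None  # optional dependency: degrade to a no-op
+
+ def apply_tome(diffusion_model, ratio: float = 0.4):
+     if tomesd is not None:
+         tomesd.apply_patch(diffusion_model, ratio=ratio)
+     return diffusion_model
+
+ def remove_tome(diffusion_model):
+     if tomesd is not None:
+         tomesd.remove_patch(diffusion_model)
+ ```
+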
332
+ ### DeepCache
333
+
334
+ - Status: `integrated, optional, implementation-specific`
335
+ - Purpose: reuse work across denoising steps rather than running a full forward pass every time.
336
+ - Implementation in LightDiffusion-Next: `ApplyDeepCacheOnModel.patch()` clones the model and wraps its UNet function. On cache-update steps it runs the model normally and stores the output; on reuse steps it returns the cached output directly. See the sketch after this list.
337
+ - Project integration: the main pipeline applies it from `_apply_optimizations()` when `deepcache_enabled` is true and the model advertises support.
338
+ - Effect: fewer full model computations on reuse steps, trading some fidelity for speed.
339
+ - Benefits: live integrated path, simple integration model, and capability gating.
340
+ - Trade-offs: the implementation works at whole-output reuse granularity rather than a finer-grained internal block reuse strategy, so its speed/fidelity profile is comparatively coarse.
341
+ - Evidence: `src/WaveSpeed/deepcache_nodes.py`, `src/Core/Pipeline.py`, `src/Core/AbstractModel.py`, `src/Core/Models/SD15Model.py`, `src/Core/Models/SDXLModel.py`, `tests/test_core_functionalities.py`.
342
+
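+ The reuse mechanism, reduced to a sketch (step bookkeeping in the real patch is more involved):
+
+ ```python
+ class CachedDenoiser:
+     """Run the full model on cache-update steps; return the stored output otherwise."""
+
+     def __init__(self, model_fn, cache_interval: int = 3):
+         self.model_fn = model_fn
+         self.cache_interval = cache_interval
+         self.step = 0
+         self.cached = None
+
+     def __call__(self, x, sigma, **kwargs):
+         if self.cached is None or self.step % self.cache_interval == 0:
+             self.cached = self.model_fn(x, sigma, **kwargs)  # full forward pass
+         self.step += 1
+         return self.cached
+ ```
+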
343
+ ### First Block Cache for Flux
344
+
345
+ - Status: `codebase groundwork`
346
+ - Purpose: cache downstream transformer work when the first-block residual indicates the state has not changed much.
347
+ - Implementation in LightDiffusion-Next: `src/WaveSpeed/first_block_cache.py` contains cache contexts and patch builders for both UNet-like and Flux-like forward paths. See the sketch after this list.
348
+ - Project integration: the module provides the machinery for a Flux-oriented first-block caching path. In the current project flow, the directly surfaced caching path is DeepCache, while this module remains groundwork for a more specialized integration.
349
+ - Effect: establishes the components needed for a transformer-oriented cache path in the codebase.
350
+ - Benefits: nontrivial implementation foundation already exists.
351
+ - Trade-offs: it is not yet surfaced as a broad standard option in the same way as the main integrated optimizations.
352
+ - Evidence: `src/WaveSpeed/first_block_cache.py`.
353
+
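+ The residual test at the heart of the idea looks roughly like this (the threshold and structure are assumptions about the groundwork module, not its exact API):
+
+ ```python
+ import torch
+
+ class FirstBlockCache:
+     def __init__(self, threshold: float = 0.1):
+         self.threshold = threshold
+         self.prev_first = None
+
+     def should_reuse(self, first_block_out: torch.Tensor) -> bool:
+         """Reuse downstream work when the first-block output barely moved."""
+         reuse = False
+         if self.prev_first is not None:
+             num = (first_block_out - self.prev_first).abs().mean()
+             den = self.prev_first.abs().mean().clamp(min=1e-8)
+             reuse = (num / den).item() < self.threshold  # relative L1 distance
+         self.prev_first = first_block_out
+         return reuse
+ ```
+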
354
+ ## Memory Management And Serving Optimizations
355
+
356
+ ### Low-VRAM partial loading and offload policy
357
+
358
+ - Status: `integrated`
359
+ - Purpose: keep only the amount of model state in VRAM that current free memory allows, offloading the rest.
360
+ - Implementation in LightDiffusion-Next: `cond_util.prepare_sampling()` calls `Device.load_models_gpu(..., force_full_load=False)`; `Device.load_models_gpu()` computes low-VRAM budgets and delegates partial loading to `ModelPatcher.patch_model_lowvram()` and `partially_load()`. See the sketch after this list.
361
+ - Project integration: this is a core loading behavior, not a side option. Text encoder and VAE also have explicit offload-device helpers.
362
+ - Effect: keeps generation viable on limited VRAM systems and reduces full reload pressure.
363
+ - Benefits: central to memory behavior in constrained environments, architecture-aware, and tied into checkpoint, text encoder, and VAE device policy.
364
+ - Trade-offs: more complex state management; partial loading can increase latency and complicate debugging.
365
+ - Evidence: `src/cond/cond_util.py`, `src/Device/Device.py`, `src/Model/ModelPatcher.py`.
366
+
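+ The budgeting idea, sketched with simplified assumptions (the real policy also reserves headroom for inference activations and honors per-model minimums):
+
+ ```python
+ import torch
+
+ def vram_budget(reserve_mb: int = 1024) -> int:
+     free, _total = torch.cuda.mem_get_info()
+     return max(0, free - reserve_mb * 1024 * 1024)
+
+ def partially_load(modules, budget_bytes: int, device: str = "cuda"):
+     """Move modules to VRAM until the budget is spent; the rest stay offloaded."""
+     used = 0
+     for m in modules:  # assumed ordered hottest-first
+         size = sum(p.numel() * p.element_size() for p in m.parameters())
+         if used + size > budget_bytes:
+             break
+         m.to(device)
+         used += size
+ ```
+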
367
+ ### Async transfer helpers and pinned checkpoint tensors
368
+
369
+ - Status: `integrated, infrastructure-level`
370
+ - Purpose: reduce CPU<->GPU transfer cost with asynchronous copies, streams, and pinned host memory.
371
+ - Implementation in LightDiffusion-Next: `Device.cast_to()` can issue transfers on offload streams; checkpoint tensors are pinned on CUDA loads in `util.load_torch_file()`; VAE encode/decode uses non-blocking transfers. See the sketch after this list.
372
+ - Project integration: these mechanisms appear most clearly in checkpoint loading, model movement, and VAE data flow. Some parts act as general transfer infrastructure rather than as a single user-facing optimization toggle.
373
+ - Effect: faster host/device movement and less transfer-induced stalling in hot paths that actually use the helpers.
374
+ - Benefits: useful on CUDA systems, especially during model load and VAE stages.
375
+ - Trade-offs: integration is uneven; some helper functions look broader than their current call footprint.
376
+ - Evidence: `src/Device/Device.py`, `src/Utilities/util.py`, `src/AutoEncoders/VariationalAE.py`.
377
+
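+ The transfer pattern these helpers build on (illustrative; the names are not the project's API):
+
+ ```python
+ import torch
+
+ def async_to_gpu(t: torch.Tensor, device: str = "cuda") -> torch.Tensor:
+     staged = t.pin_memory()  # page-locked host memory enables true async DMA
+     stream = torch.cuda.Stream()
+     with torch.cuda.stream(stream):
+         out = staged.to(device, non_blocking=True)
+     torch.cuda.current_stream().wait_stream(stream)  # order against the default stream
+     return out
+ ```
+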
378
+ ### Request coalescing and queue batching
379
+
380
+ - Status: `integrated`
381
+ - Purpose: batch compatible API requests together so the backend does fewer larger pipeline invocations.
382
+ - Implementation in LightDiffusion-Next: `server.py::GenerationBuffer` groups pending requests by a signature that includes model, size, scheduler, sampler, steps, multiscale settings, and other batch-level properties. See the sketch after this list.
383
+ - Project integration: the worker chooses the oldest eligible group, optionally waits for more arrivals, flattens per-request samples into one pipeline call, and later remaps saved results back to request futures.
384
+ - Effect: better throughput and GPU utilization for concurrent API use.
385
+ - Benefits: real server-level optimization, clearly implemented, includes observability-oriented logs.
386
+ - Trade-offs: requires careful grouping keys; incompatible request options fragment batching opportunities.
387
+ - Evidence: `server.py`.
388
+
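+ A minimal sketch of signature-based grouping (the field names are assumptions):
+
+ ```python
+ from collections import defaultdict
+
+ def signature(req: dict) -> tuple:
+     return (req["model"], req["width"], req["height"],
+             req["scheduler"], req["sampler"], req["steps"])
+
+ def coalesce(pending: list[dict]) -> dict[tuple, list[dict]]:
+     groups = defaultdict(list)
+     for req in pending:
+         groups[signature(req)].append(req)
+     return groups  # the worker picks the oldest group and runs one batched call
+ ```
+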
389
+ ### Singleton policy, large-group chunking, and image-save guardrails
390
+
391
+ - Status: `integrated`
392
+ - Purpose: prevent batching from hurting latency for lone requests, and prevent oversized coalesced batches from exploding decode/save paths.
393
+ - Implementation in LightDiffusion-Next: `LD_BATCH_WAIT_SINGLETONS` controls whether singletons wait; `LD_MAX_IMAGES_PER_GROUP` and `ImageSaver.MAX_IMAGES_PER_SAVE` drive chunking; large groups are split into smaller sequential pipeline runs. See the sketch after this list.
394
+ - Project integration: the server keeps the coalescing optimization from turning into pathological giant save/decode operations, and tests cover the chunking behavior.
395
+ - Effect: better tail latency for single requests and more stable handling of large batched workloads.
396
+ - Benefits: directly addresses operational failure modes in large batched workloads.
397
+ - Trade-offs: chunking reduces some batching benefits; many environment variables affect behavior.
398
+ - Evidence: `server.py`, `src/FileManaging/ImageSaver.py`, `tests/unit/test_generation_buffer_chunking.py`, `docs/quirks.md`.
399
+
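+ Chunking itself is simple; a sketch using the documented 256-image default as the cap:
+
+ ```python
+ def split_group(requests: list[dict], max_images_per_group: int = 256) -> list[list[dict]]:
+     """Split one coalesced group into sequential runs that respect the image cap."""
+     chunks, current, count = [], [], 0
+     for req in requests:
+         n = req.get("num_images", 1)
+         if current and count + n > max_images_per_group:
+             chunks.append(current)
+             current, count = [], 0
+         current.append(req)
+         count += n
+     if current:
+         chunks.append(current)
+     return chunks
+ ```
+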
400
+ ### Next-model prefetch
401
+
402
+ - Status: `integrated`
403
+ - Purpose: while one batch is running, read the next checkpoint into CPU RAM if the queued next batch needs a different model.
404
+ - Implementation in LightDiffusion-Next: `GenerationBuffer._look_ahead_and_prefetch()` resolves the next checkpoint, loads it via `util.load_torch_file()` on a background task, and stores it in `ModelCache` as a prefetched state dict. See the sketch after this list.
405
+ - Project integration: the next load can reuse the prefetched state dict through `util.load_torch_file()` before the cache entry is cleared.
406
+ - Effect: overlaps some future checkpoint load cost with current generation work.
407
+ - Benefits: server-side latency hiding with minimal interface impact.
408
+ - Trade-offs: only helps when queued work is predictable; increases CPU RAM usage.
409
+ - Evidence: `server.py`, `src/Device/ModelCache.py`, `src/Utilities/util.py`.
410
+
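+ The overlap trick in miniature (illustrative; the real task wiring lives in `server.py`):
+
+ ```python
+ import asyncio
+
+ async def prefetch_next(next_ckpt_path: str, cache: dict, load_fn):
+     """Read the next checkpoint into CPU RAM while the current batch runs."""
+     if next_ckpt_path not in cache:
+         # run the blocking file read off the event loop
+         cache[next_ckpt_path] = await asyncio.to_thread(load_fn, next_ckpt_path)
+ ```
+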
411
+ ### Keep-models-loaded cache
412
+
413
+ - Status: `integrated`
414
+ - Purpose: keep recently used checkpoints and sampling models resident instead of cleaning them up after every request.
415
+ - Implementation in LightDiffusion-Next: `ModelCache` stores checkpoints, TAESD models, sampling models, and the keep-loaded policy; `server.py` temporarily applies the request's `keep_models_loaded` directive for a group.
416
+ - Project integration: when enabled, main models are retained and only auxiliary control models are cleaned up aggressively.
417
+ - Effect: lower warm-start cost between related generations and less repetitive reload churn.
418
+ - Benefits: simple end-user behavior for a meaningful latency/memory trade-off.
419
+ - Trade-offs: consumes more VRAM/RAM; can make memory pressure less predictable on multi-user servers.
420
+ - Evidence: `src/Device/ModelCache.py`, `server.py`.
421
+
422
+ ### In-memory PNG byte buffer
423
+
424
+ - Status: `integrated`
425
+ - Purpose: return API images from memory instead of reading them back from disk after save.
426
+ - Implementation in LightDiffusion-Next: `ImageSaver` can store encoded PNG bytes in `_image_bytes_buffer`; `server.py` first calls `pop_image_bytes()` when fulfilling request futures. See the sketch after this list.
427
+ - Project integration: batched pipeline runs can still save images normally while the API path avoids a disk round-trip for the response payload.
428
+ - Effect: lower response latency and less unnecessary disk I/O for served images.
429
+ - Benefits: directly reduces response-path disk I/O in API-serving scenarios.
430
+ - Trade-offs: consumes temporary RAM; only helps when the buffer path is actually populated.
431
+ - Evidence: `src/FileManaging/ImageSaver.py`, `server.py`.
432
+
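+ The buffer amounts to a keyed byte store; a sketch (names are illustrative, not `ImageSaver`'s exact attributes):
+
+ ```python
+ import io
+ from PIL import Image
+
+ _image_bytes: dict[str, bytes] = {}
+
+ def save_with_buffer(img: Image.Image, request_id: str) -> None:
+     raw = io.BytesIO()
+     img.save(raw, format="PNG")
+     _image_bytes[request_id] = raw.getvalue()  # response path reads RAM, not disk
+
+ def pop_image_bytes(request_id: str) -> bytes | None:
+     return _image_bytes.pop(request_id, None)
+ ```
+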
433
+ ### TAESD preview pacing and preview fidelity control
434
+
435
+ - Status: `integrated, conditional`
436
+ - Purpose: keep live previews useful without letting preview generation dominate sampling time.
437
+ - Implementation in LightDiffusion-Next: `SamplerCallback` caches preview settings, only triggers previews at a coarse interval, and runs preview work on a background thread; the server also applies per-request preview fidelity presets (`low`, `balanced`, `high`). See the sketch after this list.
438
+ - Project integration: previews are generated only when previewing is enabled, and the preview cadence is adaptive to total step count.
439
+ - Effect: live feedback with bounded preview overhead.
440
+ - Benefits: explicit pacing, non-blocking thread model, request-level fidelity override.
441
+ - Trade-offs: still extra work during sampling; fidelity presets are intentionally coarse.
442
+ - Evidence: `src/sample/BaseSampler.py`, `src/AutoEncoders/taesd.py`, `server.py`, preview tests under `tests/e2e` and `tests/integration/api`.
443
+
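+ The pacing rule reduces to a step-interval check with the decode pushed off-thread (a sketch; the project's cadence logic and thread handling are richer):
+
+ ```python
+ import threading
+
+ def preview_interval(total_steps: int, target_previews: int = 6) -> int:
+     return max(1, total_steps // target_previews)  # cadence adapts to run length
+
+ def on_step(step: int, latent, total_steps: int, decode_fn, publish_fn) -> None:
+     if step % preview_interval(total_steps) == 0:
+         # decode on a background thread so sampling is never blocked
+         threading.Thread(
+             target=lambda: publish_fn(decode_fn(latent)), daemon=True
+         ).start()
+ ```
+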
444
+ ## Integration Notes
445
+
446
+ These notes highlight how several optimizations are currently integrated and used inside the project.
447
+
448
+ ### 1. Flux-oriented first block caching
449
+
450
+ - The codebase contains a dedicated `src/WaveSpeed/first_block_cache.py` module with cache contexts and patch builders for Flux-oriented paths.
451
+ - In the current optimization stack, the directly surfaced caching path is DeepCache, while First Block Cache remains implementation groundwork for a more specialized integration.
452
+ - This establishes the core components for a transformer-oriented cache path even though it is not yet surfaced as a primary standard option.
453
+
454
+ ### 2. DeepCache reuse granularity
455
+
456
+ - DeepCache is integrated through `src/WaveSpeed/deepcache_nodes.py` and is applied from the main pipeline when enabled.
457
+ - In this project, it works by reusing prior denoiser outputs on designated reuse steps.
458
+ - This yields a clear speed-fidelity profile based on output reuse rather than on finer-grained internal block caching.
459
+
460
+ ### 3. Conditioning batching control
461
+
462
+ - Conditioning batching is centered in `src/cond/cond.py::calc_cond_batch()`, where compatible condition chunks are packed and concatenated.
463
+ - The `batched_cfg` request field participates as request-side control metadata around this behavior.
464
+ - In operation, the batching outcome is therefore shaped mainly by the central conditioning logic rather than by a standalone external switch.
465
+
466
+ ### 4. GPU attention backend selection
467
+
468
+ - Attention backend selection is hardware- and build-aware, with the runtime choosing among SpargeAttn, SageAttention, xformers, and PyTorch SDPA based on capability checks.
469
+ - The exact backend used in practice therefore depends on the active GPU generation, dependencies, and runtime configuration.
470
+ - Backend acceleration is therefore largely automatic from the user perspective while remaining environment-specific in implementation.
471
+
472
+ ### 5. Prompt cache behavior
473
+
474
+ - Prompt caching is implemented as a global dict-backed cache keyed by prompt hash and CLIP identity.
475
+ - The cache prunes old entries once it exceeds its configured size threshold.
476
+ - In operation, it primarily benefits repeated-prompt workflows such as seed sweeps and prompt iteration.
477
+
478
+ ## Conclusion
479
+
480
+ LightDiffusion-Next uses a layered optimization strategy spanning runtime kernels, scheduling, guidance logic, precision and memory control, model patching, and server-side throughput management.
481
+
482
+ - The core operational stack is built around AYS scheduling, attention backend selection, conditioning batching, low-VRAM loading policy, prompt caching, VAE tuning, and request coalescing.
483
+ - Optional paths such as Stable-Fast, `torch.compile`, ToMe, DeepCache, multiscale sampling, and quantization extend that stack for specific hardware targets, model families, and workload profiles.
484
+ - The serving layer is a first-class component of the performance model, with batching, chunking, prefetching, keep-loaded caches, and in-memory responses contributing directly to end-to-end latency and throughput.
docs/index.md ADDED
@@ -0,0 +1,44 @@
1
+ # LightDiffusion-Next
2
+
3
+ LightDiffusion-Next is a refactored and performance-first Stable Diffusion stack that bundles a modern Streamlit UI, an optional Gradio web app, a batched FastAPI backend and highly tuned inference primitives such as Stable-Fast, SageAttention and WaveSpeed caching.
4
+
5
+ ## Why pick LightDiffusion-Next
6
+
7
+ LightDiffusion-Next is built to handle day-to-day generation workloads on consumer GPUs while still scaling up to multi-user servers.
8
+
9
+ - **Fast by default.** Stable-Fast compilation, SageAttention, SpargeAttn and WaveSpeed caching are wired in so you can hit top-tier it/s without manual patching.
10
+ - **Multiple front doors.** Choose between the Streamlit control room, a Gradio web UI (great for Spaces) or the programmable FastAPI queue for integrations.
11
+ - **Feature complete.** Txt2Img, Img2Img, Flux pipelines, AutoHDR, TAESD previews, prompt enhancement through Ollama, multi-scale diffusion with presets, LoRA mixing and automatic detailing are all available out of the box.
12
+ - **Operations friendly.** Docker images, GPU-aware batched serving, model caching controls and observability endpoints make it easy to deploy and monitor.
13
+
14
+ ## What ships in the box
15
+
16
+ - 🚀 **Streamlined UI** with live previews, history, presets, interrupt/resume controls and automatic metadata tagging.
17
+ - 🧠 **Prompt toolkit** including reusable negative embeddings, multi-concept weighting, prompt enhancement and prompt history.
18
+ - 🧩 **Modular pipeline** that routes SD1.5, SDXL-inspired workflows and quantized Flux models through a single code path with per-sample overrides for HiresFix, ADetailer or Img2Img.
19
+ - 🛠️ **Production API** powered by FastAPI with smart request coalescing, telemetry endpoints and base64 image responses ready for bots or creative tooling.
20
+ - 📦 **Deployment artifacts** such as Dockerfiles, docker-compose, run scripts for Windows, configurable GPU architecture flags and optional Ollama/Stable-Fast builds.
21
+
22
+ ## Quick pathways
23
+
24
+ - [Installation](installation.md) — pick Docker, Windows batch or manual Python setup.
25
+ - [First run & UI tour](usage.md) — learn the Streamlit layout, generation controls and history tools.
26
+ - [Workflow playbook](examples.md) — step through Txt2Img, Flux, Img2Img and API recipes.
27
+ - [Performance optimizations](optimizations.md) — understand SageAttention, Stable-Fast, WaveSpeed caching and the new AYS scheduler for 2-5x speedup.
28
+ - [Align Your Steps](ays-scheduler.md) — learn about AYS scheduler and prompt caching for additional speedup.
29
+ - [Prompt Caching](prompt-caching.md) — deep dive into prompt attention caching mechanics and tuning.
30
+ - [Performance tuning](quirks.md) — squeeze out extra throughput or reduce VRAM usage.
31
+ - [Architecture](architecture.md) — understand how the UI, pipeline and server cooperate.
32
+ - [REST & automation](api.md) — integrate Discord bots, automations or other clients.
33
+
34
+ ## Supported environments at a glance
35
+
36
+ - NVIDIA GPUs with CUDA 12.x drivers. SageAttention and SpargeAttn availability is detected at runtime and depends on installed kernels, drivers and GPU compute capability; some kernels may be unavailable on the newest CUDA runtimes. RTX 50xx and newer cards may use SageAttention + Stable-Fast where supported.
37
+ - Windows 10/11, Ubuntu 22.04+ and containerized deployments via Docker with NVIDIA Container Toolkit.
38
+ - Optional CPU-only mode for experimentation (no Stable-Fast/SageAttention speed-ups).
39
+
40
+ ## Where to head next
41
+
42
+ - Start with [Installation](installation.md) to get your environment ready.
43
+ - Drop into the [Streamlit UI guide](usage.md) for a tour of generation features and presets.
44
+ - Explore [Architecture](architecture.md) when you are ready to customize or embed LightDiffusion-Next in larger systems.
docs/installation.md ADDED
@@ -0,0 +1,161 @@
1
+ # Installation & Setup
2
+
3
+ LightDiffusion-Next can run locally on Windows or Linux, inside Docker, or on cloud GPUs. This page walks you through the supported installation paths and the assets you must download before your first generation.
4
+
5
+ ## Hardware & software requirements
6
+
7
+ The project is tuned for NVIDIA GPUs and CUDA 12.x drivers, but it also supports AMD GPUs with ROCm and Apple Silicon with Metal Performance Shaders (MPS). See [ROCm and Metal/MPS Support](rocm-metal-support.md) for platform-specific installation instructions.
8
+
9
+ - **Operating system:** Windows 10/11, Ubuntu 22.04+, macOS 12.3+ (for Apple Silicon), or any distro supported by NVIDIA Container Toolkit.
10
+ - **Python:** 3.10.x. The run scripts create a virtual environment automatically.
11
+ - **GPU:**
12
+ - **NVIDIA:** Card with at least compute capability 8.0 (Ampere) for SageAttention/SpargeAttn. RTX 50 series (compute 12.0) runs with SageAttention + Stable-Fast.
13
+ - **AMD:** RDNA 2+ or CDNA architectures with ROCm 5.0+. See [ROCm Support](rocm-metal-support.md#rocm-support-amd-gpus).
14
+ - **Apple Silicon:** M1/M2/M3 series with macOS 12.3+. See [Metal/MPS Support](rocm-metal-support.md#metalmps-support-apple-silicon).
15
+ - **VRAM:** 6 GB minimum (12 GB recommended) for SD1.5 workflows. Flux quantized pipelines require 16 GB+ for comfortable batching.
16
+ - **Disk space:** ~15 GB for dependencies plus your checkpoints, LoRAs and flux assets.
17
+
18
+ ## Choose an installation path
19
+
20
+ - [Windows quick start](#windows-quick-start-runbat)
21
+ - [Linux or WSL2 manual setup](#linuxwsl2-manual-setup)
22
+ - [Containerized deployment](#docker-and-containers)
23
+ - [Headless server API](#running-only-the-fastapi-server)
24
+
25
+ ### Windows quick start (`run.bat`)
26
+
27
+ The root repository ships with a convenience script that handles environment creation, dependency installation via `uv`, GPU detection and launching the Streamlit UI.
28
+
29
+ 1. Install the latest [Python 3.10](https://www.python.org/downloads/release/python-3100/) build and ensure `python` is on your `PATH`.
30
+ 2. Install the [NVIDIA CUDA 12 runtime driver](https://developer.nvidia.com/cuda-downloads) that matches your GPU.
31
+ 3. Clone the repository and place your checkpoints in `include/checkpoints` (see [Model assets](#model-assets)).
32
+ 4. Run `run.bat` (double-click it, or invoke it from a terminal). The script will:
33
+
34
+ - Create `.venv` (if it does not exist) and upgrade `pip`.
35
+ - Install `uv` for fast dependency resolution.
36
+ - Detect an NVIDIA GPU via `nvidia-smi` and install the matching PyTorch wheels.
37
+ - Install all requirements and start Streamlit at `http://localhost:8501`.
38
+
39
+ 5. When you are done, close the terminal to stop the UI. The virtual environment is reusable across runs.
40
+
41
+ > **Tip:** To launch the Gradio UI instead, activate `.venv` and run `python app.py`.
42
+
43
+ ### Linux/WSL2 manual setup
44
+
45
+ 1. Install system dependencies:
46
+
47
+ ```bash
48
+ sudo apt update && sudo apt install python3.10 python3.10-venv python3-pip build-essential git
49
+ ```
50
+
51
+ > If you plan to use **AutoHDR** (ICC-based color transforms), ensure Little CMS (lcms2) is installed so Pillow can build profile transforms. On Debian/Ubuntu:
52
+ ```bash
53
+ sudo apt-get install -y liblcms2-2 liblcms2-dev
54
+ pip install --upgrade --force-reinstall pillow
55
+ ```
56
+
57
+
58
+ 2. (Optional) Install the [NVIDIA CUDA 12 toolkit](https://developer.nvidia.com/cuda-toolkit-archive) so SageAttention/SpargeAttn can compile native extensions.
59
+ 3. Create and activate a virtual environment:
60
+
61
+ ```bash
62
+ python3 -m venv .venv
63
+ source .venv/bin/activate
64
+ pip install --upgrade pip uv
65
+ ```
66
+
67
+ 4. Install PyTorch and core dependencies:
68
+
69
+ ```bash
70
+ uv pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision "triton>=2.1.0"
71
+ uv pip install -r requirements.txt
72
+ ```
73
+
74
+ 5. Launch the Streamlit UI:
75
+
76
+ ```bash
77
+ streamlit run streamlit_app.py --server.address=0.0.0.0 --server.port=8501
78
+ ```
79
+
80
+ Use `python app.py` if you prefer the Gradio interface.
81
+
82
+ 6. Deactivate the environment with `deactivate` when finished.
83
+
84
+ ### Docker and containers
85
+
86
+ Use Docker when you want an immutable runtime with SageAttention, SpargeAttn and Stable-Fast prebuilt.
87
+
88
+ 1. Install [Docker Desktop](https://www.docker.com/products/docker-desktop/) or Docker Engine with the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
89
+ 2. Clone the repository and review `docker-compose.yml`. Adjust:
90
+
91
+ - `TORCH_CUDA_ARCH_LIST` if you only target a specific GPU architecture.
92
+ - `INSTALL_STABLE_FAST` and `INSTALL_OLLAMA` build arguments if you want Stable-Fast or the Ollama prompt enhancer baked into the image.
93
+ - Volume mounts for `output/` and the `include/*` directories where you store checkpoints, LoRAs, embeddings and YOLO detectors.
94
+
95
+ 3. Build and start the stack:
96
+
97
+ ```bash
98
+ docker-compose up --build
99
+ ```
100
+
101
+ Streamlit is exposed on `http://localhost:8501` by default; Gradio is mapped to port `7860` and can be enabled by setting `UI_FRAMEWORK=gradio`.
102
+
103
+ 4. To rebuild with a different GPU architecture or optional component:
104
+
105
+ ```bash
106
+ docker-compose build --build-arg TORCH_CUDA_ARCH_LIST="9.0" --build-arg INSTALL_STABLE_FAST=1
107
+ ```
108
+
109
+ ### Running only the FastAPI server
110
+
111
+ If you want to integrate LightDiffusion-Next into automation pipelines or Discord bots, run the backend without launching a UI.
112
+
113
+ 1. Follow any of the setup methods above.
114
+ 2. Run:
115
+
116
+ ```bash
117
+ uvicorn server:app --host 0.0.0.0 --port 7861
118
+ ```
119
+
120
+ 3. Use the [REST API reference](api.md) to submit generation jobs via `POST /api/generate` and inspect queue health via `GET /api/telemetry`.
121
+
122
+ ## Model assets
123
+
124
+ LightDiffusion-Next does not bundle model weights. Place your assets into the `include/` tree before you start generating.
125
+
126
+ - `include/checkpoints/` — SD1.5 style `.safetensors` checkpoints (e.g. Meina V10, DreamShaper). The default pipeline expects a file named `Meina V10 - baked VAE.safetensors` unless you override it.
127
+ - `include/vae/ae.safetensors` — Flux VAE (download from [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)). Required for Flux mode.
128
+ - `include/loras/` — LoRA adapters loaded from the UI or CLI.
129
+ - `include/embeddings/` — Negative prompt embeddings such as `EasyNegative`, `badhandv4`.
130
+ - `include/yolos/` — YOLO detectors used by ADetailer (`person_yolov8m-seg.pt`, `face_yolov9c.pt`).
131
+ - `include/ESRGAN/` — RealESRGAN models leveraged by UltimateSDUpscale in Img2Img workflows.
132
+ - `include/sd1_tokenizer/` — Tokenizer files for SD1.x. The repository already includes the defaults.
133
+
134
+ Store generated outputs under `output/` (separated into Classic, Flux, Img2Img, HiresFix and ADetailer sub-folders). The folders are created automatically during the first run.
135
+
136
+ ## Optional accelerations
137
+
138
+ - **Stable-Fast** — 70% faster SD1.5 inference through UNet compilation. Set `INSTALL_STABLE_FAST=1` in Docker or pass `--stable-fast` in the CLI/UI to compile on demand. Compilation adds a one-time warm-up cost.
139
+ - **SageAttention** — INT8 attention kernels with 15% speedup and lower VRAM use. Built automatically in Docker images; on bare metal, clone [SageAttention](https://github.com/thu-ml/SageAttention) and run `pip install -e . --no-build-isolation` inside your environment.
140
+ - **SpargeAttn** — Sparse attention kernels with 40–60% speedup (compute 8.0–9.0 GPUs only). Build from [SpargeAttn](https://github.com/thu-ml/SpargeAttn) using `TORCH_CUDA_ARCH_LIST="8.9"` or similar.
141
+ - **Ollama prompt enhancer** — Install [Ollama](https://ollama.com/) and pull `qwen3:0.6b`. Set `PROMPT_ENHANCER_MODEL=qwen3:0.6b` before launching LightDiffusion-Next to enable the automatic prompt rewrite toggle.
142
+
143
+ ## Verify your installation
144
+
145
+ 1. Start the UI or FastAPI server.
146
+ 2. Watch the startup logs — the initialization progress bar runs the dependency download routine (`CheckAndDownload`) and loads the default checkpoint.
147
+ 3. Generate a 512×512 image with the default prompt. The status bar shows timing and the output appears in `output/Classic`.
148
+ 4. Confirm the telemetry endpoint is reachable:
149
+
150
+ ```bash
151
+ curl http://localhost:7861/health
152
+ curl http://localhost:7861/api/telemetry
153
+ ```
154
+
155
+ ## Updating or rebuilding
156
+
157
+ - Pull the latest Git changes and rerun `uv pip install -r requirements.txt` in the virtual environment.
158
+ - For Docker users, rebuild with `docker-compose build --no-cache` to pick up updates.
159
+ - If you upgraded your GPU driver or CUDA toolkit, delete `~/.cache/torch_extensions` to force SageAttention/SpargeAttn to recompile.
160
+
161
+ You are now ready to explore the [UI guide](usage.md) and start generating.
docs/optimizations.md ADDED
@@ -0,0 +1,262 @@
1
+ # Performance Optimizations
2
+
3
+ LightDiffusion-Next achieves its industry-leading inference speed through a layered stack of training-free optimizations that can be selectively enabled based on your hardware and quality requirements. This page provides an overview of each acceleration technique and links to detailed guides.
4
+
5
+ For a detailed source-based report on what is implemented today, including server-side throughput optimizations and practical implementation notes, see the [Implemented Optimizations Report](implemented-optimizations-report.md).
6
+
7
+ ## Optimization Stack Overview
8
+
9
+ The pipeline orchestrates six primary acceleration paths:
10
+
11
+ | Technique | Type | Speedup | Quality Impact | Requirements |
12
+ |-----------|------|---------|----------------|---------------|
13
+ | [AYS Scheduler](#ays-scheduler) | Sampling schedule | ~2x | None/Better | All models |
14
+ | [Prompt Caching](#prompt-caching) | Embedding cache | 5-15% | None | All models |
15
+ | [SageAttention](#sageattention--spargeattn) | Attention kernel | Moderate | None | All CUDA GPUs |
16
+ | [SpargeAttn](#sageattention--spargeattn) | Sparse attention | Significant | Minimal | Compute 8.0-9.0 |
17
+ | [Stable-Fast](#stable-fast) | Graph compilation | Significant* | None | >8GB VRAM, batch jobs |
18
+ | [WaveSpeed](#wavespeed-caching) | Feature caching | High | Tunable | All models |
19
+
20
+ *Speedup depends heavily on batch size and generation count
21
+
22
+ These optimizations **work together** — enabling multiple techniques simultaneously can provide substantial cumulative speedup with tunable quality trade-offs.
23
+
24
+ ## Quick Comparison
25
+
26
+ ### AYS Scheduler
27
+
28
+ **What it does:** Uses research-backed optimal timestep distributions that allow equivalent quality in approximately half the steps. Instead of uniform sigma spacing, AYS concentrates samples on noise levels that contribute most to image formation.
29
+
30
+ **When to use:**
31
+ - Always recommended for SD1.5, SDXL, and Flux models
32
+ - Txt2Img generation
33
+ - Production workflows where speed matters
34
+ - Any scenario where you'd normally use 20+ steps
35
+
36
+ **Trade-offs:** Images will differ slightly from standard schedulers (different sampling path), but quality is equivalent or better. Not ideal when exact reproduction of old results is required.
37
+
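+ The mechanism can be sketched in a few lines: a small table of research-derived sigmas is resampled to the requested step count with log-linear interpolation, preserving its non-uniform emphasis (the table values below are placeholders, not the published AYS sigmas):
+
+ ```python
+ import numpy as np
+
+ def ays_sigmas(table: list[float], steps: int) -> np.ndarray:
+     """Resample an optimal sigma table to `steps` entries."""
+     xs = np.linspace(0, len(table) - 1, steps)
+     log_table = np.log(np.asarray(table))
+     # interpolate in log space so the schedule keeps the table's emphasis
+     return np.exp(np.interp(xs, np.arange(len(table)), log_table))
+ ```
+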
38
+ [→ Full AYS Scheduler guide](ays-scheduler.md)
39
+
40
+ ---
41
+
42
+ ### Prompt Caching
43
+
44
+ **What it does:** Caches CLIP text embeddings for prompts that have been encoded before. When generating multiple images with the same or similar prompts, embeddings are retrieved from cache instead of being recomputed.
45
+
46
+ **When to use:**
47
+ - Batch generation with same prompt
48
+ - Testing different seeds or settings
49
+ - Iterative prompt refinement
50
+ - Any workflow with repeated prompts
51
+
52
+ **Trade-offs:** None — minimal memory overhead (~50-200MB), negligible CPU cost, automatically enabled by default.
53
+
54
+ [→ Full Prompt Caching guide](prompt-caching.md)
55
+
56
+ ---
57
+
58
+ ### SageAttention & SpargeAttn {#sageattention--spargeattn}
59
+
60
+ **What it does:** Replaces PyTorch's default scaled dot-product attention with highly optimized CUDA kernels. SageAttention uses INT8 quantization for key/value tensors while maintaining FP16 query precision. SpargeAttn extends this with dynamic sparsity pruning, skipping redundant attention computations.
61
+
62
+ **When to use:**
63
+ - Always enable SageAttention if available (no quality loss, pure speed gain)
64
+ - SpargeAttn for maximum speed on supported hardware (RTX 30xx/40xx, A100, H100)
65
+ - Both work seamlessly with all samplers, LoRAs and post-processing stages
66
+
67
+ **Trade-offs:** None for SageAttention. SpargeAttn may introduce subtle texture variations at very high sparsity thresholds (default is conservative).
68
+
69
+ [→ Full SageAttention/SpargeAttn guide](sageattention.md)
70
+
71
+ ---
72
+
73
+ ### CFG++ Samplers {#cfg-samplers}
74
+
75
+ CFG++ Samplers are advanced sampling algorithms that incorporate Classifier-Free Guidance directly into the sampling process, providing better quality and stability compared to standard CFG.
76
+
77
+ ---
78
+
79
+ ### Multi-Scale Diffusion {#multi-scale}
80
+
81
+ Multi-Scale Diffusion optimizes performance by processing images at multiple resolutions during generation, reducing computation for high-resolution areas.
82
+
83
+ **When to use:**
84
+ - High-resolution generation (>1024px)
85
+ - When memory is limited
86
+ - For faster previews
87
+
88
+ **Trade-offs:** May reduce detail in fine areas.
89
+
90
+ **Note:** In most cases, Multi-Scale Diffusion in quality mode produces better results than standard diffusion while still giving a small speedup; this is a side effect of its upsampling process.
91
+
92
+ ---
93
+
94
+ ### Stable-Fast
95
+
96
+ **What it does:** JIT-compiles the UNet diffusion model into optimized TorchScript with optional CUDA graphs. The first forward pass traces execution, caches kernel launches and fuses operators for reduced overhead.
97
+
98
+ **When to use:**
99
+ - **Systems with >8GB VRAM** (preferably 12GB+)
100
+ - Batch jobs or workflows generating 50+ images with identical settings
101
+ - Long-running operations where 30-60s compilation amortizes over time
102
+ - Fixed resolutions and batch sizes
103
+
104
+ **When NOT to use:**
105
+ - Normal 20-step single image generation (compilation overhead > speedup gains)
106
+ - Systems with <8GB VRAM
107
+ - Flux workflows (different architecture)
108
+ - Quick prototyping or frequent model/resolution changes
109
+
110
+ **Trade-offs:** Compilation time on first run (30-60s), VRAM overhead (~500MB), reduced flexibility for dynamic shapes.
111
+
112
+ [→ Full Stable-Fast guide](stablefast.md)
113
+
114
+ ---
115
+
116
+ ### WaveSpeed Caching
117
+
118
+ **What it does:** Exploits temporal redundancy in diffusion processes by reusing work across denoising steps. In the current project stack this primarily means DeepCache on supported UNet models, with additional Flux-oriented cache groundwork present in the codebase.
119
+
120
+ 1. **DeepCache** — Reuses prior denoiser outputs on selected steps in UNet models (SD1.5, SDXL)
121
+ 2. **First Block Cache (FBCache)** — Flux-oriented cache machinery available for specialized integration work
122
+
123
+ **When to use:**
124
+ - Any workflow where you can tolerate slight smoothing in exchange for 2-3x speedup
125
+ - Combine with conservative cache intervals (2-3) for minimal quality loss
126
+ - Works alongside SageAttention and Stable-Fast
127
+
128
+ **Trade-offs:** Reduced fine detail if interval is too high, slight VRAM increase for cached tensors.
129
+
130
+ [→ Full WaveSpeed guide](wavespeed.md)
131
+
132
+ ---
133
+
134
+ ## Priority & Fallback System
135
+
136
+ LightDiffusion-Next automatically selects the best available attention backend at runtime:
137
+
138
+ ```
139
+ SpargeAttn > SageAttention > xformers > PyTorch SDPA
140
+ ```
141
+
142
+ If a kernel fails (e.g., unsupported head dimension), the system gracefully falls back to the next option. You can force PyTorch SDPA by setting `LD_DISABLE_SAGE_ATTENTION=1` for debugging.
143
+
144
+ Stable-Fast and WaveSpeed are opt-in toggles controlled via the UI or REST API.
145
+
146
+ ## Recommended Configurations
147
+
148
+ ### Maximum Speed - Batch Jobs (SD1.5, >8GB VRAM, 50+ images)
149
+ ```yaml
150
+ stable_fast: true # Only for batch operations
151
+ sageattention: auto # or spargeattn if available
152
+ deepcache:
153
+ enabled: true
154
+ interval: 3
155
+ depth: 2
156
+ ```
157
+ **Expected:** Maximum speedup for batch operations, some quality loss
158
+ **Note:** Disable stable_fast for single 20-step generations
159
+
160
+ ### Balanced - Quick Generation (SD1.5, any VRAM)
161
+ ```yaml
162
+ scheduler: ays # NEW: Use AYS for 2x speedup
163
+ steps: 10 # Reduced from 20 (same quality with AYS)
164
+ stable_fast: false # Disabled for normal generations
165
+ sageattention: auto
166
+ prompt_cache_enabled: true # Enabled by default
167
+ deepcache:
168
+ enabled: true
169
+ interval: 2
170
+ depth: 1
171
+ ```
172
+ **Expected:** ~2-3x speedup with minimal quality loss
173
+ **Note:** AYS scheduler provides the main speedup; enable stable_fast only for batch jobs (50+ images)
174
+
175
+ ### Quality-First (Flux)
176
+ ```yaml
177
+ scheduler: ays_flux # NEW: Optimized for Flux models
178
+ steps: 10 # Reduced from 15 (same quality with AYS)
179
+ stable_fast: false # not supported
180
+ sageattention: auto
181
+ prompt_cache_enabled: true
182
+ deepcache:
183
+ enabled: true
184
+ interval: 2
185
+ ```
186
+ **Expected:** ~2x speedup with minimal quality impact
187
+
188
+ ### Production API - High Volume (>8GB VRAM)
189
+ ```yaml
190
+ stable_fast: true # Only for sustained high-volume APIs
191
+ sageattention: auto
192
+ deepcache:
193
+ enabled: false # avoid variability across batch sizes
194
+ keep_models_loaded: true
195
+ ```
196
+ **Expected:** Consistent latency for repeated identical requests
197
+ **Note:** For low-volume or single-shot APIs, use `stable_fast: false`
198
+
199
+ ## Hardware-Specific Tips
200
+
201
+ ### RTX 30xx / 40xx (Ampere/Ada)
202
+ - Enable SpargeAttn for best results
203
+ - Stable-Fast only for batch jobs (disable for quick 20-step generations)
204
+ - Stable-Fast + SpargeAttn + DeepCache stacks well for long operations
205
+ - Watch VRAM — Stable-Fast graphs consume ~500MB
206
+
207
+ ### RTX 50xx (Blackwell)
208
+ - SageAttention only (SpargeAttn support pending)
209
+ - Stable-Fast works but recompiles for new CUDA arch
210
+ - DeepCache is your best additional speedup
211
+
212
+ ### A100 / H100 (Datacenter)
213
+ - SpargeAttn + Stable-Fast + aggressive WaveSpeed
214
+ - Prefer larger batch sizes to amortize kernel overhead
215
+ - Use CUDA graphs (`enable_cuda_graph=True` in Stable-Fast config)
216
+
217
+ ### Low VRAM (<8GB)
218
+ - **Always disable Stable-Fast** (requires >8GB VRAM)
219
+ - Use SageAttention (minimal overhead)
220
+ - Enable DeepCache with conservative intervals
221
+ - Set `vae_on_cpu=True` for HiRes workflows
222
+
223
+ ## Debugging & Profiling
224
+
225
+ Check which optimizations are active:
226
+
227
+ ```bash
228
+ # View startup logs
229
+ cat logs/server.log | grep -i "using\|enabled"
230
+
231
+ # Sample output:
232
+ # Using SpargeAttn (Sparse + SageAttention) cross attention
233
+ # Using SpargeAttn (Sparse + SageAttention) in VAE
234
+ # Stable-Fast compilation enabled
235
+ # DeepCache active: interval=3, depth=2
236
+ ```
237
+
238
+ Monitor telemetry:
239
+
240
+ ```bash
241
+ curl http://localhost:7861/api/telemetry | jq '.vram_usage_mb, .average_latency_ms'
242
+ ```
243
+
244
+ Disable individual optimizations to isolate issues:
245
+
246
+ ```bash
247
+ export LD_DISABLE_SAGE_ATTENTION=1 # Forces PyTorch SDPA
248
+ export LD_DISABLE_STABLE_FAST=1 # Skips compilation
249
+ export LD_DISABLE_WAVESPEED=1 # Disables all caching
250
+ ```
251
+
252
+ ## Further Reading
253
+ - [AYS Scheduler Deep Dive](ays-scheduler.md) — Theory, implementation, quality tuning
254
+ - [Prompt Caching Deep Dive](prompt-caching.md) — Implementation details, cache management, performance impact
255
+ - [SageAttention & SpargeAttn Deep Dive](sageattention.md) — Installation, technical details, head dimension handling
256
+ - [Stable-Fast Compilation Guide](stablefast.md) — Configuration, CUDA graphs, troubleshooting
257
+ - [WaveSpeed Caching Strategies](wavespeed.md) — DeepCache vs FBCache, tuning parameters, compatibility matrix
258
+ - [Performance Tuning](quirks.md) — VRAM management, slow first runs, recompilation fixes
259
+
260
+ ---
261
+
262
+ Armed with this overview, dive into the technique-specific guides or experiment directly in the UI to find your optimal speed/quality balance.
docs/prompt-caching.md ADDED
@@ -0,0 +1,64 @@
1
+ # Prompt Attention Caching
2
+
3
+ ### What It Does
4
+
5
+ Caches CLIP text embeddings for prompts you've already encoded. When you reuse a prompt (or parts of it), the embedding is retrieved from cache instead of being recomputed.
6
+
7
+ ### When It Helps Most
8
+
9
+ - Batch generation with same prompt
10
+ - Testing different seeds
11
+ - Incremental prompt refinement
12
+ - Generation sessions with repeated themes
13
+
14
+ ### Configuration
15
+
16
+ **Enable/Disable** (default: enabled):
17
+ ```python
18
+ from src.Utilities import prompt_cache
19
+
20
+ # Enable (default)
21
+ prompt_cache.enable_prompt_cache(True)
22
+
23
+ # Disable
24
+ prompt_cache.enable_prompt_cache(False)
25
+
26
+ # Check status
27
+ stats = prompt_cache.get_cache_stats()
28
+ print(f"Hit rate: {stats['hit_rate']:.1%}")
29
+ ```
30
+
31
+ **Cache Settings** (sketched below):
32
+ - Maximum entries: 256 prompts before pruning
33
+ - Cache structure: global dict keyed by prompt hash and CLIP identity
34
+ - Memory usage: workload-dependent, estimated from cached embedding tensors
35
+ - Cache cleared on: restart, disable, or manual clear
36
+ - Automatic pruning: removes the oldest 25% of entries when the cache exceeds its limit
37
+
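+ A minimal sketch of the keying and pruning behavior described above (illustrative, not the exact `prompt_cache` internals):
+
+ ```python
+ import hashlib
+ from collections import OrderedDict
+
+ _MAX_ENTRIES = 256
+ _cache: OrderedDict = OrderedDict()
+
+ def get_or_encode(prompt: str, clip, encode_fn):
+     key = (hashlib.sha256(prompt.encode()).hexdigest(), id(clip))
+     if key in _cache:
+         return _cache[key]  # cache hit: skip the CLIP encode entirely
+     emb = encode_fn(prompt)
+     _cache[key] = emb
+     if len(_cache) > _MAX_ENTRIES:
+         for _ in range(_MAX_ENTRIES // 4):  # prune the oldest 25% of entries
+             _cache.popitem(last=False)
+     return emb
+ ```
+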
38
+ ### Viewing Cache Stats
39
+
40
+ ```python
41
+ from src.Utilities import prompt_cache
42
+
43
+ # Print statistics
44
+ prompt_cache.print_cache_stats()
45
+
46
+ # Output:
47
+ # ============================================================
48
+ # Prompt Cache Statistics
49
+ # ============================================================
50
+ # Status: Enabled
51
+ # Entries: 42
52
+ # Size: ~85.3 MB
53
+ # Requests: 150 (hits: 108, misses: 42)
54
+ # Hit Rate: 72.0%
55
+ # ============================================================
56
+ ```
57
+
58
+ ### Best Practices
59
+
60
+ 1. **Leave it enabled** - negligible overhead, significant gains
61
+ 2. **Monitor hit rate** - should be >50% in typical workflows
62
+ 3. **Clear cache** when switching models or major prompt changes
63
+ 4. **Batch similar prompts** to maximize cache hits
64
+ 5. **Expect global behavior** because the cache is shared across repeated prompt encodes rather than being scoped to a single generation session
docs/quirks.md ADDED
@@ -0,0 +1,60 @@
1
+ # Quirks & Troubleshooting
2
+
3
+ This playbook highlights the most common operational quirks you may encounter while running LightDiffusion-Next and the quickest ways to resolve them.
4
+
5
+ ## GPU memory headaches
6
+
7
+ | Symptom | Likely cause | Quick fixes |
8
+ | --- | --- | --- |
9
+ | `CUDA out of memory` during base diffusion | Resolution or batch too high | Drop to 512×512 or smaller, decrease batch to 1, disable HiresFix or AutoDetailer, prefer Euler/Karras samplers in **CFG++** mode |
10
+ | OOM triggered mid-way through HiRes | VRAM spikes when loading VAE/second UNet | Enable **Keep models loaded** (to avoid reloading) or run HiRes on CPU by toggling *VAE on CPU* in settings |
11
+ | Flux runs crash immediately | Missing Flux decoder or running on <16 GB VRAM | Place Flux weights in `include/Flux`, disable Flux or use SD1.5 profile on smaller cards |
12
+
13
+ Additional tips:
14
+
15
+ - Enable **VRAM budget** in Streamlit to see live usage (requires `LD_SHOW_VRAM=1`).
16
+ - In Docker, pass `--gpus all` and ensure `NVIDIA_VISIBLE_DEVICES` is not empty.
17
+ - Clear `~/.cache/torch_extensions` if Stable-Fast kernels were compiled against an older driver and now fail to load.
18
+
19
+ ## Slow first runs or repeated recompilation
20
+
21
+ - Stable-Fast and SageAttention compile custom kernels on first use. This can take several minutes. Once complete, the compiled artifacts live under `~/.cache/torch_extensions` (host) or `/root/.cache/torch_extensions` (Docker). Mount this directory as a volume for faster cold starts.
22
+ - If Streamlit re-compiles every launch, ensure the container or user has write access to the cache directory and that the system clock is correct.
23
+ - Set `LD_DISABLE_SAGE_ATTENTION=1` to isolate issues related specifically to SageAttention.
24
+
25
+ ## Downloader complaints about missing assets
26
+
27
+ - The startup checks look for standard filenames (e.g., `yolov8n.pt`, `taesdxl_decoder.safetensors`). Verify these live under the correct subdirectories in `include/`.
28
+ - For offline setups, drop the files manually and create empty `.ok` sentinels (e.g., `include/checkpoints/.downloads-ok`) to skip prompts.
29
+ - Hugging Face rate limits manifest as HTTP 429. Provide a token via the prompt, set `HF_TOKEN` in the environment or download manually.
30
+
31
+ ## Streamlit UI quirks
32
+
33
+ - **Preview stuck on “Waiting for GPU”** – Check FastAPI logs; the batching worker may be paused. Restart the Streamlit session or run `python server.py` to inspect queue telemetry.
34
+ - **Settings reset on restart** – Ensure the process can write to `webui_settings.json`. Remove the file to revert to defaults if it becomes corrupted.
35
+ - **History thumbnails missing** – Delete the entry under `ui/history/<timestamp>`; the next render will recreate previews.
36
+
37
+ ## Gradio or API automation issues
38
+
39
+ - `/api/generate` returns 500 with “No images produced”: inspect server logs for `Pipeline import error` or missing models. Ensure `pipeline.py` is importable and the working directory is the repository root.
40
+ - Jobs appear stuck: call `/api/telemetry` to inspect `pending_by_signature`. Mixed resolutions or toggles prevent batching; if running single job automation, set `LD_BATCH_WAIT_SINGLETONS=0` to avoid coalescing delays.
41
+ - SaveImage aborts with "Attempting to save N images in a single call" (exceeds `MAX_IMAGES_PER_SAVE`): this usually indicates tiled intermediate outputs or a very large batched tensor. The server will chunk large coalesced groups into smaller runs of at most `LD_MAX_IMAGES_PER_GROUP` images (default: 256) to mitigate this. If you must allow larger single-call saves, set `LD_MAX_IMAGES_PER_SAVE` to a higher value in the server environment (e.g., `export LD_MAX_IMAGES_PER_SAVE=256`) but be mindful of disk usage. Alternatively, reduce `num_images` per job or lower `LD_MAX_BATCH_SIZE` to keep groups smaller.
42
+ - Health checks: `/health` returns `{ "status": "ok" }`. If it fails, the FastAPI app likely crashed—restart and inspect `logs/server.log`.
43
+
44
+ ## Docker-specific notes
45
+
46
+ - Always build with the provided `Dockerfile` to get SageAttention patches precompiled.
47
+ - Forward model assets by mounting `./include` into the container (`-v $(pwd)/include:/app/include`).
48
+ - On Windows + WSL2, ensure the WSL distro has the NVIDIA driver bridge (`wsl --status`).
49
+
50
+ ## Logging & diagnostics
51
+
52
+ - Server logs live under `logs/server.log` with per-request IDs. Tail them during load testing: `tail -f logs/server.log`.
53
+ - Enable debug logging by exporting `LD_SERVER_LOGLEVEL=DEBUG` before launching Streamlit/Gradio/uvicorn.
54
+ - To inspect queue depth without hitting the API, watch the `GenerationBuffer` logs; each batch prints signature summaries.
55
+
56
+ ## When all else fails
57
+
58
+ - Clear the `include/last_seed.txt` file if seed reuse behaves unexpectedly.
59
+ - Regenerate Stable-Fast kernels by deleting the cache directory and re-running with `stable_fast` enabled.
60
+ - Collect the following before opening an issue: GPU model, driver version, operating system, a copy of `logs/server.log`, hardware info from `/api/telemetry`, and reproduction steps.
docs/rocm-metal-support.md ADDED
@@ -0,0 +1,360 @@
1
+ # ROCm and Metal/MPS Support
2
+
3
+ LightDiffusion-Next includes comprehensive support for AMD GPUs with ROCm and Apple Silicon Macs with Metal Performance Shaders (MPS). This guide covers the platform-specific considerations and optimizations available for non-NVIDIA hardware.
4
+
5
+ ## ROCm Support (AMD GPUs)
6
+
7
+ ### Overview
8
+
9
+ ROCm (Radeon Open Compute) is AMD's open-source platform for GPU computing. LightDiffusion-Next automatically detects and utilizes ROCm-compatible AMD GPUs through PyTorch's HIP backend.
10
+
11
+ ### Supported Hardware
12
+
13
+ - **RDNA Architecture:**
14
+
15
+ - RDNA 2 (RX 6000 series) - FP16 support
16
+ - RDNA 3 (RX 7000 series) - FP16 and BF16 support
17
+
18
+ - **CDNA Architecture:**
19
+
20
+ - CDNA (MI100)
21
+ - CDNA 2 (MI200 series) - FP16 and BF16 support
22
+ - CDNA 3 (MI300 series) - FP16 and BF16 support
23
+
24
+ ### Installation
25
+
26
+ 1. **Install ROCm drivers and runtime:**
27
+
28
+ Follow the official [ROCm installation guide](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html) for your Linux distribution.
29
+
30
+ ```bash
31
+ # Example for Ubuntu 22.04
32
+ wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_latest_all.deb
33
+ sudo apt-get install ./amdgpu-install_latest_all.deb
34
+ sudo amdgpu-install --usecase=rocm
35
+ ```
36
+
37
+ 2. **Verify ROCm installation:**
38
+
39
+ ```bash
40
+ rocm-smi
41
+ /opt/rocm/bin/rocminfo
42
+ ```
43
+
44
+ 3. **Install PyTorch with ROCm support:**
45
+
46
+ ```bash
47
+ pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm6.0
48
+ ```
+
+ Or set everything up from scratch inside a virtual environment with `uv`:
+
+ ```bash
49
+ # Create virtual environment
50
+ python3 -m venv .venv
51
+ source .venv/bin/activate
52
+ pip install --upgrade pip uv
53
+
54
+ # Install PyTorch with ROCm 6.0 support (adjust version as needed)
55
+ uv pip install --index-url https://download.pytorch.org/whl/rocm6.0 torch torchvision
56
+
57
+ # Install project dependencies
58
+ uv pip install -r requirements.txt
59
+ ```
60
+
61
+ 4. **Launch LightDiffusion-Next:**
62
+
63
+ ```bash
64
+ streamlit run streamlit_app.py --server.address=0.0.0.0 --server.port=8501
65
+ ```
66
+
67
+ ### ROCm-Specific Features
68
+
69
+ #### Automatic Detection
70
+
71
+ LightDiffusion-Next automatically detects ROCm GPUs at startup and reports them in the logs:
72
+
73
+ ```
74
+ Device: cuda:0 AMD Radeon RX 7900 XTX (ROCm) :
75
+ ```
76
+
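+ A quick way to confirm PyTorch is reaching the GPU through HIP (an illustrative check, separate from the project's startup code):
+
+ ```python
+ import torch
+
+ # Under ROCm builds, torch.version.hip is set and the CUDA API maps onto HIP.
+ if torch.cuda.is_available() and torch.version.hip is not None:
+     print(f"ROCm device: {torch.cuda.get_device_name(0)}")
+ ```
+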
77
+ #### Memory Management
78
+
79
+ - **Cache Management:** ROCm uses a more conservative cache clearing strategy compared to CUDA. Cache is only cleared when explicitly forced to prevent memory fragmentation issues.
80
+ - **Memory Statistics:** Full memory statistics are available through the standard PyTorch CUDA API (which works transparently with ROCm).
81
+
82
+ #### Precision Support
83
+
84
+ - **FP16:** Fully supported on all RDNA and CDNA architectures
85
+ - **BF16:** Supported on RDNA 3+ and CDNA 2+ GPUs (automatically detected)
86
+ - **FP32:** Always available as fallback
87
+
88
+ #### Attention Mechanisms
89
+
90
+ | Feature | ROCm Support | Notes |
91
+ |---------|--------------|-------|
92
+ | PyTorch Scaled Dot-Product Attention (SDPA) | ✅ Yes | Default and recommended |
93
+ | PyTorch Flash Attention | ✅ Yes | Available on RDNA 3 and CDNA 2+ |
94
+ | xformers | ✅ Yes | Works with ROCm builds of xformers |
95
+ | SageAttention | ❌ No | CUDA-only kernels |
96
+ | SpargeAttn | ❌ No | CUDA-only kernels |
97
+
98
+ **Recommendation:** Use PyTorch's built-in attention (SDPA) on ROCm for best compatibility. Install xformers ROCm build for additional optimizations.
99
+
100
+ ### Performance Tips
101
+
102
+ 1. **Use BF16 on supported GPUs:**
103
+
104
+ - RDNA 3 (RX 7000 series) and CDNA 2+ support BF16 natively
105
+ - BF16 provides better numerical stability than FP16
106
+
107
+ 2. **Enable PyTorch attention:**
108
+
109
+ - Automatically enabled for PyTorch 2.0+
110
+ - Provides good performance without CUDA-specific optimizations
111
+
112
+ 3. **Install ROCm-compatible xformers:**
113
+
114
+ ```bash
115
+ # Build xformers from source for ROCm
116
+ git clone https://github.com/facebookresearch/xformers.git
117
+ cd xformers
118
+ git submodule update --init --recursive
119
+ pip install -e . --no-build-isolation
120
+ ```
121
+
122
+ 4. **Monitor GPU utilization:**
123
+
124
+ ```bash
125
+ watch -n 1 rocm-smi
126
+ ```
127
+
128
+ ### Known Limitations
129
+
130
+ - **SageAttention and SpargeAttn:** These optimizations use CUDA-specific kernels and are not available on ROCm. The system automatically falls back to PyTorch SDPA.
131
+ - **Stable-Fast:** May have limited support depending on ROCm version. Test compilation before relying on it.
132
+ - **Driver Maturity:** Ensure you're using the latest ROCm version for best stability and performance.
133
+
134
+ ---
135
+
136
+ ## Metal/MPS Support (Apple Silicon)
137
+
138
+ ### Overview
139
+
140
+ Metal Performance Shaders (MPS) provides GPU acceleration on Apple Silicon Macs (M1, M2, M3 series). LightDiffusion-Next automatically detects and utilizes MPS when running on macOS.
141
+
142
+ ### Supported Hardware
143
+
144
+ - **Apple Silicon:**
145
+
146
+ - M1, M1 Pro, M1 Max, M1 Ultra
147
+ - M2, M2 Pro, M2 Max, M2 Ultra
148
+ - M3, M3 Pro, M3 Max
149
+ - All future M-series chips
150
+
151
+ ### Installation
152
+
153
+ 1. **Ensure macOS is up to date:**
154
+
155
+ - macOS 12.3 (Monterey) or later required
156
+ - macOS 13+ (Ventura) recommended for best performance
157
+
158
+ 2. **Install Python 3.10:**
159
+
160
+ ```bash
161
+ # Using Homebrew
162
+ brew install python@3.10
163
+ ```
164
+
165
+ 3. **Create virtual environment and install dependencies:**
166
+
167
+ ```bash
168
+ python3.10 -m venv .venv
169
+ source .venv/bin/activate
170
+ pip install --upgrade pip
171
+
172
+ # Install PyTorch with MPS support
173
+ pip install torch torchvision torchaudio
174
+
175
+ # Install project dependencies
176
+ pip install -r requirements.txt
177
+ ```
178
+
179
+ 4. **Launch LightDiffusion-Next:**
180
+
181
+ ```bash
182
+ streamlit run streamlit_app.py --server.address=0.0.0.0 --server.port=8501
183
+ ```
184
+
185
+ ### MPS-Specific Features
186
+
187
+ #### Automatic Detection
188
+
189
+ MPS is automatically detected and enabled on compatible hardware:
190
+
191
+ ```
192
+ Device: mps
193
+ VAE dtype: torch.float16
194
+ Set vram state to: SHARED
195
+ ```
196
+
197
+ #### Memory Management
198
+
199
+ - **Unified Memory:** Apple Silicon uses unified memory shared between CPU and GPU
200
+ - **VRAM State:** Automatically set to `SHARED` mode
201
+ - **Cache Management:** Uses `torch.mps.empty_cache()` for memory cleanup
202
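+
+ A small sketch of manual cleanup with the same API (plain PyTorch, useful when scripting against the pipeline):
+
+ ```python
+ import torch
+
+ if torch.backends.mps.is_available():
+     x = torch.randn(1024, 1024, device="mps")
+     del x
+     torch.mps.empty_cache()  # return cached allocations to the unified pool
+ ```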
+
203
+ #### Precision Support
204
+
205
+ - **FP16:** Fully supported and recommended (default)
206
+ - **FP32:** Supported but slower
207
+ - **BF16:** Not supported on MPS backend
208
+
209
+ #### Attention Mechanisms
210
+
211
+ | Feature | MPS Support | Notes |
212
+ |---------|-------------|-------|
213
+ | PyTorch Scaled Dot-Product Attention (SDPA) | ✅ Yes | Default and recommended |
214
+ | PyTorch Flash Attention | ❌ No | Not available on MPS |
215
+ | xformers | ❌ No | MPS backend not supported |
216
+ | SageAttention | ❌ No | CUDA/MPS incompatible |
217
+ | SpargeAttn | ❌ No | CUDA-only kernels |
218
+
219
+ **Recommendation:** Use PyTorch's built-in attention (SDPA) on MPS. It's well-optimized for Apple Silicon.
220
+
221
+ ### Performance Tips
222
+
223
+ - **Use FP16 precision:**
224
+
225
+   - MPS works best with FP16
226
+   - Automatically enabled by LightDiffusion-Next
227
+
228
+ - **Optimize batch sizes:**
229
+
230
+   - Start with smaller batch sizes and increase gradually
231
+   - Monitor memory usage through Activity Monitor
232
+
233
+ - **Keep macOS updated:**
234
+
235
+   - Apple regularly improves MPS performance in system updates
236
+
237
+ - **Close unnecessary applications:**
238
+
239
+   - Unified memory is shared with system processes
240
+   - Free up RAM for better GPU performance
241
+
242
+ - **Monitor GPU usage:**
243
+
244
+ ```bash
245
+ # Use Activity Monitor -> GPU tab
246
+ # Or use powermetrics (requires sudo):
247
+ sudo powermetrics --samplers gpu_power -i 1000
248
+ ```
249
+
250
+ ### Known Limitations
251
+
252
+ - **Non-blocking transfers:** Not supported; MPS operations are blocking
253
+ - **Advanced optimizations:** SageAttention, SpargeAttn, and xformers are not available
254
+ - **BF16:** Not supported on MPS backend
255
+ - **Memory pressure:** System may swap under high memory load due to unified architecture
256
+
257
+ ### Unified Memory Considerations
258
+
259
+ Apple Silicon's unified memory architecture means:
260
+
261
+ - GPU and CPU share the same physical memory pool
262
+ - Less memory copying between devices
263
+ - System processes compete for the same memory
264
+ - Available VRAM depends on total system RAM and current usage
265
+
266
+ **Recommended RAM:**
267
+
268
+ - 16 GB: SD1.5 models at moderate resolutions
269
+ - 32 GB: Comfortable for most workflows including Flux (with quantization)
270
+ - 64 GB+: Professional workflows with large batch sizes
271
+
272
+ ---
273
+
274
+ ## Comparison Table
275
+
276
+ | Feature | NVIDIA (CUDA) | AMD (ROCm) | Apple (MPS) |
277
+ |---------|---------------|------------|-------------|
278
+ | FP16 | ✅ Full | ✅ Full | ✅ Full |
279
+ | BF16 | ✅ Full | ✅ RDNA3+/CDNA2+ | ❌ No |
280
+ | PyTorch SDPA | ✅ Yes | ✅ Yes | ✅ Yes |
281
+ | Flash Attention | ✅ Yes | ✅ RDNA3+/CDNA2+ | ❌ No |
282
+ | xformers | ✅ Yes | ✅ Build from source | ❌ No |
283
+ | SageAttention | ✅ Yes | ❌ No | ❌ No |
284
+ | SpargeAttn | ✅ Yes (CC 8.0-9.0) | ❌ No | ❌ No |
285
+ | Stable-Fast | ✅ Yes | ⚠️ Limited | ❌ No |
286
+ | Memory Management | ✅ Dedicated VRAM | ✅ Dedicated VRAM | ⚠️ Unified Memory |
287
+
288
+ ---
289
+
290
+ ## Troubleshooting
291
+
292
+ ### ROCm Issues
293
+
294
+ **Problem:** PyTorch doesn't detect ROCm GPU
295
+
296
+ ```bash
297
+ # Check ROCm installation
298
+ rocm-smi
299
+ rocminfo | grep "Name:"
300
+
301
+ # Verify PyTorch sees GPU
302
+ python -c "import torch; print(torch.cuda.is_available()); print(torch.version.hip)"
303
+ ```
304
+
305
+ **Problem:** Out of memory errors
306
+
307
+ - Reduce batch size
308
+ - Enable lower VRAM mode in settings
309
+ - Close other GPU-using applications
310
+ - Check with `rocm-smi` for memory usage
311
+
312
+ **Problem:** Slow performance
313
+
314
+ - Verify you're using the correct ROCm-optimized PyTorch build
315
+ - Check GPU utilization with `rocm-smi`
316
+ - Ensure FP16 or BF16 is enabled (check logs)
317
+
318
+ ### MPS Issues
319
+
320
+ **Problem:** MPS not detected
321
+
322
+ ```bash
323
+ # Verify MPS support
324
+ python -c "import torch; print(torch.backends.mps.is_available())"
325
+ ```
326
+ - Ensure macOS 12.3+
327
+ - Update to latest macOS version
328
+ - Reinstall PyTorch
329
+
330
+ **Problem:** Memory warnings or crashes
331
+
332
+ - Reduce batch size
333
+ - Close other applications to free unified memory
334
+ - Check Activity Monitor for memory pressure
335
+
336
+ **Problem:** Slower than expected performance
337
+
338
+ - Verify FP16 is being used (check logs)
339
+ - Close background applications
340
+ - Update to latest macOS version for performance improvements
341
+ - Some models may be CPU-bound on older M1 chips
342
+
343
+ ---
344
+
345
+ ## Getting Help
346
+
347
+ For platform-specific issues:
348
+
349
+ 1. Check the [FAQ](faq.md) for common questions
350
+ 2. Review PyTorch's platform-specific documentation:
351
+ - [ROCm installation](https://pytorch.org/get-started/locally/#linux-rocm)
352
+ - [MPS backend](https://pytorch.org/docs/stable/notes/mps.html)
353
+ 3. Open an issue on GitHub with:
354
+ - Platform details (GPU model, driver version, OS)
355
+ - LightDiffusion-Next startup logs
356
+ - Output of `python -c "import torch; print(torch.__version__); print(torch.version.hip if hasattr(torch.version, 'hip') else 'CUDA'); print(torch.cuda.is_available())"`
357
+
358
+ ---
359
+
360
+ **Note:** This documentation reflects the current state of ROCm and MPS support in PyTorch and LightDiffusion-Next. As these platforms mature, more optimizations and features may become available.
docs/sageattention.md ADDED
@@ -0,0 +1,338 @@
1
+ # SageAttention & SpargeAttn
2
+
3
+ ## Overview
4
+
5
+ SageAttention and SpargeAttn are drop-in replacements for PyTorch's scaled dot-product attention that can provide significant speedup with zero to minimal quality loss. They work by optimizing the compute-heavy attention mechanism used throughout diffusion models (UNet, VAE, Flux Transformers).
6
+
7
+ - **SageAttention**: Uses INT8 quantization for key/value tensors while maintaining FP16 query precision
8
+ - **SpargeAttn**: Adds dynamic sparsity pruning on top of SageAttention, skipping redundant attention computations
9
+
10
+ Both are **training-free**, **hardware-accelerated** CUDA kernels that integrate transparently into LightDiffusion-Next.
11
+
12
+ ## How It Works
13
+
14
+ ### SageAttention
15
+
16
+ Standard attention computes:
17
+
18
+ $$
19
+ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
20
+ $$
21
+
22
+ SageAttention accelerates this by:
23
+
24
+ 1. **Quantizing K and V** to INT8 before the matrix multiplication
25
+ 2. **Keeping Q in FP16** to preserve attention score precision
26
+ 3. **Fusing operations** (softmax, scaling, matmul) in hand-tuned CUDA kernels
27
+ 4. **Dequantizing** output back to FP16 after final matmul
28
+
29
+ This reduces memory bandwidth (K/V use half the space) and leverages Tensor Cores more efficiently.
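+
+ In practice it is a drop-in replacement for PyTorch's SDPA. A minimal sketch, assuming SageAttention is installed and tensors are in heads-first layout (see the tensor-layout note below):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from sageattention import sageattn
+
+ q = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
+ k, v = torch.randn_like(q), torch.randn_like(q)
+
+ ref = F.scaled_dot_product_attention(q, k, v)  # FP16 baseline
+ out = sageattn(q, k, v, tensor_layout="HND")   # INT8-quantized K/V drop-in
+ ```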
30
+
31
+ ### SpargeAttn
32
+
33
+ SpargeAttn extends SageAttention with **sparse attention masking**:
34
+
35
+ 1. Computes a similarity metric between query and key patches
36
+ 2. Prunes attention connections below a learned threshold (default: 60% similarity)
37
+ 3. Applies cumulative distribution filtering to keep only the top 97% of attention scores
38
+ 4. Uses partial vector thresholding to skip redundant computations
39
+
40
+ The result: 40-60% total speedup over baseline PyTorch attention with minimal impact on output quality.
41
+
42
+ ## Installation
43
+
44
+ ### SageAttention (All Platforms)
45
+
46
+ **Prerequisites:**
47
+ - CUDA Toolkit 11.8+ (must match your PyTorch CUDA version)
48
+ - Python 3.8+
49
+ - PyTorch with CUDA support
50
+
51
+ **Install:**
52
+
53
+ ```bash
54
+ # Clone repository
55
+ git clone https://github.com/thu-ml/SageAttention
56
+ cd SageAttention
57
+
58
+ # Install from source (no build isolation to respect existing CUDA setup)
59
+ pip install -e . --no-build-isolation
60
+
61
+ # Verify installation
62
+ python -c "import sageattention; print('SageAttention installed successfully')"
63
+ ```
64
+
65
+ ### SpargeAttn (Linux/WSL2 Only)
66
+
67
+ **Prerequisites:**
68
+ - Same as SageAttention
69
+ - Linux or WSL2 environment (Windows native builds fail due to linker path limits)
70
+ - GPU with compute capability 8.0-9.0 (RTX 30xx, 40xx, A100, H100)
71
+
72
+ **Install:**
73
+
74
+ ```bash
75
+ # Clone repository
76
+ git clone https://github.com/thu-ml/SpargeAttn
77
+ cd SpargeAttn
78
+
79
+ # Set GPU architecture (critical for performance)
80
+ export TORCH_CUDA_ARCH_LIST="9.0" # Or your GPU: 8.0, 8.6, 8.9, 9.0
81
+
82
+ # Install from source
83
+ pip install -e . --no-build-isolation
84
+
85
+ # Verify installation
86
+ python -c "import spas_sage_attn; print('SpargeAttn installed successfully')"
87
+ ```
88
+
89
+ **GPU Architecture Reference:**
90
+
91
+ | GPU Model | Compute Capability | TORCH_CUDA_ARCH_LIST |
92
+ |-----------|-------------------|----------------------|
93
+ | RTX 3060/3070/3080/3090 | 8.6 | `"8.6"` |
94
+ | RTX 4060/4070/4080/4090 | 8.9 | `"8.9"` |
95
+ | A100 | 8.0 | `"8.0"` |
96
+ | H100 | 9.0 | `"9.0"` |
97
+ | RTX 5060/5070/5080/5090 | 12.0 | SageAttention supported, SpargeAttn pending |
98
+
99
+ ### Docker Installation
100
+
101
+ Both kernels are automatically built during the Docker image creation if the architecture is supported:
102
+
103
+ ```bash
104
+ # Build with SpargeAttn (compute 8.0-9.0)
105
+ docker-compose build --build-arg TORCH_CUDA_ARCH_LIST="8.9"
106
+
107
+ # RTX 50xx builds (SageAttention only, no SpargeAttn yet)
108
+ docker-compose build --build-arg TORCH_CUDA_ARCH_LIST="12.0"
109
+ ```
110
+
111
+ ## Usage
112
+
113
+ ### Automatic Detection
114
+
115
+ LightDiffusion-Next automatically detects and enables the best available attention backend at startup:
116
+
117
+ ```text
118
+ # Priority order (highest to lowest):
119
+ SpargeAttn > SageAttention > xformers > PyTorch SDPA
120
+ ```
121
+
122
+ Check which backend is active in the server logs:
123
+
124
+ ```bash
125
+ # SpargeAttn enabled
126
+ cat logs/server.log | grep "attention"
127
+ # Output: Using SpargeAttn (Sparse + SageAttention) cross attention
128
+
129
+ # SageAttention enabled
130
+ # Output: Using SageAttention cross attention
131
+
132
+ # Fallback
133
+ # Output: Using pytorch cross attention
134
+ ```
135
+
136
+ ### Streamlit UI
137
+
138
+ No configuration needed — SageAttention/SpargeAttn are always active if installed.
139
+
140
+ ### REST API
141
+
142
+ Same as UI — the backend selection is transparent:
143
+
144
+ ```bash
145
+ curl -X POST http://localhost:7861/api/generate \
146
+ -H "Content-Type: application/json" \
147
+ -d '{
148
+ "prompt": "a serene mountain lake at dawn",
149
+ "width": 768,
150
+ "height": 512,
151
+ "num_images": 1
152
+ }'
153
+ # Automatically uses SpargeAttn if available
154
+ ```
155
+
156
+ ### Manual Disable
157
+
158
+ Force PyTorch SDPA for debugging:
159
+
160
+ ```bash
161
+ export LD_DISABLE_SAGE_ATTENTION=1
162
+ streamlit run streamlit_app.py
163
+ ```
164
+
165
+ ## Performance
166
+
167
+ Both SageAttention and SpargeAttn provide measurable speedup over PyTorch SDPA baseline:
168
+
169
+ - **SageAttention**: Moderate speedup with zero quality loss (reported ~15-20% in papers)
170
+ - **SpargeAttn**: Significant speedup with minimal quality loss (reported ~40-60% in papers)
171
+
172
+ Actual performance gains vary based on:
173
+ - GPU architecture and VRAM
174
+ - Model type (SD1.5, SDXL, Flux)
175
+ - Resolution and batch size
176
+ - Head dimensions and sequence lengths
177
+
178
+ **Note:** Benchmark your specific setup to measure real-world performance.
+
+ ## Technical Details
179
+
180
+ ### Head Dimension Support
181
+
182
+ Both kernels natively support head dimensions of `[64, 96, 128]`. For other dimensions:
183
+
184
+ - **< 64**: Pad to 64, compute, then slice result
185
+ - **64-128**: Pad to 128, compute, then slice result
186
+ - **> 128**: Fallback to xformers or PyTorch SDPA
187
+
188
+ LightDiffusion-Next handles padding/slicing automatically.
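+
+ An illustrative sketch of the padding rule above (`pad_head_dim` is a hypothetical helper; the real handling lives in the project's attention wrappers):
+
+ ```python
+ import torch.nn.functional as F
+
+ def pad_head_dim(q, k, v):
+     """Hypothetical helper: pad head_dim to 64 or 128; caller slices the output."""
+     d = q.shape[-1]
+     if d > 128:
+         return None          # head_dim > 128: fall back to xformers / PyTorch SDPA
+     target = 64 if d <= 64 else 128
+     if target > d:
+         q, k, v = (F.pad(t, (0, target - d)) for t in (q, k, v))
+     return q, k, v, d        # original d is used to slice out[..., :d]
+ ```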
189
+
190
+ ### Tensor Layout
191
+
192
+ SageAttention expects tensors in `(batch_size, num_heads, seq_len, head_dim)` format. The pipeline reshapes inputs transparently:
193
+
194
+ ```python
195
+ # Internal reshaping (handled automatically)
196
+ q, k, v = map(
197
+ lambda t: t.reshape(b, -1, heads, dim_head).transpose(1, 2),
198
+ (q, k, v),
199
+ )
200
+ out = sageattention.sageattn(q, k, v, tensor_layout="HND")
201
+ ```
202
+
203
+ ### SpargeAttn Thresholds
204
+
205
+ Default pruning parameters (tuned for quality/speed balance):
206
+
207
+ ```python
208
+ out = spas_sage_attn.spas_sage2_attn_meansim_cuda(
209
+ q, k, v,
210
+ simthreshd1=0.6, # Similarity threshold (60%)
211
+ cdfthreshd=0.97, # Keep top 97% of attention scores
212
+ pvthreshd=15, # Partial vector threshold
213
+ is_causal=False
214
+ )
215
+ ```
216
+
217
+ Adjust `simthreshd1` for different trade-offs:
218
+ - `0.5`: More aggressive pruning, higher speedup, slight quality loss
219
+ - `0.7`: Conservative pruning, lower speedup, minimal quality loss
220
+
221
+ ## Compatibility
222
+
223
+ ### Compatible With
224
+
225
+ - ✅ Stable Diffusion 1.5
226
+ - ✅ Stable Diffusion 2.1
227
+ - ✅ SDXL
228
+ - ✅ Flux (both cross-attention and self-attention blocks)
229
+ - ✅ All samplers (Euler, DPM++, etc.)
230
+ - ✅ LoRA adapters
231
+ - ✅ Textual inversion embeddings
232
+ - ✅ HiresFix, ADetailer, Img2Img
233
+ - ✅ Stable-Fast (when stacked)
234
+ - ✅ WaveSpeed caching (when stacked)
235
+
236
+ ### Known Limitations
237
+
238
+ - ❌ RTX 50xx (compute 12.0) does not support SpargeAttn yet (SageAttention works)
239
+ - ❌ CPU-only inference (CUDA required)
240
+ - ❌ AMD GPUs (ROCm port not available)
241
+ - ⚠️ Head dimensions > 128 fall back to slower backends
242
+
243
+ ## Troubleshooting
244
+
245
+ ### Import Error: `No module named 'sageattention'`
246
+
247
+ **Cause:** Not installed or installation failed.
248
+
249
+ **Fix:**
250
+ ```bash
251
+ cd SageAttention
252
+ pip install -e . --no-build-isolation
253
+ ```
254
+
255
+ Verify CUDA toolkit is accessible:
256
+ ```bash
257
+ nvcc --version # Should match PyTorch CUDA version
258
+ ```
259
+
260
+ ### Compilation Error: `nvcc fatal error`
261
+
262
+ **Cause:** CUDA toolkit not found or version mismatch.
263
+
264
+ **Fix:**
265
+ 1. Install CUDA toolkit matching your PyTorch version
266
+ 2. Add CUDA to PATH:
267
+ ```bash
268
+ export PATH=/usr/local/cuda/bin:$PATH
269
+ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
270
+ ```
271
+ 3. Reinstall SageAttention
272
+
273
+ ### SpargeAttn Build Fails on Windows
274
+
275
+ **Cause:** Windows linker has path length limitations.
276
+
277
+ **Fix:** Use WSL2 or native Linux:
278
+ ```bash
279
+ # In WSL2
280
+ cd SpargeAttn
281
+ export TORCH_CUDA_ARCH_LIST="8.9"
282
+ pip install -e . --no-build-isolation
283
+ ```
284
+
285
+ ### Slower Than Expected
286
+
287
+ **Cause:** Wrong GPU architecture compiled or kernel fallback.
288
+
289
+ **Fix:**
290
+ 1. Check logs for "Using pytorch cross attention" (fallback indicator)
291
+ 2. Rebuild with correct `TORCH_CUDA_ARCH_LIST`
292
+ 3. Verify GPU compute capability:
293
+ ```bash
294
+ nvidia-smi --query-gpu=compute_cap --format=csv
295
+ ```
296
+
297
+ ### Quality Degradation with SpargeAttn
298
+
299
+ **Cause:** Pruning thresholds too aggressive.
300
+
301
+ **Fix:** Currently not user-configurable in the UI, but you can modify `src/Attention/AttentionMethods.py`:
302
+ ```python
303
+ # Line ~290
304
+ out = spas_sage_attn.spas_sage2_attn_meansim_cuda(
305
+ q, k, v,
306
+ simthreshd1=0.7, # Increase from 0.6 for better quality
307
+ cdfthreshd=0.98, # Increase from 0.97
308
+ pvthreshd=15,
309
+ is_causal=False
310
+ )
311
+ ```
312
+
313
+ ## Citation
314
+
315
+ If you use SageAttention or SpargeAttn in your work:
316
+
317
+ ```bibtex
318
+ @article{sageattention2024,
319
+ title={SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration},
320
+ author={Zhang, Jintao and Wei, Jia and Zhang, Pengle and others},
321
+ journal={arXiv preprint arXiv:2410.02367},
322
+ year={2024}
323
+ }
324
+
325
+ @article{spargeattn2024,
326
+ title={SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference},
327
+ author={Zhang, Jintao and others},
328
+ journal={arXiv preprint arXiv:2502.18137},
329
+ year={2025}
330
+ }
331
+ ```
332
+
333
+ ## Resources
334
+
335
+ - [SageAttention Repository](https://github.com/thu-ml/SageAttention)
336
+ - [SpargeAttn Repository](https://github.com/thu-ml/SpargeAttn)
337
+ - [SageAttention Paper](https://arxiv.org/abs/2410.02367)
338
+ - [Flash Attention](https://github.com/Dao-AILab/flash-attention) (related work)
docs/stablefast.md ADDED
@@ -0,0 +1,412 @@
1
+ # Stable-Fast Compilation
2
+
3
+ ## Overview
4
+
5
+ Stable-Fast is a JIT compilation framework that optimizes Stable Diffusion UNet models by tracing execution, fusing operators and optionally capturing CUDA graphs. It can provide significant speedup for SD1.5/SDXL batch workflows with zero quality loss.
6
+
7
+ Unlike runtime attention optimizations (SageAttention, SpargeAttn), Stable-Fast performs **ahead-of-time compilation** on the first inference pass. The compiled model is cached and reused for subsequent generations with compatible shapes.
8
+
9
+ ## How It Works
10
+
11
+ Stable-Fast applies three optimization layers:
12
+
13
+ ### 1. TorchScript Tracing
14
+
15
+ The first forward pass through the UNet is recorded into a static computational graph:
16
+
17
+ ```python
18
+ traced_model = torch.jit.trace(unet, example_inputs)
19
+ ```
20
+
21
+ This eliminates Python interpreter overhead and enables downstream graph optimizations.
22
+
23
+ ### 2. Operator Fusion
24
+
25
+ The traced graph undergoes pattern-based fusion:
26
+
27
+ - **Conv + BatchNorm fusion**: Merges normalization into convolution weights
28
+ - **Activation fusion**: Fuses ReLU/GELU/SiLU directly into linear/conv ops
29
+ - **Memory layout optimization**: Converts to channels-last format for faster conv execution
30
+ - **Triton kernels**: Replaces PyTorch ops with hand-tuned Triton implementations (if `enable_triton=True`)
31
+
32
+ Example fusion:
33
+
34
+ ```python
35
+ # Before:
36
+ x = conv(input)
37
+ x = batch_norm(x)
38
+ x = relu(x)
39
+
40
+ # After:
41
+ x = fused_conv_bn_relu(input) # Single kernel launch
42
+ ```
43
+
44
+ ### 3. CUDA Graph Capture (Optional)
45
+
46
+ When `enable_cuda_graph=True`, the entire forward pass is captured as a static CUDA graph:
47
+
48
+ - Kernel launches are recorded once and replayed on subsequent runs
49
+ - Eliminates CPU launch overhead (~10-15% speedup)
50
+ - Requires fixed input shapes and batch sizes
51
+
52
+ **Trade-off:** Higher VRAM usage (~500MB for graph buffers) and less flexibility.
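+
+ The mechanism can be illustrated with PyTorch's public CUDA graph API (a generic sketch, not Stable-Fast's internal capture code):
+
+ ```python
+ import torch
+
+ model = torch.nn.Linear(64, 64).cuda().half()
+ static_in = torch.randn(8, 64, device="cuda", dtype=torch.half)
+
+ # Warm up on a side stream, as the PyTorch docs recommend before capture
+ s = torch.cuda.Stream()
+ s.wait_stream(torch.cuda.current_stream())
+ with torch.cuda.stream(s):
+     model(static_in)
+ torch.cuda.current_stream().wait_stream(s)
+
+ g = torch.cuda.CUDAGraph()
+ with torch.cuda.graph(g):
+     static_out = model(static_in)  # launches are recorded, not run eagerly
+
+ static_in.copy_(torch.randn_like(static_in))  # shapes must stay fixed
+ g.replay()                                    # replays the whole graph at once
+ ```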
53
+
54
+ ## Installation
55
+
56
+ ### Windows/Linux (Manual)
57
+
58
+ Follow the [official guide](https://github.com/chengzeyi/stable-fast?tab=readme-ov-file#installation):
59
+
60
+ ```bash
61
+ # Install from PyPI (recommended)
62
+ pip install stable-fast
63
+
64
+ # Or build from source for latest features
65
+ git clone https://github.com/chengzeyi/stable-fast
66
+ cd stable-fast
67
+ pip install -e .
68
+ ```
69
+
70
+ **Prerequisites:**
71
+ - PyTorch 2.0+ with CUDA support
72
+ - xformers (optional but recommended)
73
+ - Triton (optional for Triton kernel fusion)
74
+
75
+ ### Docker
76
+
77
+ Stable-Fast is included in the Docker image when `INSTALL_STABLE_FAST=1`:
78
+
79
+ ```bash
80
+ docker-compose build --build-arg INSTALL_STABLE_FAST=1
81
+ ```
82
+
83
+ Default is `0` (disabled) to reduce image size and build time.
84
+
85
+ ## Usage
86
+
87
+ ### Streamlit UI
88
+
89
+ Enable in the **Performance** section of the sidebar:
90
+
91
+ 1. Check **Stable Fast**
92
+ 2. Generate images — the first run compiles the model (30-60s delay)
93
+ 3. Subsequent generations reuse the cached compiled model
94
+
95
+ **Visual indicator:** The first generation shows "Compiling model..." in the progress bar.
96
+
97
+ ### REST API
98
+
99
+ Pass `stable_fast: true` in the request payload:
100
+
101
+ ```bash
102
+ curl -X POST http://localhost:7861/api/generate \
103
+ -H "Content-Type: application/json" \
104
+ -d '{
105
+ "prompt": "a peaceful garden with cherry blossoms",
106
+ "width": 768,
107
+ "height": 512,
108
+ "num_images": 1,
109
+ "stable_fast": true
110
+ }'
111
+ ```
112
+
113
+ ### Configuration
114
+
115
+ Stable-Fast behavior is controlled by `CompilationConfig`:
116
+
117
+ ```python
118
+ from sfast.compilers.diffusion_pipeline_compiler import CompilationConfig
119
+
120
+ config = CompilationConfig.Default()
121
+ config.enable_xformers = True # Use xformers attention
122
+ config.enable_cuda_graph = False # CUDA graphs (set True for max speed)
123
+ config.enable_jit_freeze = True # Freeze traced graph
124
+ config.enable_cnn_optimization = True # Conv fusion
125
+ config.enable_triton = False # Triton kernels (experimental)
126
+ config.memory_format = torch.channels_last # Optimize memory layout
127
+ ```
128
+
129
+ LightDiffusion-Next uses sensible defaults (CUDA graphs disabled by default for flexibility). To override:
130
+
131
+ ```python
132
+ # In src/StableFast/StableFast.py
133
+ def gen_stable_fast_config(enable_cuda_graph=False):
134
+ config = CompilationConfig.Default()
135
+ config.enable_cuda_graph = enable_cuda_graph # Pass True for max speed
136
+ # ... rest of config
137
+ ```
138
+
139
+ ## Performance
140
+
141
+ ### Speedup Benchmarks
142
+
143
+ Stable-Fast provides speedup through:
144
+ - **JIT compilation**: Eliminates Python overhead
145
+ - **Operator fusion**: Reduces kernel launches
146
+ - **CUDA graphs** (optional): Further reduces CPU overhead
147
+
148
+ Speedup varies significantly based on:
149
+ - GPU architecture
150
+ - Batch size and generation count
151
+ - Model size (SD1.5 vs SDXL)
152
+ - Whether CUDA graphs are enabled
153
+
154
+ **Note:** Performance benefits are most noticeable for batch operations (50+ images). For single 20-step generations, compilation overhead may exceed speedup gains.
155
+
156
+ ### Compilation Time
157
+
158
+ First-run compilation overhead:
159
+
160
+ - **SD1.5 UNet**: ~30s (traced once per resolution/batch size)
161
+ - **SDXL UNet**: ~60s (larger model)
162
+ - **Subsequent runs**: <1s (cached)
163
+
164
+ Cached compiled models persist in `~/.cache/torch_extensions/`. Clear this directory to force recompilation.
165
+
166
+ ## Stacking with Other Optimizations
167
+
168
+ Stable-Fast is **fully compatible** with SageAttention, SpargeAttn and WaveSpeed:
169
+
170
+ ### Stable-Fast + SageAttention
171
+
172
+ ```yaml
173
+ stable_fast: true
174
+ # SageAttention auto-detected
175
+ ```
176
+
177
+ **Result:** speedups compound multiplicatively: ~1.7x (Stable-Fast) × ~1.15x (SageAttention) ≈ **~2x total speedup**
178
+
179
+ ### Stable-Fast + SpargeAttn
180
+
181
+ ```yaml
182
+ stable_fast: true
183
+ # SpargeAttn auto-detected
184
+ ```
185
+
186
+ **Result:** ~1.7x (Stable-Fast) × ~1.4x (SpargeAttn) ≈ **~2.4x total speedup**
187
+
188
+ ### Stable-Fast + SpargeAttn + DeepCache
189
+
190
+ ```yaml
191
+ stable_fast: true
192
+ deepcache:
193
+ enabled: true
194
+ interval: 3
195
+ depth: 2
196
+ # SpargeAttn auto-detected
197
+ ```
198
+
199
+ **Result:** ~1.7x (Stable-Fast) × ~1.4x (SpargeAttn) × ~2x (DeepCache 2-3x) ≈ **~4-5x total speedup**
200
+
201
+ ## Compatibility
202
+
203
+ ### Compatible With
204
+
205
+ - ✅ Stable Diffusion 1.5
206
+ - ✅ Stable Diffusion 2.1
207
+ - ✅ SDXL
208
+ - ✅ All samplers (Euler, DPM++, etc.)
209
+ - ✅ LoRA adapters
210
+ - ✅ Textual inversion embeddings
211
+ - ✅ HiresFix
212
+ - ✅ ADetailer
213
+ - ✅ Img2Img (with fixed denoise strength)
214
+ - ✅ SageAttention/SpargeAttn
215
+ - ✅ WaveSpeed caching
216
+
217
+ ### Not Compatible With
218
+
219
+ - ❌ Flux models (different architecture, no UNet)
220
+ - ❌ Dynamic resolution changes after compilation
221
+ - ❌ Dynamic batch size changes after compilation (with CUDA graphs)
222
+ - ⚠️ Frequent model switching (recompiles each time)
223
+
224
+ ## Troubleshooting
225
+
226
+ ### Slow First Run / Repeated Recompilation
227
+
228
+ **Symptom:** Every generation triggers compilation, even with identical settings.
229
+
230
+ **Causes:**
231
+ 1. Cache directory not writable
232
+ 2. System clock incorrect (invalidates timestamps)
233
+ 3. Different model loaded (each model is cached separately)
234
+
235
+ **Fixes:**
236
+ ```bash
237
+ # Check cache permissions
238
+ ls -la ~/.cache/torch_extensions
239
+
240
+ # Ensure stable timestamps
241
+ date # Should be correct
242
+
243
+ # Mount cache in Docker to persist across container restarts
244
+ docker run -v ~/.cache/torch_extensions:/root/.cache/torch_extensions ...
245
+ ```
246
+
247
+ ### CUDA Out of Memory During Compilation
248
+
249
+ **Symptom:** OOM error on first run but not subsequent runs.
250
+
251
+ **Cause:** Compilation allocates temporary buffers for tracing.
252
+
253
+ **Fixes:**
254
+ 1. Disable CUDA graphs: `enable_cuda_graph=False` (saves ~500MB)
255
+ 2. Reduce batch size temporarily for first run
256
+ 3. Clear other VRAM consumers (close other apps, disable model caching)
257
+
258
+ ### Compilation Hangs or Crashes
259
+
260
+ **Symptom:** Process freezes during "Compiling model..." step.
261
+
262
+ **Causes:**
263
+ 1. Triton compilation error (if `enable_triton=True`)
264
+ 2. Driver incompatibility
265
+ 3. Insufficient CPU RAM for graph analysis
266
+
267
+ **Fixes:**
268
+ ```bash
269
+ # Disable Triton
270
+ # In src/StableFast/StableFast.py:
271
+ config.enable_triton = False
272
+
273
+ # Update NVIDIA driver
274
+ nvidia-smi # Check version, upgrade if < 525.x
275
+
276
+ # Increase Docker memory limit
277
+ # In docker-compose.yml:
278
+ deploy:
279
+ resources:
280
+ limits:
281
+ memory: 16G # Increase from default
282
+ ```
283
+
284
+ ### Error: `torch.jit.trace` fails
285
+
286
+ **Symptom:** `RuntimeError: Could not trace model`
287
+
288
+ **Cause:** Dynamic control flow in model (if/else statements depending on runtime values).
289
+
290
+ **Fix:** This is rare with standard SD models. If it occurs:
291
+ 1. Check for custom LoRA/embeddings with dynamic logic
292
+ 2. Disable Stable-Fast for that specific generation
293
+ 3. Report issue with model details
294
+
295
+ ### Model Quality Degradation
296
+
297
+ **Symptom:** Compiled model produces different outputs than baseline.
298
+
299
+ **Cause:** Numeric precision differences from operator fusion (very rare).
300
+
301
+ **Fixes:**
302
+ ```python
303
+ # Disable aggressive optimizations
304
+ config.enable_cnn_optimization = False
305
+ config.memory_format = None # Use default layout
306
+ ```
307
+
308
+ If issue persists, disable Stable-Fast and file a bug report.
309
+
310
+ ## Advanced Configuration
311
+
312
+ ### Custom Compilation Config
313
+
314
+ Override defaults in `src/StableFast/StableFast.py`:
315
+
316
+ ```python
317
+ def gen_stable_fast_config(enable_cuda_graph=False):
318
+ config = CompilationConfig.Default()
319
+
320
+ # Maximum speed (higher VRAM usage)
321
+ config.enable_cuda_graph = True
322
+ config.enable_triton = True
323
+ config.prefer_lowp_gemm = True # Use FP16 matrix multiplies
324
+
325
+ # Balanced (recommended)
326
+ config.enable_cuda_graph = False
327
+ config.enable_triton = False
328
+ config.enable_cnn_optimization = True
329
+
330
+ # Debug (no optimizations)
331
+ config.enable_cuda_graph = False
332
+ config.enable_jit_freeze = False
333
+ config.enable_cnn_optimization = False
334
+
335
+ return config
336
+ ```
337
+
338
+ ### Clear Cached Compilations
339
+
340
+ ```bash
341
+ # Linux/Mac
342
+ rm -rf ~/.cache/torch_extensions
343
+
344
+ # Windows
345
+ rmdir /s /q %USERPROFILE%\.cache\torch_extensions
346
+
347
+ # Docker (mount cache as volume)
348
+ docker run -v my_cache:/root/.cache/torch_extensions ...
349
+ docker volume rm my_cache # Clear cache
350
+ ```
351
+
352
+ ### Profile Compilation
353
+
354
+ ```bash
355
+ # Enable debug logging
356
+ export LD_SERVER_LOGLEVEL=DEBUG
357
+
358
+ # Run generation and check logs
359
+ cat logs/server.log | grep "Stable"
360
+ ```
361
+
362
+ ## Best Practices
363
+
364
+ ### Production Deployments
365
+
366
+ 1. **Pre-compile models** during startup with a warm-up request (only for batch/long-running services)
367
+ 2. **Mount cache volume** to persist compilations across container restarts
368
+ 3. **Disable CUDA graphs** if serving multiple batch sizes
369
+ 4. **Enable CUDA graphs** for fixed-resolution APIs with consistent high-volume traffic
370
+ 5. **Disable Stable-Fast entirely** for single-shot API endpoints (compilation overhead exceeds benefit)
371
+
372
+ Example warm-up:
373
+
374
+ ```python
375
+ # In startup script
376
+ def warmup_stable_fast(model, width=768, height=512):
377
+ """Pre-compile model with dummy input."""
378
+ dummy_input = torch.randn(1, 4, height // 8, width // 8, device="cuda")
379
+ dummy_timestep = torch.tensor([999], device="cuda")
380
+
381
+ with torch.no_grad():
382
+ model(dummy_input, dummy_timestep, c={})
383
+
384
+ print("Stable-Fast compilation complete")
385
+ ```
386
+
387
+ ### Development Workflows
388
+
389
+ 1. **Disable Stable-Fast** when experimenting with new models/LoRAs (avoids repeated recompilation)
390
+ 2. **Enable for final testing** to verify production performance
391
+ 3. **Clear cache** after upgrading PyTorch/CUDA drivers
392
+
393
+ ## Citation
394
+
395
+ If you use Stable-Fast in your work:
396
+
397
+ ```bibtex
398
+ @misc{stable-fast,
399
+ author = {Cheng Zeyi},
400
+ title = {stable-fast: Fast Inference for Stable Diffusion},
401
+ year = {2023},
402
+ publisher = {GitHub},
403
+ url = {https://github.com/chengzeyi/stable-fast}
404
+ }
405
+ ```
406
+
407
+ ## Resources
408
+
409
+ - [Stable-Fast Repository](https://github.com/chengzeyi/stable-fast)
410
+ - [Installation Guide](https://github.com/chengzeyi/stable-fast?tab=readme-ov-file#installation)
411
+ - [TorchScript Documentation](https://pytorch.org/docs/stable/jit.html)
412
+ - [CUDA Graphs Guide](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/)
docs/tome.md ADDED
@@ -0,0 +1,272 @@
1
+ # Token Merging (ToMe)
2
+
3
+ ## Overview
4
+
5
+ Token Merging (ToMe) is a **performance optimization** that accelerates diffusion models by intelligently merging similar tokens in the attention mechanism. By identifying and combining redundant computations, ToMe achieves **20-60% speedup** with minimal quality impact.
6
+
7
+ Unlike feature caching (DeepCache, WaveSpeed), ToMe reduces the computational graph itself — fewer tokens means fewer attention operations, less memory bandwidth, and faster generation.
8
+
9
+ This is a **training-free**, **drop-in optimization** that works with all Stable Diffusion models (SD1.5, SDXL) and can be combined with other speedup techniques.
10
+
11
+ ## How It Works
12
+
13
+ ### The Token Redundancy Problem
14
+
15
+ Diffusion models process images as sequences of tokens (patches):
16
+
17
+ ```
18
+ Input Image (512×512) → Tokenize → 4096 tokens (64×64 grid of 8×8 patches)
19
+ ```
20
+
21
+ At each attention layer, **every token attends to every other token**:
22
+
23
+ $$
24
+ \text{Attention Cost} = O(N^2 \cdot D)
25
+ $$
26
+
27
+ Where:
28
+ - $N$ = number of tokens (e.g., 4096 for 512×512)
29
+ - $D$ = embedding dimension (e.g., 768 or 1024)
30
+
31
+ **Key insight:** Many tokens are highly similar (e.g., sky regions, uniform backgrounds, smooth gradients). Computing attention between nearly-identical tokens is redundant.
32
+
33
+ ### The ToMe Solution
34
+
35
+ Token Merging reduces redundancy through **bipartite matching**:
36
+
37
+ ```
38
+ Step 1: Split tokens into two sets
39
+ ┌─────────────────────┬─────────────────────┐
40
+ │ Destination Set (dst)│ Source Set (src) │
41
+ │ [Token 1, 3, 5, ...] │ [Token 2, 4, 6, ...] │
42
+ └─────────────────────┴─────────────────────┘
43
+
44
+ Step 2: Compute similarity (cosine distance)
45
+ dst[0] ↔ src[0]: 0.92 (highly similar!)
46
+ dst[0] ↔ src[1]: 0.34
47
+ dst[0] ↔ src[2]: 0.18
48
+ ...
49
+
50
+ Step 3: Merge most similar pairs
51
+ merged_token[0] = (dst[0] + src[0]) / 2
52
+
53
+ Step 4: Continue with fewer tokens
54
+ 4096 tokens → 2048 tokens (50% merge ratio)
55
+ Attention cost reduced by ~4x
56
+ ```
57
+
58
+ This happens **per attention layer**, with merge ratio dynamically adjusting based on layer depth.
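+
+ The matching above can be sketched in a few lines of PyTorch. This is a toy illustration of the bipartite idea, not tomesd's optimized kernel, and here `ratio` means the fraction of source tokens merged:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def merge_tokens(x: torch.Tensor, ratio: float = 0.5) -> torch.Tensor:
+     """Toy bipartite merge over one sequence of (n, d) tokens."""
+     dst, src = x[0::2].clone(), x[1::2].clone()                   # Step 1: alternate split
+     sim = F.normalize(src, dim=-1) @ F.normalize(dst, dim=-1).T   # Step 2: cosine similarity
+     best_sim, best_dst = sim.max(dim=-1)                          # best dst match per src token
+     merged = best_sim.topk(int(src.shape[0] * ratio)).indices     # most redundant src tokens
+     # Step 3: average merged src tokens into their matched dst tokens
+     # (duplicate matches simply overwrite each other in this toy)
+     dst[best_dst[merged]] = (dst[best_dst[merged]] + src[merged]) / 2
+     keep = torch.ones(src.shape[0], dtype=torch.bool)
+     keep[merged] = False
+     return torch.cat([dst, src[keep]])                            # Step 4: fewer tokens
+
+ print(merge_tokens(torch.randn(4096, 768)).shape)  # torch.Size([3072, 768])
+ ```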
59
+
60
+ ## Configuration
61
+
62
+ ### Parameters
63
+
64
+ | Parameter | Type | Default | Range | Description |
65
+ |-----------|------|---------|-------|-------------|
66
+ | `tome_enabled` | bool | `False` | - | Enable Token Merging |
67
+ | `tome_ratio` | float | `0.5` | 0.0-0.9 | Percentage of tokens to merge (higher = faster, lower quality) |
68
+ | `tome_max_downsample` | int | `1` | 1, 2, 4, 8 | Apply ToMe to layers with downsampling ≤ this value |
69
+
70
+ ### Choosing `tome_max_downsample`
71
+
72
+ Controls which UNet layers apply ToMe:
73
+
74
+ | Value | Layers Affected | Speed vs Quality |
75
+ |-------|----------------|------------------|
76
+ | **1** | Only full-resolution layers (4/15) | Conservative, minimal quality impact |
77
+ | **2** | Half-resolution layers (8/15) | Balanced (recommended) |
78
+ | **4** | Quarter-resolution layers (12/15) | Aggressive |
79
+ | **8** | All layers (15/15) | Maximum speedup, noticeable quality loss |
80
+
81
+ **Recommendation:** Start with `max_downsample=1`. Only increase if you need more speedup and can tolerate quality reduction.
82
+
83
+ ## Usage
84
+
85
+ ### Streamlit UI
86
+
87
+ Enable in the **🔀 Token Merging (ToMe)** expander:
88
+
89
+ 1. Check **Enable Token Merging**
90
+ 2. Select a preset:
91
+ - **Conservative** — 30% merge, max_downsample=2 (minimal impact)
92
+ - **Balanced** — 50% merge, max_downsample=1 (recommended)
93
+ - **Aggressive** — 70% merge, max_downsample=1 (maximum speed)
94
+ - **Custom** — Manual slider control
95
+ 3. Generate images — console confirms activation
96
+
97
+ **Visual feedback:**
98
+ ```
99
+ ✓ Token Merging ACTIVE: 50% merge ratio, max_downsample=1
100
+ ```
101
+
102
+ ### REST API
103
+
104
+ Include in your generation request:
105
+
106
+ ```bash
107
+ curl -X POST http://localhost:7861/api/generate \
108
+ -H "Content-Type: application/json" \
109
+ -d '{
110
+ "prompt": "a cyberpunk cityscape at night, neon lights",
111
+ "width": 1024,
112
+ "height": 512,
113
+ "steps": 25,
114
+ "tome_enabled": true,
115
+ "tome_ratio": 0.5,
116
+ "tome_max_downsample": 1
117
+ }'
118
+ ```
119
+
120
+ ### Python API
121
+
122
+ ```python
123
+ from src.user.pipeline import pipeline
124
+
125
+ pipeline(
126
+ prompt="a detailed fantasy castle on a cliff",
127
+ w=768,
128
+ h=1024,
129
+ steps=30,
130
+ sampler="dpmpp_sde_cfgpp",
131
+ scheduler="ays",
132
+ tome_enabled=True,
133
+ tome_ratio=0.5,
134
+ tome_max_downsample=1,
135
+ number=4 # Generate multiple images faster
136
+ )
137
+ ```
138
+
139
+ ## Troubleshooting
140
+
141
+ ### "No speedup detected"
142
+
143
+ **Possible causes:**
144
+ 1. **tomesd not installed** — Install with `pip install tomesd`
145
+ 2. **Other bottlenecks** — Enable only ToMe for isolated testing
146
+ 3. **Very low resolution** — ToMe benefits are minimal below 512px
147
+
148
+ **Solutions:**
149
+ ```bash
150
+ # Check installation
151
+ python -c "import tomesd; print('ToMe available')"
152
+
153
+ # Test in isolation at 1024×512 (ideal resolution for ToMe)
154
+ python quick_tome_test.py
155
+ ```
156
+
157
+ ### "Images look blurry or soft"
158
+
159
+ **Cause:** `tome_ratio` too high (>0.6) or `max_downsample` too aggressive (>2).
160
+
161
+ **Solutions:**
162
+ - Reduce `tome_ratio` to 0.4-0.5
163
+ - Lower `max_downsample` to 1
164
+ - Increase `steps` to 30-35 for better convergence
165
+ - Disable ToMe for final high-quality renders
166
+
167
+ ### "Minimal speedup despite 70% merge"
168
+
169
+ **Cause:** Other optimizations (DeepCache, Multi-Scale) already bottlenecked elsewhere (VAE decode, sampling overhead).
170
+
171
+ **Solutions:**
172
+ - Profile with isolated tests (disable all other optimizations)
173
+ - Ensure GPU isn't memory-bound (reduce batch size)
174
+ - Check system monitoring for CPU/disk bottlenecks
175
+
176
+ ### "Model fails to load / tomesd errors"
177
+
178
+ **Cause:** Outdated tomesd version or incompatible model architecture.
179
+
180
+ **Solutions:**
181
+ ```bash
182
+ # Update tomesd
183
+ pip install --upgrade tomesd
184
+
185
+ # Check compatibility (ToMe only works with UNet-based models)
186
+ # Flux/Transformer models require different ToMe variant (not yet supported)
187
+ ```
188
+
189
+ ## Technical Details
190
+
191
+ ### Implementation
192
+
193
+ ToMe is applied via the `ModelPatcher` class (`src/Model/ModelPatcher.py`):
194
+
195
+ ```python
196
+ def apply_tome(self, ratio: float = 0.5, max_downsample: int = 1) -> bool:
197
+ """Apply Token Merging to the diffusion model."""
198
+ # Remove any existing patch (handles cached models)
199
+ try:
200
+ tomesd.remove_patch(self)
201
+ except:
202
+ pass
203
+
204
+ # Apply ToMe patch
205
+ tomesd.apply_patch(
206
+ self, # ModelPatcher with .model.diffusion_model structure
207
+ ratio=ratio,
208
+ max_downsample=max_downsample
209
+ )
210
+ self.tome_enabled = True
211
+ return True
212
+ ```
213
+
214
+ **Cache handling:** ToMe patches are removed after each generation and re-applied as needed, ensuring correct behavior with model caching.
215
+
216
+ ### Bipartite Matching Algorithm
217
+
218
+ ToMe uses **proportional attention-based matching**:
219
+
220
+ 1. **Partition tokens:**
221
+ $$
222
+ T_{\text{dst}}, T_{\text{src}} = \text{partition}(T, \text{stride}=(2,2))
223
+ $$
224
+
225
+ 2. **Compute similarity matrix:**
226
+ $$
227
+ S_{ij} = \frac{T_{\text{dst}}[i] \cdot T_{\text{src}}[j]}{||T_{\text{dst}}[i]|| \cdot ||T_{\text{src}}[j]||}
228
+ $$
229
+
230
+ 3. **Find top-k matches:**
231
+ $$
232
+ k = \lfloor \text{ratio} \times |T_{\text{src}}| \rfloor
233
+ $$
234
+
235
+ 4. **Merge tokens:**
236
+ $$
237
+ T'[i] = \frac{T_{\text{dst}}[i] + T_{\text{src}}[\text{match}(i)]}{2}
238
+ $$
239
+
240
+ ## Compatibility
241
+
242
+ | Feature | Compatible? | Notes |
243
+ |---------|-------------|-------|
244
+ | **SD1.5 models** | ✓ | Full support, tested extensively |
245
+ | **SDXL models** | ✓ | Full support, larger speedup |
246
+ | **Flux models** | ✗ | UNet-specific, Transformer variant TBD |
247
+ | **All samplers** | ✓ | ToMe patches attention, agnostic to sampler |
248
+ | **CFG-Free** | ✓ | No interaction, both apply independently |
249
+ | **DeepCache** | ✓ | Excellent combination, speedups multiply |
250
+ | **Multi-Scale** | ✓ | Compatible, benefits stack |
251
+ | **HiRes Fix** | ✓ | Applied to all upscaling passes |
252
+ | **ADetailer** | ✓ | Applied to detail-enhancement passes |
253
+ | **Stable-Fast** | ✓ | Can combine for maximum speedup |
254
+
255
+ ## Limitations
256
+
257
+ 1. **UNet-only:** Transformer architectures (Flux) use different attention patterns — dedicated Transformer-ToMe needed
258
+ 2. **Detail sensitivity:** High-frequency textures (fabric weave, individual hairs) see most quality impact
259
+ 3. **Diminishing returns:** Beyond 60% merge, quality degrades faster than speed improves
260
+ 4. **One-time patch:** Doesn't adapt merge ratio dynamically during generation
261
+
262
+ ## Related Optimizations
263
+
264
+ - **[DeepCache](wavespeed.md#deepcache)**: Feature caching — complements ToMe, speedups multiply (~2.8x combined)
265
+ - **[Multi-Scale Diffusion](optimizations.md#multi-scale)**: Resolution-based optimization — also reduces token count
266
+ - **[Stable-Fast](stablefast.md)**: Compilation-based speedup — can combine for maximum performance
267
+
268
+ ## References & Further Reading
269
+
270
+ - **Original Paper:** [Token Merging for Fast Stable Diffusion](https://arxiv.org/abs/2303.17604) (Bolya & Hoffman, 2023)
271
+ - **tomesd Library:** https://github.com/dbolya/tomesd
272
+ - **ToMe for Vision Transformers:** https://github.com/facebookresearch/ToMe
docs/usage.md ADDED
@@ -0,0 +1,134 @@
1
+ # Usage
2
+
3
+ ## First Run & UI Tour
4
+
5
+ This page walks you through launching LightDiffusion-Next, understanding the Streamlit layout, using the optional Gradio UI and triggering a first generation from the command line.
6
+
7
+ ## Launching the Streamlit UI
8
+
9
+ - **Windows:** run `run.bat` (see [Installation](installation.md)).
10
+ - **Linux/macOS/WSL2:** activate your virtual environment and run `streamlit run streamlit_app.py --server.port=8501`.
11
+ - **Docker:** start the compose stack and open `http://localhost:8501`.
12
+
13
+ You will see an initialization progress indicator while checkpoints and auxiliary models are downloaded. Once complete the app switches to a two-tab layout: **🎨 Generate** and **📜 History**.
14
+
15
+ ## Generate tab
16
+
17
+ The Generate tab is designed as a control surface where the left sidebar contains parameters and the right canvas displays previews and final renders.
18
+
19
+ ### Prompt & base settings
20
+
21
+ - **Prompt / Negative prompt** — text areas at the top of the sidebar. Negative prompts are optional; the pipeline automatically falls back to a curated default containing `EasyNegative`, `badhandv4`, `lr` and `ng_deepnegative` embeddings.
22
+ - **Dimensions** — width/height sliders (64–2048) with automatic aspect handling in the gallery.
23
+ - **Images & batch** — request multiple images per job; large requests may be chunked server-side into groups no larger than `LD_MAX_IMAGES_PER_GROUP` images (default: 256) to avoid memory and disk pressure. Use the `batch_size` setting to control internal sampler batch size and adjust `LD_MAX_IMAGES_PER_GROUP` via environment variables if necessary.
24
+
25
+ ### Feature toggles
26
+
27
+ - **HiRes Fix** — Upscales the latent and runs an extra sampling pass. Generates output in `output/HiresFix`.
28
+ - **ADetailer** — Uses SAM + YOLO and Impact Pack prompt heads to redraw faces/bodies. Additional artifacts are saved to `output/Adetailer`.
29
+ - **Enhance prompt** — Sends your prompt through the Ollama model specified by `PROMPT_ENHANCER_MODEL` (defaults to `qwen3:0.6b`). The rewritten prompt is shown in the sidebar and in image metadata.
30
+ - **Stable-Fast** — Enables UNet compilation (after the first warm-up) for faster iterations.
31
+ - **Flux mode** — Routes the job through the quantized Flux pipeline (requires the `ae.safetensors` VAE and quantized GGUF weights downloaded via `CheckAndDownloadFlux`).
32
+ - **Img2Img mode** — Reveals an image uploader. The selected picture is used as the source latent, optionally combined with UltimateSDUpscale.
33
+ - **Keep models in VRAM** — Toggle model caching between jobs to reduce load time at the cost of VRAM retention.
34
+ - **Real-time preview** — Streams TAESD previews into a responsive gallery while sampling is still running. Disable it when running headless to save resources.
35
+
36
+ ### Sampling & Scheduling
37
+
38
+ The **⚡ Sampling & Scheduling** section provides direct control over the sampling process:
39
+
40
+ - **Scheduler** — Choose from 8 scheduler options including the new **AYS (Align Your Steps)** schedulers which provide ~2x speedup by using optimized sigma distributions. Options include:
41
+ - Normal, Karras, Simple, Beta (traditional schedulers)
42
+ - AYS, AYS SD1.5, AYS SDXL, AYS Flux (optimized schedulers)
43
+ - **Sampler** — Select from 6 available samplers:
44
+ - Standard: Euler, Euler Ancestral
45
+ - CFG++ variants: Euler CFG++, Euler Ancestral CFG++, DPM++ 2M CFG++, DPM++ SDE CFG++
46
+ - **Steps** — Adjust sampling steps (1-150). The UI shows recommendations based on your scheduler choice (e.g., 10 steps for AYS vs 20 for normal).
47
+ - **Prompt Cache** — Toggle prompt caching on/off (enabled by default). View cache statistics showing hits/misses and clear the cache when needed.
48
+
49
+
50
+ ### Multi-scale diffusion presets
51
+
52
+ Under the “Multi-Scale Diffusion Settings” accordion you can:
53
+
54
+ - Choose a preset (`quality`, `performance`, `balanced`, `disabled`).
55
+ - Override the scale factor and the number of steps to run at full resolution.
56
+ - Enable intermittent full-resolution refinement.
57
+
58
+ Multi-scale diffusion provides major frame-time savings at high resolutions and is enabled by default.
59
+
60
+ ### Model cache management
61
+
62
+ - **🔍 Check VRAM Usage** — reports total/used/free VRAM, cached checkpoints and whether the “keep loaded” flag is active.
63
+ - **🗑️ Clear Model Cache** — evicts models from VRAM so the next job reloads everything fresh.
64
+
65
+ ### Status & previews
66
+
67
+ - A status bar at the bottom of the page surfaces timing, generation stage and any warnings.
68
+ - When real-time preview is enabled, the canvas shows the six most recent TAESD frames. They disappear automatically when generation completes.
69
+
70
+ ## Keyboard shortcuts & session state
71
+
72
+ - Most sliders support arrow-key and shift + arrow adjustments.
73
+ - The UI remembers your last-used settings inside `webui_settings.json`. Toggle “Verbose mode” in the settings drawer to see more runtime information.
74
+ - Seeds are stored in `include/last_seed.txt`. Enable “Reuse seed” to repeat a composition.
75
+
76
+ ## History tab
77
+
78
+ - Displays every PNG in the `output/**` tree with metadata overlays (timestamp, dimensions, prompt).
79
+ - Use “🔄 Refresh History” to rescan the folders, “🗑️ Delete Selected Image” for targeted cleanup or “⚠️ Clear All Images” to wipe everything.
80
+ - Selections show exact file paths so you can open them in external editors.
81
+
82
+ ## Using the Gradio UI
83
+
84
+ Run `python app.py` (or set `UI_FRAMEWORK=gradio` in Docker) to launch the Gradio frontend at `http://localhost:7860`.
85
+
86
+ - The controls mirror the Streamlit sidebar but the layout is optimized for Hugging Face Spaces.
87
+ - Live previews stream directly to the main gallery while jobs run.
88
+ - The 📸 Image History tab reads from the same `output/` folders as Streamlit, so both UIs share artifacts and metadata.
89
+
90
+ ## Command-line pipeline
91
+
92
+ You can invoke the pipeline without any UI for scripted jobs.
93
+
94
+ ```bash
95
+ python -m src.user.pipeline "a futuristic city at dusk" 768 512 2 2 --hires-fix --adetailer --stable-fast --reuse-seed
96
+ ```
97
+
98
+ - Positional arguments: `prompt width height number batch`.
99
+ - Flags mirror the UI toggles (`--img2img`, `--flux`, `--prio-speed`, `--multiscale-preset`, etc.).
100
+ - Img2Img uses the prompt as a filesystem path unless you pass `--img2img-image` through the FastAPI server (see [REST & automation](api.md)).
101
+
102
+ ## Streamlit tips
103
+
104
+ - Click “Retry Initialization” if the download step fails — the app reruns `CheckAndDownload()`.
105
+ - Use the sidebar menu → **Rerun** if you change source code while developing custom nodes.
106
+ - When running on laptops, disable “Keep models in VRAM” before closing the UI to release GPU memory for other applications.
107
+
108
+ ## Programmatic pipeline usage (Python)
109
+
110
+ You can import and call the pipeline directly from Python. The function lives at `src.user.pipeline.pipeline` and accepts the same runtime flags as the CLI. The example below shows a minimal, synchronous call that runs the pipeline and handles the returned mapping when running in batched mode.
111
+
112
+ ```python
113
+ from src.user.pipeline import pipeline
114
+
115
+ result = pipeline(
116
+ prompt=["a futuristic city at dusk", "a cyberpunk alley, rainy"],
117
+ w=768,
118
+ h=512,
119
+ number=2,
120
+ batch=2,
121
+ hires_fix=False,
122
+ adetailer=False,
123
+ stable_fast=False,
124
+ reuse_seed=False,
125
+ flux_enabled=False,
126
+ )
127
+
128
+ # When run in batched mode `pipeline` returns a dict with key 'batched_results'
129
+ if isinstance(result, dict) and "batched_results" in result:
130
+ for req_id, entries in result["batched_results"].items():
131
+ print(f"Request {req_id} produced {len(entries)} artifacts")
132
+ else:
133
+ print("Pipeline completed; check output/ for generated images")
134
+ ```
docs/wavespeed.md ADDED
@@ -0,0 +1,473 @@
1
+ # WaveSpeed Caching
2
+
3
+ ## Overview
4
+
5
+ WaveSpeed is the project's caching-oriented optimization layer for reusing work across denoising steps. In the current codebase, the integrated path is DeepCache for UNet-based models, and the repository also contains groundwork for a Flux-oriented First Block Cache path.
6
+
7
+ LightDiffusion-Next contains two WaveSpeed-related implementations:
8
+
9
+ 1. **DeepCache** — Integrated for UNet-based models (SD1.5, SDXL)
10
+ 2. **First Block Cache (FBCache)** — Flux-oriented cache machinery present in the codebase
11
+
12
+ Both are training-free. DeepCache is the user-facing path today; First Block Cache is codebase groundwork for a more specialized transformer caching path.
13
+
14
+ ## How It Works
15
+
16
+ ### Core Insight
17
+
18
+ Diffusion models denoise images iteratively over 20-50 steps. Researchers observed that:
19
+
20
+ - **High-level features** (semantic structure, composition) change slowly across steps
21
+ - **Low-level features** (fine details, textures) require frequent updates
22
+
23
+ WaveSpeed aims to reduce repeated computation across nearby denoising steps by reusing information from earlier steps where practical.
24
+
25
+ ### DeepCache (UNet Models) {#deepcache}
26
+
27
+ DeepCache is the integrated WaveSpeed path for UNet models.
28
+
29
+ **Cache step (every N steps):**
30
+ 1. Run the full denoiser path
31
+ 2. Store the output for later reuse
32
+
33
+ **Reuse step (intermediate steps):**
34
+ 1. Reuse the cached denoiser output
35
+ 2. Skip the full model recomputation for that step
36
+
37
+ **Speedup:** ~50-70% time saved per reuse step → 2-3x total speedup with `interval=3`
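+
+ In pseudocode the schedule looks like this (a hedged sketch; `run_deep_layers` and `run_shallow_layers` are hypothetical stand-ins for the real UNet split):
+
+ ```python
+ def run_deep_layers(x, t):            # hypothetical: expensive deep UNet blocks
+     return x * 0.9
+
+ def run_shallow_layers(x, t, deep):   # hypothetical: cheap shallow blocks, always run
+     return x + 0.1 * deep
+
+ num_steps, cache_interval = 20, 3
+ x, cached = 1.0, None
+ for t in range(num_steps):
+     if cached is None or t % cache_interval == 0:
+         cached = run_deep_layers(x, t)      # cache step: full forward
+     x = run_shallow_layers(x, t, cached)    # reuse step: deep output comes from cache
+ ```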
38
+
39
+ ### First Block Cache (Flux Models)
40
+
41
+ Flux uses Transformer blocks instead of UNet convolutions. The repository includes a First Block Cache implementation for this architecture family:
42
+
43
+ ```
44
+ ┌─────────────────────────────────────────┐
45
+ │ First Transformer Block (always run) │ ← Computes initial features
46
+ ├─────────────────────────────────────────┤
47
+ │ Remaining Blocks (cached if similar) │ ← FBCache caching zone
48
+ └─────────────────────────────────────────┘
49
+ ```
50
+
51
+ **Cache decision logic:**
52
+ 1. Run first Transformer block
53
+ 2. Compare output to previous step's output
54
+ 3. If difference < threshold: reuse cached remaining blocks
55
+ 4. If difference ≥ threshold: run all blocks and update cache
56
+
57
+ In the current project structure, this cache path is implementation groundwork rather than a standard generation toggle like DeepCache.
58
+
59
+ ## DeepCache Configuration
60
+
61
+ ### Parameters
62
+
63
+ | Parameter | Type | Default | Description |
64
+ |-----------|------|---------|-------------|
65
+ | `cache_interval` | int | 3 | Steps between cache updates (higher = faster, lower quality) |
66
+ | `cache_depth` | int | 2 | UNet depth for caching (0-12, higher = more aggressive) |
67
+ | `start_step` | int | 0 | Timestep to start caching (0-1000) |
68
+ | `end_step` | int | 1000 | Timestep to stop caching (0-1000) |
69
+
70
+ ### Streamlit UI
71
+
72
+ Enable in the **⚡ DeepCache Acceleration** expander:
73
+
74
+ 1. Check **Enable DeepCache**
75
+ 2. Adjust sliders:
76
+ - **Cache Interval**: 1-10 (default: 3)
77
+ - **Cache Depth**: 0-12 (default: 2)
78
+ - **Start/End Steps**: 0-1000 (default: 0/1000)
79
+ 3. Generate images — caching applies transparently
80
+
81
+ ### REST API
82
+
83
+ ```bash
84
+ curl -X POST http://localhost:7861/api/generate \
85
+ -H "Content-Type: application/json" \
86
+ -d '{
87
+ "prompt": "a misty forest at twilight",
88
+ "width": 768,
89
+ "height": 512,
90
+ "deepcache_enabled": true,
91
+ "deepcache_interval": 3,
92
+ "deepcache_depth": 2
93
+ }'
94
+ ```
95
+
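+ The same request can be sent from Python; this is a minimal sketch assuming the `requests` package is installed and the server is on the default port (the response schema is server-defined, so inspect it yourself):
+
+ ```python
+ import requests
+
+ # Same payload as the curl example above
+ payload = {
+     "prompt": "a misty forest at twilight",
+     "width": 768,
+     "height": 512,
+     "deepcache_enabled": True,
+     "deepcache_interval": 3,
+     "deepcache_depth": 2,
+ }
+ resp = requests.post("http://localhost:7861/api/generate", json=payload, timeout=600)
+ resp.raise_for_status()
+ print(resp.status_code)
+ ```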
96
+ ### Recommended Presets
97
+
98
+ #### Balanced (Default)
99
+ ```yaml
100
+ cache_interval: 3
101
+ cache_depth: 2
102
+ start_step: 0
103
+ end_step: 1000
104
+ ```
105
+ - **Speedup:** 2-2.3x
106
+ - **Quality loss:** Very slight (1-2%)
107
+ - **Use case:** Everyday generation
108
+
109
+ #### Maximum Speed
110
+ ```yaml
111
+ cache_interval: 5
112
+ cache_depth: 3
113
+ start_step: 0
114
+ end_step: 1000
115
+ ```
116
+ - **Speedup:** 2.5-3x
117
+ - **Quality loss:** Noticeable (5-7%)
118
+ - **Use case:** Rapid prototyping, batch jobs
119
+
120
+ #### Maximum Quality
121
+ ```yaml
122
+ cache_interval: 2
123
+ cache_depth: 1
124
+ start_step: 0
125
+ end_step: 1000
126
+ ```
127
+ - **Speedup:** 1.5-2x
128
+ - **Quality loss:** Minimal (<1%)
129
+ - **Use case:** Final renders, client work
130
+
131
+ #### Partial Caching (Critical Steps Only)
132
+ ```yaml
133
+ cache_interval: 3
134
+ cache_depth: 2
135
+ start_step: 200
136
+ end_step: 800
137
+ ```
138
+ - **Speedup:** 1.8-2.2x
139
+ - **Quality loss:** Minimal
140
+ - **Use case:** Preserve early structure, late details
141
+
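+ Conceptually, the start/end window simply gates whether caching is considered at a given timestep. A minimal sketch of that check (a hypothetical helper; the exact gating and timestep direction live in the real implementation):
+
+ ```python
+ def caching_allowed(timestep: int, start_step: int = 200, end_step: int = 800) -> bool:
+     """Hypothetical helper: cache only inside the configured timestep window."""
+     return start_step <= timestep <= end_step
+
+ # Timesteps outside [start_step, end_step] always run the full model
+ assert caching_allowed(500)
+ assert not caching_allowed(100) and not caching_allowed(900)
+ ```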
142
+ ## First Block Cache (FBCache) Configuration
143
+
144
+ ### Parameters
145
+
146
+ | Parameter | Type | Default | Description |
147
+ |-----------|------|---------|-------------|
148
+ | `residual_diff_threshold` | float | 0.05 | Max feature difference to trigger cache reuse (0.0-1.0) |
149
+
150
+ ### Usage
151
+
152
+ First Block Cache is not currently exposed as a standard per-generation toggle. The implementation is available in the codebase for specialized integration work:
153
+
154
+ ```python
155
+ # In src/user/pipeline.py
156
+ from src.WaveSpeed import fbcache_nodes
157
+
158
+ # Create cache context
159
+ cache_context = fbcache_nodes.create_cache_context()
160
+
161
+ # Apply caching to a Flux-style model
162
+ with fbcache_nodes.cache_context(cache_context):
163
+ patched_model = fbcache_nodes.create_patch_flux_forward_orig(
164
+ flux_model,
165
+ residual_diff_threshold=0.05, # Lower = stricter caching
166
+ )
167
+ # Generate images...
168
+ ```
169
+
170
+ ### Tuning Threshold
171
+
172
+ - **Lower threshold (0.01-0.03)**: Stricter caching, recomputes more often, higher quality
173
+ - **Higher threshold (0.05-0.1)**: Looser caching, reuses more often, higher speedup
174
+ - **Recommended:** 0.05 (balances quality and speed)
175
+
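+ To make the threshold concrete, here is the relative-difference metric from the pseudocode later on this page, evaluated on toy tensors (illustrative values only):
+
+ ```python
+ import torch
+
+ prev = torch.randn(4, 64)
+ curr = prev + 0.02 * torch.randn(4, 64)  # small step-to-step drift
+
+ residual_norm = ((curr - prev).abs().mean() / curr.abs().mean()).item()
+ reuse_cache = residual_norm < 0.05  # default threshold
+ print(f"residual_norm={residual_norm:.4f}, reuse={reuse_cache}")  # typically reuses here
+ ```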
176
+ ## Performance
177
+
178
+ ### Speedup Guidance
179
+
180
+ Speedup scales with cache interval and depth:
181
+
182
+ | Model | Cache Interval | Expected Behavior |
183
+ |-------|---------------|-------------------|
184
+ | SD1.5 | 2 | Moderate speedup, minimal quality loss |
185
+ | SD1.5 | 3 | Good speedup, slight quality loss |
186
+ | SD1.5 | 5 | High speedup, noticeable quality loss |
187
+ | SDXL | 3 | Good speedup, slight quality loss |
188
+ | Flux-style caching paths | implementation-specific | Depends on the integration path |
189
+
190
+ **Performance varies based on:**
191
+ - GPU architecture
192
+ - Model size
193
+ - Resolution
194
+ - Sampler choice
195
+ - Number of steps
196
+
197
+ **Recommendation:** Start with `interval=3` and adjust based on your quality requirements.
+
+ ### VRAM Impact
198
+
199
+ Caching increases VRAM usage slightly (50-200MB depending on resolution):
200
+
201
+ | Model | Baseline VRAM | + DeepCache | Increase |
202
+ |-------|--------------|-------------|----------|
203
+ | SD1.5 (768×512) | 3.2 GB | 3.4 GB | +200 MB |
204
+ | SDXL (1024×1024) | 6.8 GB | 7.0 GB | +200 MB |
205
+ | Flux (832×1216) | 12.5 GB | 12.6 GB | +100 MB |
206
+
207
+ ## Stacking with Other Optimizations
208
+
209
+ WaveSpeed is **fully compatible** with SageAttention, SpargeAttn and Stable-Fast:
210
+
211
+ ### DeepCache + SageAttention
212
+
213
+ ```yaml
214
+ deepcache_enabled: true
215
+ deepcache_interval: 3
216
+ # SageAttention auto-detected
217
+ ```
218
+
219
+ **Result:** 2.2x (DeepCache) × 1.15 (SageAttention) = **~2.5x total speedup**
220
+
221
+ ### DeepCache + SpargeAttn
222
+
223
+ ```yaml
224
+ deepcache_enabled: true
225
+ deepcache_interval: 3
226
+ # SpargeAttn auto-detected
227
+ ```
228
+
229
+ **Result:** Enhanced speedup from caching and sparse attention
230
+
231
+ ### DeepCache + Stable-Fast + SpargeAttn
232
+
233
+ ```yaml
234
+ stable_fast: true
235
+ deepcache_enabled: true
236
+ deepcache_interval: 3
237
+ # SpargeAttn auto-detected
238
+ ```
239
+
240
+ **Result:** Maximum combined speedup (all optimizations active, batch operations only)
241
+
242
+ ## Compatibility
243
+
244
+ ### DeepCache Compatible With
245
+
246
+ - ✅ Stable Diffusion 1.5
247
+ - ✅ Stable Diffusion 2.1
248
+ - ✅ SDXL
249
+ - ✅ All samplers (Euler, DPM++, etc.)
250
+ - ✅ LoRA adapters
251
+ - ✅ Textual inversion embeddings
252
+ - ✅ HiresFix
253
+ - ✅ ADetailer
254
+ - ✅ Multi-scale diffusion
255
+ - ✅ SageAttention/SpargeAttn
256
+ - ✅ Stable-Fast
257
+
258
+ ### DeepCache NOT Compatible With
259
+
260
+ - ❌ Flux models (use FBCache instead)
261
+ - ❌ Img2Img mode (can cause artifacts)
262
+
263
+ ### FBCache Compatible With
264
+
265
+ - ✅ Flux models
266
+ - ✅ SageAttention/SpargeAttn
267
+ - ✅ All Flux-compatible features
268
+
269
+ ### FBCache NOT Compatible With
270
+
271
+ - ❌ SD1.5/SDXL (use DeepCache instead)
272
+ - ❌ Stable-Fast (Flux not supported by Stable-Fast)
273
+
274
+ ## Troubleshooting
275
+
276
+ ### No Speedup Observed
277
+
278
+ **Causes:**
279
+ 1. DeepCache disabled or not applied to correct model type
280
+ 2. Cache interval too low (interval=1 provides no caching)
281
+ 3. Model loaded incorrectly
282
+
283
+ **Fixes:**
284
+ ```bash
285
+ # Check logs for DeepCache activation
286
+ cat logs/server.log | grep -i "deepcache\|cache"
287
+
288
+ # Verify UI toggle is enabled
289
+ # Streamlit: Check "Enable DeepCache" checkbox
290
+ # API: Ensure "deepcache_enabled": true in payload
291
+
292
+ # Try higher interval
293
+ deepcache_interval: 3 # Instead of 1 or 2
294
+ ```
295
+
296
+ ### Quality Degradation
297
+
298
+ **Symptoms:**
299
+ - Blurry details
300
+ - Smoothed textures
301
+ - Loss of fine patterns
302
+
303
+ **Causes:**
304
+ 1. Cache interval too high
305
+ 2. Cache depth too aggressive
306
+ 3. Wrong model type (Flux using DeepCache)
307
+
308
+ **Fixes:**
309
+ ```yaml
310
+ # Reduce cache interval
311
+ deepcache_interval: 2 # Down from 5
312
+
313
+ # Reduce cache depth
314
+ deepcache_depth: 1 # Down from 3
315
+
316
+ # Disable caching for critical phases
317
+ deepcache_start_step: 200 # Skip early structure formation
318
+ deepcache_end_step: 800 # Skip late detail refinement
319
+ ```
320
+
321
+ ### Artifacts in Img2Img
322
+
323
+ **Symptom:** Visible seams, inconsistent styles when using DeepCache with Img2Img.
324
+
325
+ **Cause:** Img2Img starts from a noisy input image, which violates DeepCache's assumptions about feature consistency.
326
+
327
+ **Fix:** Disable DeepCache for Img2Img:
328
+ ```yaml
329
+ deepcache_enabled: false # When img2img_enabled: true
330
+ ```
331
+
332
+ ### VRAM Increase
333
+
334
+ **Symptom:** OOM errors after enabling DeepCache.
335
+
336
+ **Cause:** Cached features consume additional VRAM.
337
+
338
+ **Fixes:**
339
+ 1. Reduce batch size
340
+ 2. Lower resolution
341
+ 3. Disable other VRAM-heavy features (Stable-Fast CUDA graphs)
342
+ 4. Use lower cache depth:
343
+ ```yaml
344
+ deepcache_depth: 1 # Minimal caching
345
+ ```
346
+
347
+ ### Flux FBCache Not Working
348
+
349
+ **Symptom:** No speedup with Flux generation.
350
+
351
+ **Cause:** FBCache only skips work when the first-block residual stays below the threshold, so cache hits may be rare; check the logs for the cache hit rate.
352
+
353
+ **Debugging:**
354
+ ```bash
355
+ # Enable debug logging
356
+ export LD_SERVER_LOGLEVEL=DEBUG
357
+
358
+ # Check cache statistics
359
+ cat logs/server.log | grep "cache"
360
+ ```
361
+
362
+ If no cache hits, try adjusting threshold:
363
+ ```python
364
+ # In pipeline.py
365
+ residual_diff_threshold=0.1 # Increase from 0.05 for more cache reuse
366
+ ```
367
+
368
+ ## Quality Comparison
369
+
370
+ Visual impact of different cache intervals:
371
+
372
+ | Interval | Speed | Visual Difference |
373
+ |----------|-------|-------------------|
374
+ | Disabled | Baseline | Baseline (100% quality) |
375
+ | 2 | Faster | Virtually identical |
376
+ | 3 | Much faster | Very subtle smoothing |
377
+ | 5 | Very fast | Noticeable detail loss |
378
+ | 7+ | Fastest | Obvious quality degradation |
379
+
380
+ **Recommendation:** Start with `interval=3` and adjust based on visual results.
381
+
382
+ ## Technical Details
383
+
384
+ ### DeepCache Implementation
385
+
386
+ Simplified pseudocode:
387
+
388
+ ```python
389
+ class DeepCacheWrapper:
390
+ def __init__(self, model, interval, depth):
391
+ self.model = model
392
+ self.interval = interval
393
+         self.depth = depth  # depth selects how much of the model is cached; kept for parity with the real implementation
+         self.cached_output = None
394
+ self.current_step = 0
395
+
396
+ def forward(self, x, timestep):
397
+ is_cache_step = (self.current_step % self.interval == 0)
398
+
399
+ if is_cache_step:
400
+ # Run full model, cache output
401
+ output = self.model(x, timestep)
402
+ self.cached_output = output.clone()
403
+ else:
404
+ # Reuse cached output (skip expensive computation)
405
+ output = self.cached_output
406
+
407
+ self.current_step += 1
408
+ return output
409
+ ```
410
+
411
+ Actual implementation in `src/WaveSpeed/deepcache_nodes.py` includes:
412
+ - Proper timestep tracking
413
+ - Cache invalidation on batch changes
414
+ - Error handling and fallback to full forward
415
+
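+ A short usage sketch of the wrapper above (the dummy model stands in for the real denoiser; names follow the pseudocode, not the actual module):
+
+ ```python
+ import torch
+
+ def dummy_model(x, timestep):
+     # Stand-in for the expensive denoiser forward pass
+     return x * 0.9
+
+ wrapper = DeepCacheWrapper(dummy_model, interval=3, depth=2)
+ x = torch.randn(1, 4, 64, 64)
+ for t in range(6):
+     x = wrapper.forward(x, t)  # steps 0 and 3 recompute; 1-2 and 4-5 reuse the cache
+ ```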
416
+ ### FBCache Residual Comparison
417
+
418
+ ```python
419
+ # Illustrative pseudocode: first_transformer_block, apply_cached_residual,
+ # run_remaining_blocks, and cache_residual stand in for the real helpers.
+ # Compute this step's first-block output
420
+ first_output = first_transformer_block(hidden_states)
421
+
422
+ # Compare to previous step
423
+ residual = first_output - previous_first_output
424
+ residual_norm = residual.abs().mean() / first_output.abs().mean()
425
+
426
+ if residual_norm < threshold:
427
+ # Feature change is small — reuse cached blocks
428
+ hidden_states = apply_cached_residual(first_output)
429
+ else:
430
+ # Feature change is large — recompute all blocks
431
+ hidden_states = run_remaining_blocks(first_output)
432
+ cache_residual(hidden_states)
433
+ ```
434
+
435
+ ## Best Practices
436
+
437
+ ### For Everyday Use
438
+
439
+ 1. **Enable DeepCache** with default settings (`interval=3`, `depth=2`)
440
+ 2. **Stack with SageAttention** for 2.5x+ total speedup
441
+ 3. **Disable for final client renders** if absolute quality is critical
442
+
443
+ ### For Batch Processing
444
+
445
+ 1. **Use aggressive caching** (`interval=5`, `depth=3`)
446
+ 2. **Pre-generate previews** at high speed, re-render winners at full quality
447
+ 3. **Disable TAESD previews** to avoid overhead (set `enable_preview=false`)
448
+
449
+ ### For Low VRAM
450
+
451
+ 1. **Use conservative caching** (`interval=2`, `depth=1`)
452
+ 2. **Avoid stacking** with Stable-Fast CUDA graphs
453
+ 3. **Monitor VRAM** via `/api/telemetry` endpoint
454
+
455
+ ## Citation
456
+
457
+ If you use WaveSpeed/DeepCache in your work:
458
+
459
+ ```bibtex
460
+ @inproceedings{ma2023deepcache,
461
+ title={DeepCache: Accelerating Diffusion Models for Free},
462
+ author={Ma, Xinyin and Fang, Gongfan and Wang, Xinchao},
463
+ booktitle={CVPR},
464
+ year={2024}
465
+ }
466
+ ```
467
+
468
+ ## Resources
469
+
470
+ - [DeepCache Paper](https://arxiv.org/abs/2312.00858)
471
+ - [DeepCache Repository](https://github.com/horseee/DeepCache)
472
+ - [ComfyUI DeepCache Implementation](https://gist.github.com/laksjdjf/435c512bc19636e9c9af4ee7bea9eb86) (reference for LightDiffusion-Next)
473
+ - [First Block Cache Discussion](https://github.com/comfyanonymous/ComfyUI/discussions/3491)
download_flux.py ADDED
@@ -0,0 +1,21 @@
1
+ import os
2
+ import sys
3
+ from pathlib import Path
4
+
5
+ # Add project root to path
6
+ project_root = Path(__file__).resolve().parent
7
+ sys.path.insert(0, str(project_root))
8
+
9
+ try:
10
+ from src.FileManaging import Downloader
11
+ print("Initializing Flux2 Klein download...")
12
+ Downloader.CheckAndDownloadFlux2()
13
+ print("\nDownload process finished.")
14
+ print("Models should be located in:")
15
+ print(" - ./include/diffusion_model/ (Diffusion Model)")
16
+ print(" - ./include/text_encoder/ (Text Encoder)")
17
+ print(" - ./include/vae/ (VAE)")
18
+ except ImportError as e:
19
+ print(f"Error: Could not import Downloader. Make sure you are running this from the project root. {e}")
20
+ except Exception as e:
21
+ print(f"An unexpected error occurred: {e}")
frontend/README.md ADDED
@@ -0,0 +1,73 @@
1
+ # React + TypeScript + Vite
2
+
3
+ This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.
4
+
5
+ Currently, two official plugins are available:
6
+
7
+ - [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react) uses [Babel](https://babeljs.io/) (or [oxc](https://oxc.rs) when used in [rolldown-vite](https://vite.dev/guide/rolldown)) for Fast Refresh
8
+ - [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react-swc) uses [SWC](https://swc.rs/) for Fast Refresh
9
+
10
+ ## React Compiler
11
+
12
+ The React Compiler is not enabled on this template because of its impact on dev & build performance. To add it, see [this documentation](https://react.dev/learn/react-compiler/installation).
13
+
14
+ ## Expanding the ESLint configuration
15
+
16
+ If you are developing a production application, we recommend updating the configuration to enable type-aware lint rules:
17
+
18
+ ```js
19
+ export default defineConfig([
20
+ globalIgnores(['dist']),
21
+ {
22
+ files: ['**/*.{ts,tsx}'],
23
+ extends: [
24
+ // Other configs...
25
+
26
+ // Remove tseslint.configs.recommended and replace with this
27
+ tseslint.configs.recommendedTypeChecked,
28
+ // Alternatively, use this for stricter rules
29
+ tseslint.configs.strictTypeChecked,
30
+ // Optionally, add this for stylistic rules
31
+ tseslint.configs.stylisticTypeChecked,
32
+
33
+ // Other configs...
34
+ ],
35
+ languageOptions: {
36
+ parserOptions: {
37
+ project: ['./tsconfig.node.json', './tsconfig.app.json'],
38
+ tsconfigRootDir: import.meta.dirname,
39
+ },
40
+ // other options...
41
+ },
42
+ },
43
+ ])
44
+ ```
45
+
46
+ You can also install [eslint-plugin-react-x](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-x) and [eslint-plugin-react-dom](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-dom) for React-specific lint rules:
47
+
48
+ ```js
49
+ // eslint.config.js
50
+ import reactX from 'eslint-plugin-react-x'
51
+ import reactDom from 'eslint-plugin-react-dom'
52
+
53
+ export default defineConfig([
54
+ globalIgnores(['dist']),
55
+ {
56
+ files: ['**/*.{ts,tsx}'],
57
+ extends: [
58
+ // Other configs...
59
+ // Enable lint rules for React
60
+ reactX.configs['recommended-typescript'],
61
+ // Enable lint rules for React DOM
62
+ reactDom.configs.recommended,
63
+ ],
64
+ languageOptions: {
65
+ parserOptions: {
66
+ project: ['./tsconfig.node.json', './tsconfig.app.json'],
67
+ tsconfigRootDir: import.meta.dirname,
68
+ },
69
+ // other options...
70
+ },
71
+ },
72
+ ])
73
+ ```
frontend/dist/assets/index-7kNA4Hm-.js ADDED
The diff for this file is too large to render. See raw diff
 
frontend/dist/assets/index-CAwyaxYh.css ADDED
@@ -0,0 +1 @@
1
+ @import"https://fonts.googleapis.com/css2?family=Fraunces:opsz,wght@9..144,500;9..144,600;9..144,700&family=Instrument+Sans:wght@400;500;600;700&display=swap";@layer components;@layer properties{@supports (((-webkit-hyphens:none)) and (not (margin-trim:inline))) or ((-moz-orient:inline) and (not (color:rgb(from red r g b)))){*,:before,:after,::backdrop{--tw-scale-x:1;--tw-scale-y:1;--tw-scale-z:1;--tw-rotate-x:initial;--tw-rotate-y:initial;--tw-rotate-z:initial;--tw-skew-x:initial;--tw-skew-y:initial;--tw-pan-x:initial;--tw-pan-y:initial;--tw-pinch-zoom:initial;--tw-space-y-reverse:0;--tw-space-x-reverse:0;--tw-divide-x-reverse:0;--tw-border-style:solid;--tw-divide-y-reverse:0;--tw-leading:initial;--tw-font-weight:initial;--tw-tracking:initial;--tw-ordinal:initial;--tw-slashed-zero:initial;--tw-numeric-figure:initial;--tw-numeric-spacing:initial;--tw-numeric-fraction:initial;--tw-shadow:0 0 #0000;--tw-shadow-color:initial;--tw-shadow-alpha:100%;--tw-inset-shadow:0 0 #0000;--tw-inset-shadow-color:initial;--tw-inset-shadow-alpha:100%;--tw-ring-color:initial;--tw-ring-shadow:0 0 #0000;--tw-inset-ring-color:initial;--tw-inset-ring-shadow:0 0 #0000;--tw-ring-inset:initial;--tw-ring-offset-width:0px;--tw-ring-offset-color:#fff;--tw-ring-offset-shadow:0 0 #0000;--tw-outline-style:solid;--tw-blur:initial;--tw-brightness:initial;--tw-contrast:initial;--tw-grayscale:initial;--tw-hue-rotate:initial;--tw-invert:initial;--tw-opacity:initial;--tw-saturate:initial;--tw-sepia:initial;--tw-drop-shadow:initial;--tw-drop-shadow-color:initial;--tw-drop-shadow-alpha:100%;--tw-drop-shadow-size:initial;--tw-backdrop-blur:initial;--tw-backdrop-brightness:initial;--tw-backdrop-contrast:initial;--tw-backdrop-grayscale:initial;--tw-backdrop-hue-rotate:initial;--tw-backdrop-invert:initial;--tw-backdrop-opacity:initial;--tw-backdrop-saturate:initial;--tw-backdrop-sepia:initial;--tw-duration:initial;--tw-ease:initial;--tw-translate-x:0;--tw-translate-y:0;--tw-translate-z:0}}}@layer theme{:root,:host{--font-sans:"Instrument Sans", ui-sans-serif, sans-serif;--font-serif:"Fraunces", ui-serif, serif;--font-mono:ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace;--spacing:.25rem;--container-sm:24rem;--container-lg:32rem;--container-3xl:48rem;--container-4xl:56rem;--text-xs:.75rem;--text-xs--line-height:calc(1 / .75);--text-sm:.875rem;--text-sm--line-height:calc(1.25 / .875);--text-lg:1.125rem;--text-lg--line-height:calc(1.75 / 1.125);--font-weight-medium:500;--font-weight-semibold:600;--leading-tight:1.25;--radius-2xl:1rem;--radius-3xl:1.5rem;--ease-out:cubic-bezier(0, 0, .2, 1);--animate-spin:spin 1s linear infinite;--default-transition-duration:.15s;--default-transition-timing-function:cubic-bezier(.4, 0, .2, 1);--default-font-family:var(--font-sans);--default-mono-font-family:var(--font-mono);--color-canvas:oklch(97.8% .012 78);--color-paper:oklch(99.2% .008 82);--color-oat:oklch(96.7% .018 79);--color-sand:oklch(94% .018 76);--color-stone:oklch(83% .016 73);--color-line:oklch(88% .012 76);--color-ink:oklch(25.5% .02 55);--color-muted:oklch(54% .015 67);--color-clay:oklch(64% .15 41);--color-clay-strong:oklch(56% .16 39);--animate-accordion-down:accordion-down .22s cubic-bezier(.16, 1, .3, 1);--animate-accordion-up:accordion-up .18s cubic-bezier(.16, 1, .3, 1)}}@layer base{*,:after,:before,::backdrop{box-sizing:border-box;border:0 solid;margin:0;padding:0}::file-selector-button{box-sizing:border-box;border:0 
solid;margin:0;padding:0}html,:host{-webkit-text-size-adjust:100%;tab-size:4;line-height:1.5;font-family:var(--default-font-family,ui-sans-serif, system-ui, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji");font-feature-settings:var(--default-font-feature-settings,normal);font-variation-settings:var(--default-font-variation-settings,normal);-webkit-tap-highlight-color:transparent}hr{height:0;color:inherit;border-top-width:1px}abbr:where([title]){-webkit-text-decoration:underline dotted;text-decoration:underline dotted}h1,h2,h3,h4,h5,h6{font-size:inherit;font-weight:inherit}a{color:inherit;-webkit-text-decoration:inherit;text-decoration:inherit}b,strong{font-weight:bolder}code,kbd,samp,pre{font-family:var(--default-mono-font-family,ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace);font-feature-settings:var(--default-mono-font-feature-settings,normal);font-variation-settings:var(--default-mono-font-variation-settings,normal);font-size:1em}small{font-size:80%}sub,sup{vertical-align:baseline;font-size:75%;line-height:0;position:relative}sub{bottom:-.25em}sup{top:-.5em}table{text-indent:0;border-color:inherit;border-collapse:collapse}:-moz-focusring{outline:auto}progress{vertical-align:baseline}summary{display:list-item}ol,ul,menu{list-style:none}img,svg,video,canvas,audio,iframe,embed,object{vertical-align:middle;display:block}img,video{max-width:100%;height:auto}button,input,select,optgroup,textarea{font:inherit;font-feature-settings:inherit;font-variation-settings:inherit;letter-spacing:inherit;color:inherit;opacity:1;background-color:#0000;border-radius:0}::file-selector-button{font:inherit;font-feature-settings:inherit;font-variation-settings:inherit;letter-spacing:inherit;color:inherit;opacity:1;background-color:#0000;border-radius:0}:where(select:is([multiple],[size])) optgroup{font-weight:bolder}:where(select:is([multiple],[size])) optgroup option{padding-inline-start:20px}::file-selector-button{margin-inline-end:4px}::placeholder{opacity:1}@supports (not ((-webkit-appearance:-apple-pay-button))) or (contain-intrinsic-size:1px){::placeholder{color:currentColor}@supports (color:color-mix(in lab,red,red)){::placeholder{color:color-mix(in oklab,currentcolor 
50%,transparent)}}}textarea{resize:vertical}::-webkit-search-decoration{-webkit-appearance:none}::-webkit-date-and-time-value{min-height:1lh;text-align:inherit}::-webkit-datetime-edit{display:inline-flex}::-webkit-datetime-edit-fields-wrapper{padding:0}::-webkit-datetime-edit{padding-block:0}::-webkit-datetime-edit-year-field{padding-block:0}::-webkit-datetime-edit-month-field{padding-block:0}::-webkit-datetime-edit-day-field{padding-block:0}::-webkit-datetime-edit-hour-field{padding-block:0}::-webkit-datetime-edit-minute-field{padding-block:0}::-webkit-datetime-edit-second-field{padding-block:0}::-webkit-datetime-edit-millisecond-field{padding-block:0}::-webkit-datetime-edit-meridiem-field{padding-block:0}::-webkit-calendar-picker-indicator{line-height:1}:-moz-ui-invalid{box-shadow:none}button,input:where([type=button],[type=reset],[type=submit]){appearance:button}::file-selector-button{appearance:button}::-webkit-inner-spin-button{height:auto}::-webkit-outer-spin-button{height:auto}[hidden]:where(:not([hidden=until-found])){display:none!important}:root{color:var(--color-ink);background:var(--color-canvas);font-synthesis:none;text-rendering:optimizelegibility;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}*{border-color:#dfdad3}@supports (color:color-mix(in lab,red,red)){*{border-color:color-mix(in oklab,var(--color-line) 92%,white)}}html,body,#root{min-height:100%}body{font-family:var(--font-sans);color:var(--color-ink);background:radial-gradient(circle at top,#fbf3e79e,#0000 31rem),linear-gradient(#fffcf7,#fdf8f0 23rem);margin:0}@supports (color:color-mix(in lab,red,red)){body{background:radial-gradient(circle at top,color-mix(in oklab,var(--color-oat) 62%,transparent),transparent 31rem),linear-gradient(180deg,color-mix(in oklab,var(--color-paper) 99%,white),color-mix(in oklab,var(--color-canvas) 92%,white) 23rem)}}button,input,textarea{font:inherit}img{max-width:100%;display:block}::selection{background:#f6ded4}@supports (color:color-mix(in lab,red,red)){::selection{background:color-mix(in oklab,var(--color-clay) 22%,white)}}}@layer utilities{.pointer-events-none{pointer-events:none}.collapse{visibility:collapse}.invisible{visibility:hidden}.visible{visibility:visible}.sr-only{clip-path:inset(50%);white-space:nowrap;border-width:0;width:1px;height:1px;margin:-1px;padding:0;position:absolute;overflow:hidden}.not-sr-only{clip-path:none;white-space:normal;width:auto;height:auto;margin:0;padding:0;position:static;overflow:visible}.absolute{position:absolute}.fixed{position:fixed}.relative{position:relative}.static{position:static}.sticky{position:sticky}.inset-0{inset:calc(var(--spacing) * 0)}.inset-x-0{inset-inline:calc(var(--spacing) * 0)}.inset-x-3{inset-inline:calc(var(--spacing) * 3)}.inset-x-4{inset-inline:calc(var(--spacing) * 4)}.inset-y-3{inset-block:calc(var(--spacing) * 3)}.start{inset-inline-start:var(--spacing)}.end{inset-inline-end:var(--spacing)}.top-0{top:calc(var(--spacing) * 0)}.top-3{top:calc(var(--spacing) * 3)}.top-4{top:calc(var(--spacing) * 4)}.top-5{top:calc(var(--spacing) * 5)}.right-3{right:calc(var(--spacing) * 3)}.right-5{right:calc(var(--spacing) * 5)}.bottom-3{bottom:calc(var(--spacing) * 3)}.bottom-4{bottom:calc(var(--spacing) * 4)}.left-3{left:calc(var(--spacing) * 
3)}.isolate{isolation:isolate}.isolation-auto{isolation:auto}.z-10{z-index:10}.z-50{z-index:50}.container{width:100%}@media(min-width:40rem){.container{max-width:40rem}}@media(min-width:48rem){.container{max-width:48rem}}@media(min-width:64rem){.container{max-width:64rem}}@media(min-width:80rem){.container{max-width:80rem}}@media(min-width:96rem){.container{max-width:96rem}}.-mx-1{margin-inline:calc(var(--spacing) * -1)}.mx-auto{margin-inline:auto}.my-1{margin-block:calc(var(--spacing) * 1)}.-mt-2{margin-top:calc(var(--spacing) * -2)}.mt-1{margin-top:calc(var(--spacing) * 1)}.mt-2\.5{margin-top:calc(var(--spacing) * 2.5)}.mt-3{margin-top:calc(var(--spacing) * 3)}.mt-4{margin-top:calc(var(--spacing) * 4)}.mb-2{margin-bottom:calc(var(--spacing) * 2)}.block{display:block}.contents{display:contents}.flex{display:flex}.flow-root{display:flow-root}.grid{display:grid}.hidden{display:none}.inline{display:inline}.inline-block{display:inline-block}.inline-flex{display:inline-flex}.inline-grid{display:inline-grid}.inline-table{display:inline-table}.list-item{display:list-item}.table{display:table}.table-caption{display:table-caption}.table-cell{display:table-cell}.table-column{display:table-column}.table-column-group{display:table-column-group}.table-footer-group{display:table-footer-group}.table-header-group{display:table-header-group}.table-row{display:table-row}.table-row-group{display:table-row-group}.h-1\.5{height:calc(var(--spacing) * 1.5)}.h-2\.5{height:calc(var(--spacing) * 2.5)}.h-3\.5{height:calc(var(--spacing) * 3.5)}.h-4{height:calc(var(--spacing) * 4)}.h-5{height:calc(var(--spacing) * 5)}.h-6{height:calc(var(--spacing) * 6)}.h-9{height:calc(var(--spacing) * 9)}.h-10{height:calc(var(--spacing) * 10)}.h-11{height:calc(var(--spacing) * 11)}.h-12{height:calc(var(--spacing) * 12)}.h-14{height:calc(var(--spacing) * 14)}.h-16{height:calc(var(--spacing) * 16)}.h-28{height:calc(var(--spacing) * 28)}.h-40{height:calc(var(--spacing) * 40)}.h-52{height:calc(var(--spacing) * 52)}.h-96{height:calc(var(--spacing) * 96)}.h-\[4\.25rem\]{height:4.25rem}.h-\[calc\(100\%-4rem\)\]{height:calc(100% - 4rem)}.h-\[calc\(100vh-2rem\)\]{height:calc(100vh - 2rem)}.h-\[min\(88vh\,860px\)\]{height:min(88vh,860px)}.h-\[var\(--radix-select-trigger-height\)\]{height:var(--radix-select-trigger-height)}.h-auto{height:auto}.h-full{height:100%}.h-px{height:1px}.max-h-80{max-height:calc(var(--spacing) * 80)}.max-h-\[calc\(100vh-10rem\)\]{max-height:calc(100vh - 10rem)}.min-h-0{min-height:calc(var(--spacing) * 0)}.min-h-36{min-height:calc(var(--spacing) * 36)}.min-h-40{min-height:calc(var(--spacing) * 40)}.min-h-\[108px\]{min-height:108px}.min-h-\[124px\]{min-height:124px}.min-h-\[172px\]{min-height:172px}.min-h-\[460px\]{min-height:460px}.min-h-screen{min-height:100vh}.w-2\.5{width:calc(var(--spacing) * 2.5)}.w-3\.5{width:calc(var(--spacing) * 3.5)}.w-3\/4{width:75%}.w-4{width:calc(var(--spacing) * 4)}.w-5{width:calc(var(--spacing) * 5)}.w-6{width:calc(var(--spacing) * 6)}.w-9{width:calc(var(--spacing) * 9)}.w-10{width:calc(var(--spacing) * 10)}.w-11{width:calc(var(--spacing) * 11)}.w-12{width:calc(var(--spacing) * 12)}.w-16{width:calc(var(--spacing) * 
16)}.w-\[4\.25rem\]{width:4.25rem}.w-\[26rem\]{width:26rem}.w-auto{width:auto}.w-full{width:100%}.w-px{width:1px}.max-w-3xl{max-width:var(--container-3xl)}.max-w-4xl{max-width:var(--container-4xl)}.max-w-\[1200px\]{max-width:1200px}.max-w-\[1320px\]{max-width:1320px}.max-w-full{max-width:100%}.max-w-lg{max-width:var(--container-lg)}.min-w-\[8rem\]{min-width:8rem}.min-w-\[var\(--radix-select-trigger-width\)\]{min-width:var(--radix-select-trigger-width)}.flex-1{flex:1}.shrink{flex-shrink:1}.shrink-0{flex-shrink:0}.grow{flex-grow:1}.border-collapse{border-collapse:collapse}.translate-none{translate:none}.scale-3d{scale:var(--tw-scale-x) var(--tw-scale-y) var(--tw-scale-z)}.transform{transform:var(--tw-rotate-x,) var(--tw-rotate-y,) var(--tw-rotate-z,) var(--tw-skew-x,) var(--tw-skew-y,)}.animate-spin{animation:var(--animate-spin)}.cursor-default{cursor:default}.cursor-pointer{cursor:pointer}.touch-pinch-zoom{--tw-pinch-zoom:pinch-zoom;touch-action:var(--tw-pan-x,) var(--tw-pan-y,) var(--tw-pinch-zoom,)}.touch-none{touch-action:none}.resize{resize:both}.flex-col{flex-direction:column}.flex-col-reverse{flex-direction:column-reverse}.flex-wrap{flex-wrap:wrap}.items-center{align-items:center}.justify-between{justify-content:space-between}.justify-center{justify-content:center}.justify-end{justify-content:flex-end}.gap-1\.5{gap:calc(var(--spacing) * 1.5)}.gap-2{gap:calc(var(--spacing) * 2)}.gap-3{gap:calc(var(--spacing) * 3)}.gap-4{gap:calc(var(--spacing) * 4)}.gap-5{gap:calc(var(--spacing) * 5)}.gap-6{gap:calc(var(--spacing) * 6)}:where(.space-y-1>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 1) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 1) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-1\.5>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 1.5) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 1.5) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-2>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 2) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 2) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-2\.5>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 2.5) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 2.5) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-3>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 3) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 3) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-4>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 4) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 4) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-5>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 5) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 5) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-reverse>:not(:last-child)){--tw-space-y-reverse:1}:where(.space-x-reverse>:not(:last-child)){--tw-space-x-reverse:1}:where(.divide-x>:not(:last-child)){--tw-divide-x-reverse:0;border-inline-style:var(--tw-border-style);border-inline-start-width:calc(1px * var(--tw-divide-x-reverse));border-inline-end-width:calc(1px * calc(1 - 
var(--tw-divide-x-reverse)))}:where(.divide-y>:not(:last-child)){--tw-divide-y-reverse:0;border-bottom-style:var(--tw-border-style);border-top-style:var(--tw-border-style);border-top-width:calc(1px * var(--tw-divide-y-reverse));border-bottom-width:calc(1px * calc(1 - var(--tw-divide-y-reverse)))}:where(.divide-y-reverse>:not(:last-child)){--tw-divide-y-reverse:1}.self-end{align-self:flex-end}.truncate{text-overflow:ellipsis;white-space:nowrap;overflow:hidden}.overflow-hidden{overflow:hidden}.rounded-2xl{border-radius:var(--radius-2xl)}.rounded-3xl{border-radius:var(--radius-3xl)}.rounded-\[1\.2rem\]{border-radius:1.2rem}.rounded-\[1\.4rem\]{border-radius:1.4rem}.rounded-\[1\.5rem\]{border-radius:1.5rem}.rounded-\[1\.7rem\]{border-radius:1.7rem}.rounded-\[1\.9rem\]{border-radius:1.9rem}.rounded-\[1\.15rem\]{border-radius:1.15rem}.rounded-\[1\.35rem\]{border-radius:1.35rem}.rounded-\[1\.75rem\]{border-radius:1.75rem}.rounded-\[1rem\]{border-radius:1rem}.rounded-\[2\.1rem\]{border-radius:2.1rem}.rounded-\[2rem\]{border-radius:2rem}.rounded-\[inherit\]{border-radius:inherit}.rounded-full{border-radius:3.40282e38px}.rounded-s{border-start-start-radius:.25rem;border-end-start-radius:.25rem}.rounded-ss{border-start-start-radius:.25rem}.rounded-e{border-start-end-radius:.25rem;border-end-end-radius:.25rem}.rounded-se{border-start-end-radius:.25rem}.rounded-ee{border-end-end-radius:.25rem}.rounded-es{border-end-start-radius:.25rem}.rounded-t{border-top-left-radius:.25rem;border-top-right-radius:.25rem}.rounded-t-\[2\.25rem\]{border-top-left-radius:2.25rem;border-top-right-radius:2.25rem}.rounded-l{border-top-left-radius:.25rem;border-bottom-left-radius:.25rem}.rounded-tl{border-top-left-radius:.25rem}.rounded-r{border-top-right-radius:.25rem;border-bottom-right-radius:.25rem}.rounded-tr{border-top-right-radius:.25rem}.rounded-b{border-bottom-right-radius:.25rem;border-bottom-left-radius:.25rem}.rounded-b-\[2rem\]{border-bottom-right-radius:2rem;border-bottom-left-radius:2rem}.rounded-br{border-bottom-right-radius:.25rem}.rounded-bl{border-bottom-left-radius:.25rem}.border{border-style:var(--tw-border-style);border-width:1px}.border-x{border-inline-style:var(--tw-border-style);border-inline-width:1px}.border-y{border-block-style:var(--tw-border-style);border-block-width:1px}.border-s{border-inline-start-style:var(--tw-border-style);border-inline-start-width:1px}.border-e{border-inline-end-style:var(--tw-border-style);border-inline-end-width:1px}.border-bs{border-block-start-style:var(--tw-border-style);border-block-start-width:1px}.border-be{border-block-end-style:var(--tw-border-style);border-block-end-width:1px}.border-t{border-top-style:var(--tw-border-style);border-top-width:1px}.border-t-0{border-top-style:var(--tw-border-style);border-top-width:0}.border-r{border-right-style:var(--tw-border-style);border-right-width:1px}.border-b{border-bottom-style:var(--tw-border-style);border-bottom-width:1px}.border-b-0{border-bottom-style:var(--tw-border-style);border-bottom-width:0}.border-l{border-left-style:var(--tw-border-style);border-left-width:1px}.border-dashed{--tw-border-style:dashed;border-style:dashed}.border-clay{border-color:var(--color-clay)}.border-clay-strong{border-color:var(--color-clay-strong)}.border-line{border-color:var(--color-line)}.border-line\/65{border-color:#dcd7cfa6}@supports (color:color-mix(in lab,red,red)){.border-line\/65{border-color:color-mix(in oklab,var(--color-line) 65%,transparent)}}.border-line\/70{border-color:#dcd7cfb3}@supports (color:color-mix(in 
lab,red,red)){.border-line\/70{border-color:color-mix(in oklab,var(--color-line) 70%,transparent)}}.border-line\/75{border-color:#dcd7cfbf}@supports (color:color-mix(in lab,red,red)){.border-line\/75{border-color:color-mix(in oklab,var(--color-line) 75%,transparent)}}.border-line\/80{border-color:#dcd7cfcc}@supports (color:color-mix(in lab,red,red)){.border-line\/80{border-color:color-mix(in oklab,var(--color-line) 80%,transparent)}}.border-transparent{border-color:#0000}.border-t-transparent{border-top-color:#0000}.border-l-transparent{border-left-color:#0000}.bg-canvas{background-color:var(--color-canvas)}.bg-canvas\/34{background-color:#fcf7ef57}@supports (color:color-mix(in lab,red,red)){.bg-canvas\/34{background-color:color-mix(in oklab,var(--color-canvas) 34%,transparent)}}.bg-canvas\/48{background-color:#fcf7ef7a}@supports (color:color-mix(in lab,red,red)){.bg-canvas\/48{background-color:color-mix(in oklab,var(--color-canvas) 48%,transparent)}}.bg-clay{background-color:var(--color-clay)}.bg-clay\/8{background-color:#d6683d14}@supports (color:color-mix(in lab,red,red)){.bg-clay\/8{background-color:color-mix(in oklab,var(--color-clay) 8%,transparent)}}.bg-clay\/10{background-color:#d6683d1a}@supports (color:color-mix(in lab,red,red)){.bg-clay\/10{background-color:color-mix(in oklab,var(--color-clay) 10%,transparent)}}.bg-ink{background-color:var(--color-ink)}.bg-ink\/14{background-color:#2b201a24}@supports (color:color-mix(in lab,red,red)){.bg-ink\/14{background-color:color-mix(in oklab,var(--color-ink) 14%,transparent)}}.bg-ink\/\[0\.04\]{background-color:#2b201a0a}@supports (color:color-mix(in lab,red,red)){.bg-ink\/\[0\.04\]{background-color:color-mix(in oklab,var(--color-ink) 4%,transparent)}}.bg-line{background-color:var(--color-line)}.bg-oat\/32{background-color:#fbf3e752}@supports (color:color-mix(in lab,red,red)){.bg-oat\/32{background-color:color-mix(in oklab,var(--color-oat) 32%,transparent)}}.bg-oat\/42{background-color:#fbf3e76b}@supports (color:color-mix(in lab,red,red)){.bg-oat\/42{background-color:color-mix(in oklab,var(--color-oat) 42%,transparent)}}.bg-oat\/45{background-color:#fbf3e773}@supports (color:color-mix(in lab,red,red)){.bg-oat\/45{background-color:color-mix(in oklab,var(--color-oat) 45%,transparent)}}.bg-oat\/55{background-color:#fbf3e78c}@supports (color:color-mix(in lab,red,red)){.bg-oat\/55{background-color:color-mix(in oklab,var(--color-oat) 55%,transparent)}}.bg-oat\/60{background-color:#fbf3e799}@supports (color:color-mix(in lab,red,red)){.bg-oat\/60{background-color:color-mix(in oklab,var(--color-oat) 60%,transparent)}}.bg-paper{background-color:var(--color-paper)}.bg-paper\/62{background-color:#fffcf79e}@supports (color:color-mix(in lab,red,red)){.bg-paper\/62{background-color:color-mix(in oklab,var(--color-paper) 62%,transparent)}}.bg-paper\/76{background-color:#fffcf7c2}@supports (color:color-mix(in lab,red,red)){.bg-paper\/76{background-color:color-mix(in oklab,var(--color-paper) 76%,transparent)}}.bg-paper\/90{background-color:#fffcf7e6}@supports (color:color-mix(in lab,red,red)){.bg-paper\/90{background-color:color-mix(in oklab,var(--color-paper) 90%,transparent)}}.bg-paper\/92{background-color:#fffcf7eb}@supports (color:color-mix(in lab,red,red)){.bg-paper\/92{background-color:color-mix(in oklab,var(--color-paper) 92%,transparent)}}.bg-paper\/94{background-color:#fffcf7f0}@supports (color:color-mix(in lab,red,red)){.bg-paper\/94{background-color:color-mix(in oklab,var(--color-paper) 
94%,transparent)}}.bg-paper\/97{background-color:#fffcf7f7}@supports (color:color-mix(in lab,red,red)){.bg-paper\/97{background-color:color-mix(in oklab,var(--color-paper) 97%,transparent)}}.bg-paper\/98{background-color:#fffcf7fa}@supports (color:color-mix(in lab,red,red)){.bg-paper\/98{background-color:color-mix(in oklab,var(--color-paper) 98%,transparent)}}.bg-sand{background-color:var(--color-sand)}.bg-sand\/45{background-color:#f2eade73}@supports (color:color-mix(in lab,red,red)){.bg-sand\/45{background-color:color-mix(in oklab,var(--color-sand) 45%,transparent)}}.bg-stone\/60{background-color:#cec6bc99}@supports (color:color-mix(in lab,red,red)){.bg-stone\/60{background-color:color-mix(in oklab,var(--color-stone) 60%,transparent)}}.bg-\[radial-gradient\(circle_at_top_left\,color-mix\(in_oklab\,var\(--color-oat\)_86\%\,transparent\)\,transparent_66\%\)\]{background-image:radial-gradient(circle at 0 0,#fbf3e7db,#0000 66%)}@supports (color:color-mix(in lab,red,red)){.bg-\[radial-gradient\(circle_at_top_left\,color-mix\(in_oklab\,var\(--color-oat\)_86\%\,transparent\)\,transparent_66\%\)\]{background-image:radial-gradient(circle at top left,color-mix(in oklab,var(--color-oat) 86%,transparent),transparent 66%)}}.bg-repeat{background-repeat:repeat}.mask-no-clip{-webkit-mask-clip:no-clip;mask-clip:no-clip}.mask-repeat{-webkit-mask-repeat:repeat;mask-repeat:repeat}.object-contain{object-fit:contain}.object-cover{object-fit:cover}.p-1\.5{padding:calc(var(--spacing) * 1.5)}.p-2{padding:calc(var(--spacing) * 2)}.p-3{padding:calc(var(--spacing) * 3)}.p-4{padding:calc(var(--spacing) * 4)}.p-5{padding:calc(var(--spacing) * 5)}.p-6{padding:calc(var(--spacing) * 6)}.p-\[1px\]{padding:1px}.px-1{padding-inline:calc(var(--spacing) * 1)}.px-3{padding-inline:calc(var(--spacing) * 3)}.px-3\.5{padding-inline:calc(var(--spacing) * 3.5)}.px-4{padding-inline:calc(var(--spacing) * 4)}.px-5{padding-inline:calc(var(--spacing) * 5)}.px-6{padding-inline:calc(var(--spacing) * 6)}.px-8{padding-inline:calc(var(--spacing) * 8)}.py-1{padding-block:calc(var(--spacing) * 1)}.py-1\.5{padding-block:calc(var(--spacing) * 1.5)}.py-2{padding-block:calc(var(--spacing) * 2)}.py-2\.5{padding-block:calc(var(--spacing) * 2.5)}.py-3{padding-block:calc(var(--spacing) * 3)}.py-3\.5{padding-block:calc(var(--spacing) * 3.5)}.py-4{padding-block:calc(var(--spacing) * 4)}.py-5{padding-block:calc(var(--spacing) * 5)}.py-6{padding-block:calc(var(--spacing) * 6)}.pt-0\.5{padding-top:calc(var(--spacing) * .5)}.pt-2{padding-top:calc(var(--spacing) * 2)}.pt-3{padding-top:calc(var(--spacing) * 3)}.pt-4{padding-top:calc(var(--spacing) * 4)}.pr-1{padding-right:calc(var(--spacing) * 1)}.pr-3{padding-right:calc(var(--spacing) * 3)}.pr-10{padding-right:calc(var(--spacing) * 10)}.pb-2{padding-bottom:calc(var(--spacing) * 2)}.pb-3{padding-bottom:calc(var(--spacing) * 3)}.pb-4{padding-bottom:calc(var(--spacing) * 4)}.pb-7{padding-bottom:calc(var(--spacing) * 7)}.pb-10{padding-bottom:calc(var(--spacing) * 10)}.pl-8{padding-left:calc(var(--spacing) * 
8)}.text-center{text-align:center}.text-left{text-align:left}.font-serif{font-family:var(--font-serif)}.text-lg{font-size:var(--text-lg);line-height:var(--tw-leading,var(--text-lg--line-height))}.text-sm{font-size:var(--text-sm);line-height:var(--tw-leading,var(--text-sm--line-height))}.text-xs{font-size:var(--text-xs);line-height:var(--tw-leading,var(--text-xs--line-height))}.text-\[1\.05rem\]{font-size:1.05rem}.text-\[1\.35rem\]{font-size:1.35rem}.text-\[11px\]{font-size:11px}.text-\[15px\]{font-size:15px}.text-\[16px\]{font-size:16px}.text-\[clamp\(1\.8rem\,3vw\,2\.5rem\)\]{font-size:clamp(1.8rem,3vw,2.5rem)}.text-\[clamp\(2\.75rem\,5\.2vw\,5rem\)\]{font-size:clamp(2.75rem,5.2vw,5rem)}.leading-5{--tw-leading:calc(var(--spacing) * 5);line-height:calc(var(--spacing) * 5)}.leading-6{--tw-leading:calc(var(--spacing) * 6);line-height:calc(var(--spacing) * 6)}.leading-7{--tw-leading:calc(var(--spacing) * 7);line-height:calc(var(--spacing) * 7)}.leading-\[0\.92\]{--tw-leading:.92;line-height:.92}.leading-tight{--tw-leading:var(--leading-tight);line-height:var(--leading-tight)}.font-medium{--tw-font-weight:var(--font-weight-medium);font-weight:var(--font-weight-medium)}.font-semibold{--tw-font-weight:var(--font-weight-semibold);font-weight:var(--font-weight-semibold)}.tracking-\[-0\.03em\]{--tw-tracking:-.03em;letter-spacing:-.03em}.tracking-\[-0\.025em\]{--tw-tracking:-.025em;letter-spacing:-.025em}.tracking-\[-0\.035em\]{--tw-tracking:-.035em;letter-spacing:-.035em}.tracking-\[-0\.055em\]{--tw-tracking:-.055em;letter-spacing:-.055em}.tracking-\[0\.16em\]{--tw-tracking:.16em;letter-spacing:.16em}.text-wrap{text-wrap:wrap}.text-clip{text-overflow:clip}.text-ellipsis{text-overflow:ellipsis}.whitespace-nowrap{white-space:nowrap}.text-clay{color:var(--color-clay)}.text-clay-strong{color:var(--color-clay-strong)}.text-ink{color:var(--color-ink)}.text-muted{color:var(--color-muted)}.text-paper{color:var(--color-paper)}.capitalize{text-transform:capitalize}.lowercase{text-transform:lowercase}.normal-case{text-transform:none}.uppercase{text-transform:uppercase}.italic{font-style:italic}.not-italic{font-style:normal}.diagonal-fractions{--tw-numeric-fraction:diagonal-fractions;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.lining-nums{--tw-numeric-figure:lining-nums;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.oldstyle-nums{--tw-numeric-figure:oldstyle-nums;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.ordinal{--tw-ordinal:ordinal;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.proportional-nums{--tw-numeric-spacing:proportional-nums;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.slashed-zero{--tw-slashed-zero:slashed-zero;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.stacked-fractions{--tw-numeric-fraction:stacked-fractions;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) 
var(--tw-numeric-fraction,)}.tabular-nums{--tw-numeric-spacing:tabular-nums;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.normal-nums{font-variant-numeric:normal}.line-through{text-decoration-line:line-through}.no-underline{text-decoration-line:none}.overline{text-decoration-line:overline}.underline{text-decoration-line:underline}.antialiased{-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.subpixel-antialiased{-webkit-font-smoothing:auto;-moz-osx-font-smoothing:auto}.opacity-70{opacity:.7}.shadow{--tw-shadow:0 1px 3px 0 var(--tw-shadow-color,#0000001a), 0 1px 2px -1px var(--tw-shadow-color,#0000001a);box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_1px_2px_color-mix\(in_oklab\,var\(--color-ink\)_12\%\,transparent\)\]{--tw-shadow:0 1px 2px var(--tw-shadow-color,#2b201a1f)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_1px_2px_color-mix\(in_oklab\,var\(--color-ink\)_12\%\,transparent\)\]{--tw-shadow:0 1px 2px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 12%,transparent))}}.shadow-\[0_1px_2px_color-mix\(in_oklab\,var\(--color-ink\)_12\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_10px_20px_-18px_color-mix\(in_oklab\,var\(--color-clay\)_28\%\,transparent\)\]{--tw-shadow:0 10px 20px -18px var(--tw-shadow-color,#d6683d47)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_10px_20px_-18px_color-mix\(in_oklab\,var\(--color-clay\)_28\%\,transparent\)\]{--tw-shadow:0 10px 20px -18px var(--tw-shadow-color,color-mix(in oklab,var(--color-clay) 28%,transparent))}}.shadow-\[0_10px_20px_-18px_color-mix\(in_oklab\,var\(--color-clay\)_28\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_10px_30px_-18px_color-mix\(in_oklab\,var\(--color-ink\)_48\%\,transparent\)\]{--tw-shadow:0 10px 30px -18px var(--tw-shadow-color,#2b201a7a)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_10px_30px_-18px_color-mix\(in_oklab\,var\(--color-ink\)_48\%\,transparent\)\]{--tw-shadow:0 10px 30px -18px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 48%,transparent))}}.shadow-\[0_10px_30px_-18px_color-mix\(in_oklab\,var\(--color-ink\)_48\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_12px_30px_-22px_color-mix\(in_oklab\,var\(--color-clay\)_50\%\,transparent\)\]{--tw-shadow:0 12px 30px -22px var(--tw-shadow-color,#d6683d80)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_12px_30px_-22px_color-mix\(in_oklab\,var\(--color-clay\)_50\%\,transparent\)\]{--tw-shadow:0 12px 30px -22px var(--tw-shadow-color,color-mix(in oklab,var(--color-clay) 50%,transparent))}}.shadow-\[0_12px_30px_-22px_color-mix\(in_oklab\,var\(--color-clay\)_50\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_16px_30px_-24px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{--tw-shadow:0 16px 30px -24px var(--tw-shadow-color,#2b201a2e)}@supports (color:color-mix(in 
lab,red,red)){.shadow-\[0_16px_30px_-24px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{--tw-shadow:0 16px 30px -24px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 18%,transparent))}}.shadow-\[0_16px_30px_-24px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_18px_42px_-36px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{--tw-shadow:0 18px 42px -36px var(--tw-shadow-color,#2b201a2e)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_18px_42px_-36px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{--tw-shadow:0 18px 42px -36px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 18%,transparent))}}.shadow-\[0_18px_42px_-36px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_20px_48px_-36px_color-mix\(in_oklab\,var\(--color-ink\)_16\%\,transparent\)\]{--tw-shadow:0 20px 48px -36px var(--tw-shadow-color,#2b201a29)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_20px_48px_-36px_color-mix\(in_oklab\,var\(--color-ink\)_16\%\,transparent\)\]{--tw-shadow:0 20px 48px -36px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 16%,transparent))}}.shadow-\[0_20px_48px_-36px_color-mix\(in_oklab\,var\(--color-ink\)_16\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_22px_46px_-28px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{--tw-shadow:0 22px 46px -28px var(--tw-shadow-color,#2b201a2e)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_22px_46px_-28px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{--tw-shadow:0 22px 46px -28px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 18%,transparent))}}.shadow-\[0_22px_46px_-28px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_28px_80px_-28px_color-mix\(in_oklab\,var\(--color-ink\)_20\%\,transparent\)\]{--tw-shadow:0 28px 80px -28px var(--tw-shadow-color,#2b201a33)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_28px_80px_-28px_color-mix\(in_oklab\,var\(--color-ink\)_20\%\,transparent\)\]{--tw-shadow:0 28px 80px -28px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 20%,transparent))}}.shadow-\[0_28px_80px_-28px_color-mix\(in_oklab\,var\(--color-ink\)_20\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[inset_0_1px_0_color-mix\(in_oklab\,var\(--color-paper\)_40\%\,white\)\]{--tw-shadow:inset 0 1px 0 var(--tw-shadow-color,#fffefc)}@supports (color:color-mix(in lab,red,red)){.shadow-\[inset_0_1px_0_color-mix\(in_oklab\,var\(--color-paper\)_40\%\,white\)\]{--tw-shadow:inset 0 1px 0 var(--tw-shadow-color,color-mix(in oklab,var(--color-paper) 40%,white))}}.shadow-\[inset_0_1px_0_color-mix\(in_oklab\,var\(--color-paper\)_40\%\,white\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-none{--tw-shadow:0 0 
#0000;box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.ring-0{--tw-ring-shadow:var(--tw-ring-inset,) 0 0 0 calc(0px + var(--tw-ring-offset-width)) var(--tw-ring-color,currentcolor);box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.inset-ring{--tw-inset-ring-shadow:inset 0 0 0 1px var(--tw-inset-ring-color,currentcolor);box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.outline{outline-style:var(--tw-outline-style);outline-width:1px}.blur{--tw-blur:blur(8px);filter:var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,)}.drop-shadow{--tw-drop-shadow-size:drop-shadow(0 1px 2px var(--tw-drop-shadow-color,#0000001a)) drop-shadow(0 1px 1px var(--tw-drop-shadow-color,#0000000f));--tw-drop-shadow:drop-shadow(0 1px 2px #0000001a) drop-shadow(0 1px 1px #0000000f);filter:var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,)}.filter{filter:var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,)}.filter\!{filter:var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,)!important}.backdrop-blur{--tw-backdrop-blur:blur(8px);-webkit-backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,)}.backdrop-blur-\[1\.5px\]{--tw-backdrop-blur:blur(1.5px);-webkit-backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,)}.backdrop-grayscale{--tw-backdrop-grayscale:grayscale(100%);-webkit-backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) 
var(--tw-backdrop-sepia,)}.backdrop-invert{--tw-backdrop-invert:invert(100%);-webkit-backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,)}.backdrop-sepia{--tw-backdrop-sepia:sepia(100%);-webkit-backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,)}.backdrop-filter{-webkit-backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,)}.transition{transition-property:color,background-color,border-color,outline-color,text-decoration-color,fill,stroke,--tw-gradient-from,--tw-gradient-via,--tw-gradient-to,opacity,box-shadow,transform,translate,scale,rotate,filter,-webkit-backdrop-filter,backdrop-filter,display,content-visibility,overlay,pointer-events;transition-timing-function:var(--tw-ease,var(--default-transition-timing-function));transition-duration:var(--tw-duration,var(--default-transition-duration))}.transition-\[color\,background-color\,border-color\,box-shadow\,transform\]{transition-property:color,background-color,border-color,box-shadow,transform;transition-timing-function:var(--tw-ease,var(--default-transition-timing-function));transition-duration:var(--tw-duration,var(--default-transition-duration))}.transition-\[width\]{transition-property:width;transition-timing-function:var(--tw-ease,var(--default-transition-timing-function));transition-duration:var(--tw-duration,var(--default-transition-duration))}.transition-colors{transition-property:color,background-color,border-color,outline-color,text-decoration-color,fill,stroke,--tw-gradient-from,--tw-gradient-via,--tw-gradient-to;transition-timing-function:var(--tw-ease,var(--default-transition-timing-function));transition-duration:var(--tw-duration,var(--default-transition-duration))}.transition-transform{transition-property:transform,translate,scale,rotate;transition-timing-function:var(--tw-ease,var(--default-transition-timing-function));transition-duration:var(--tw-duration,var(--default-transition-duration))}.duration-200{--tw-duration:.2s;transition-duration:.2s}.duration-300{--tw-duration:.3s;transition-duration:.3s}.ease-out{--tw-ease:var(--ease-out);transition-timing-function:var(--ease-out)}.ou
tline-none{--tw-outline-style:none;outline-style:none}.select-none{-webkit-user-select:none;user-select:none}:where(.divide-x-reverse>:not(:last-child)){--tw-divide-x-reverse:1}.ring-inset{--tw-ring-inset:inset}.group-data-\[state\=open\]\:rotate-180:is(:where(.group)[data-state=open] *){rotate:180deg}.placeholder\:text-muted::placeholder{color:var(--color-muted)}@media(hover:hover){.hover\:-translate-y-0\.5:hover{--tw-translate-y:calc(var(--spacing) * -.5);translate:var(--tw-translate-x) var(--tw-translate-y)}.hover\:border-clay\/35:hover{border-color:#d6683d59}@supports (color:color-mix(in lab,red,red)){.hover\:border-clay\/35:hover{border-color:color-mix(in oklab,var(--color-clay) 35%,transparent)}}.hover\:border-clay\/40:hover{border-color:#d6683d66}@supports (color:color-mix(in lab,red,red)){.hover\:border-clay\/40:hover{border-color:color-mix(in oklab,var(--color-clay) 40%,transparent)}}.hover\:border-clay\/45:hover{border-color:#d6683d73}@supports (color:color-mix(in lab,red,red)){.hover\:border-clay\/45:hover{border-color:color-mix(in oklab,var(--color-clay) 45%,transparent)}}.hover\:bg-clay-strong:hover{background-color:var(--color-clay-strong)}.hover\:bg-ink\/92:hover{background-color:#2b201aeb}@supports (color:color-mix(in lab,red,red)){.hover\:bg-ink\/92:hover{background-color:color-mix(in oklab,var(--color-ink) 92%,transparent)}}.hover\:bg-oat:hover{background-color:var(--color-oat)}.hover\:bg-oat\/75:hover{background-color:#fbf3e7bf}@supports (color:color-mix(in lab,red,red)){.hover\:bg-oat\/75:hover{background-color:color-mix(in oklab,var(--color-oat) 75%,transparent)}}.hover\:bg-paper:hover{background-color:var(--color-paper)}.hover\:bg-sand:hover{background-color:var(--color-sand)}.hover\:text-clay:hover{color:var(--color-clay)}.hover\:text-ink:hover{color:var(--color-ink)}}.focus\:ring-2:focus{--tw-ring-shadow:var(--tw-ring-inset,) 0 0 0 calc(2px + var(--tw-ring-offset-width)) var(--tw-ring-color,currentcolor);box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.focus\:ring-clay\/20:focus{--tw-ring-color:#d6683d33}@supports (color:color-mix(in lab,red,red)){.focus\:ring-clay\/20:focus{--tw-ring-color:color-mix(in oklab, var(--color-clay) 20%, transparent)}}.focus\:outline-none:focus{--tw-outline-style:none;outline-style:none}.focus-visible\:ring-2:focus-visible{--tw-ring-shadow:var(--tw-ring-inset,) 0 0 0 calc(2px + var(--tw-ring-offset-width)) var(--tw-ring-color,currentcolor);box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.focus-visible\:ring-clay\/20:focus-visible{--tw-ring-color:#d6683d33}@supports (color:color-mix(in lab,red,red)){.focus-visible\:ring-clay\/20:focus-visible{--tw-ring-color:color-mix(in oklab, var(--color-clay) 20%, transparent)}}.focus-visible\:ring-clay\/25:focus-visible{--tw-ring-color:#d6683d40}@supports (color:color-mix(in lab,red,red)){.focus-visible\:ring-clay\/25:focus-visible{--tw-ring-color:color-mix(in oklab, var(--color-clay) 25%, 
transparent)}}.focus-visible\:outline-none:focus-visible{--tw-outline-style:none;outline-style:none}.disabled\:pointer-events-none:disabled{pointer-events:none}.disabled\:cursor-not-allowed:disabled{cursor:not-allowed}.disabled\:opacity-45:disabled{opacity:.45}.disabled\:opacity-50:disabled{opacity:.5}.data-\[disabled\]\:pointer-events-none[data-disabled]{pointer-events:none}.data-\[disabled\]\:opacity-40[data-disabled]{opacity:.4}.data-\[highlighted\]\:bg-sand[data-highlighted]{background-color:var(--color-sand)}.data-\[side\=bottom\]\:translate-y-1[data-side=bottom]{--tw-translate-y:calc(var(--spacing) * 1);translate:var(--tw-translate-x) var(--tw-translate-y)}.data-\[side\=top\]\:-translate-y-1[data-side=top]{--tw-translate-y:calc(var(--spacing) * -1);translate:var(--tw-translate-x) var(--tw-translate-y)}.data-\[state\=checked\]\:translate-x-5[data-state=checked]{--tw-translate-x:calc(var(--spacing) * 5);translate:var(--tw-translate-x) var(--tw-translate-y)}.data-\[state\=checked\]\:bg-clay[data-state=checked]{background-color:var(--color-clay)}.data-\[state\=closed\]\:animate-accordion-up[data-state=closed]{animation:var(--animate-accordion-up)}.data-\[state\=open\]\:animate-accordion-down[data-state=open]{animation:var(--animate-accordion-down)}.data-\[state\=unchecked\]\:translate-x-0[data-state=unchecked]{--tw-translate-x:calc(var(--spacing) * 0);translate:var(--tw-translate-x) var(--tw-translate-y)}@media(min-width:40rem){.sm\:inset-x-6{inset-inline:calc(var(--spacing) * 6)}.sm\:h-\[4\.9rem\]{height:4.9rem}.sm\:min-h-\[680px\]{min-height:680px}.sm\:w-\[4\.9rem\]{width:4.9rem}.sm\:max-w-none{max-width:none}.sm\:max-w-sm{max-width:var(--container-sm)}.sm\:grid-cols-2{grid-template-columns:repeat(2,minmax(0,1fr))}.sm\:grid-cols-\[minmax\(0\,1fr\)_auto\]{grid-template-columns:minmax(0,1fr) auto}.sm\:flex-row{flex-direction:row}.sm\:justify-end{justify-content:flex-end}:where(.sm\:space-y-5>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 5) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 5) * calc(1 - var(--tw-space-y-reverse)))}.sm\:p-3{padding:calc(var(--spacing) * 3)}.sm\:p-4{padding:calc(var(--spacing) * 4)}.sm\:px-5{padding-inline:calc(var(--spacing) * 5)}.sm\:px-6{padding-inline:calc(var(--spacing) * 6)}.sm\:px-7{padding-inline:calc(var(--spacing) * 7)}.sm\:py-7{padding-block:calc(var(--spacing) * 7)}}@media(min-width:64rem){.lg\:grid-cols-\[minmax\(0\,1\.2fr\)_minmax\(0\,1fr\)\]{grid-template-columns:minmax(0,1.2fr) minmax(0,1fr)}.lg\:grid-cols-\[minmax\(0\,1fr\)_15rem\]{grid-template-columns:minmax(0,1fr) 15rem}.lg\:items-start{align-items:flex-start}.lg\:items-stretch{align-items:stretch}.lg\:justify-between{justify-content:space-between}.lg\:p-5{padding:calc(var(--spacing) * 5)}}.\[\&_svg\]\:pointer-events-none svg{pointer-events:none}.\[\&_svg\]\:shrink-0 svg{flex-shrink:0}.\[\&\>span\]\:line-clamp-1>span{-webkit-line-clamp:1;-webkit-box-orient:vertical;display:-webkit-box;overflow:hidden}.page-fade{animation:.28s cubic-bezier(.22,1,.36,1) page-fade}.studio-panel{background:#fffcf7}@supports (color:color-mix(in lab,red,red)){.studio-panel{background:color-mix(in oklab,var(--color-paper) 94%,white)}}.studio-panel{box-shadow:0 12px 32px -30px #2b201a1f}@supports (color:color-mix(in lab,red,red)){.studio-panel{box-shadow:0 12px 32px -30px color-mix(in oklab,var(--color-ink) 12%,transparent)}}.studio-grid{background:radial-gradient(circle at top,#fbf3e761,#0000 62%),linear-gradient(#fffcf7,#fdfbf6)}@supports 
(color:color-mix(in lab,red,red)){.studio-grid{background:radial-gradient(circle at top,color-mix(in oklab,var(--color-oat) 38%,transparent),transparent 62%),linear-gradient(180deg,color-mix(in oklab,var(--color-paper) 97%,white),color-mix(in oklab,var(--color-oat) 36%,white))}}.soft-scroll{scrollbar-width:thin;scrollbar-color:#cec6bca6 transparent}@supports (color:color-mix(in lab,red,red)){.soft-scroll{scrollbar-color:color-mix(in oklab,var(--color-stone) 65%,transparent) transparent}}.page-halo{background:radial-gradient(circle at top,#fbf3e7c7,#0000 66%),linear-gradient(#fffcf7f5,#0000 80%)}@supports (color:color-mix(in lab,red,red)){.page-halo{background:radial-gradient(circle at top,color-mix(in oklab,var(--color-oat) 78%,transparent),transparent 66%),linear-gradient(180deg,color-mix(in oklab,var(--color-paper) 96%,transparent),transparent 80%)}}}@keyframes accordion-down{0%{height:0}to{height:var(--radix-accordion-content-height)}}@keyframes accordion-up{0%{height:var(--radix-accordion-content-height)}to{height:0}}@keyframes page-fade{0%{opacity:0;transform:translateY(12px)}to{opacity:1;transform:translateY(0)}}@keyframes section-rise{0%{opacity:0;transform:translateY(18px)}to{opacity:1;transform:translateY(0)}}@property --tw-scale-x{syntax:"*";inherits:false;initial-value:1}@property --tw-scale-y{syntax:"*";inherits:false;initial-value:1}@property --tw-scale-z{syntax:"*";inherits:false;initial-value:1}@property --tw-rotate-x{syntax:"*";inherits:false}@property --tw-rotate-y{syntax:"*";inherits:false}@property --tw-rotate-z{syntax:"*";inherits:false}@property --tw-skew-x{syntax:"*";inherits:false}@property --tw-skew-y{syntax:"*";inherits:false}@property --tw-pan-x{syntax:"*";inherits:false}@property --tw-pan-y{syntax:"*";inherits:false}@property --tw-pinch-zoom{syntax:"*";inherits:false}@property --tw-space-y-reverse{syntax:"*";inherits:false;initial-value:0}@property --tw-space-x-reverse{syntax:"*";inherits:false;initial-value:0}@property --tw-divide-x-reverse{syntax:"*";inherits:false;initial-value:0}@property --tw-border-style{syntax:"*";inherits:false;initial-value:solid}@property --tw-divide-y-reverse{syntax:"*";inherits:false;initial-value:0}@property --tw-leading{syntax:"*";inherits:false}@property --tw-font-weight{syntax:"*";inherits:false}@property --tw-tracking{syntax:"*";inherits:false}@property --tw-ordinal{syntax:"*";inherits:false}@property --tw-slashed-zero{syntax:"*";inherits:false}@property --tw-numeric-figure{syntax:"*";inherits:false}@property --tw-numeric-spacing{syntax:"*";inherits:false}@property --tw-numeric-fraction{syntax:"*";inherits:false}@property --tw-shadow{syntax:"*";inherits:false;initial-value:0 0 #0000}@property --tw-shadow-color{syntax:"*";inherits:false}@property --tw-shadow-alpha{syntax:"<percentage>";inherits:false;initial-value:100%}@property --tw-inset-shadow{syntax:"*";inherits:false;initial-value:0 0 #0000}@property --tw-inset-shadow-color{syntax:"*";inherits:false}@property --tw-inset-shadow-alpha{syntax:"<percentage>";inherits:false;initial-value:100%}@property --tw-ring-color{syntax:"*";inherits:false}@property --tw-ring-shadow{syntax:"*";inherits:false;initial-value:0 0 #0000}@property --tw-inset-ring-color{syntax:"*";inherits:false}@property --tw-inset-ring-shadow{syntax:"*";inherits:false;initial-value:0 0 #0000}@property --tw-ring-inset{syntax:"*";inherits:false}@property --tw-ring-offset-width{syntax:"<length>";inherits:false;initial-value:0}@property --tw-ring-offset-color{syntax:"*";inherits:false;initial-value:#fff}@property 
--tw-ring-offset-shadow{syntax:"*";inherits:false;initial-value:0 0 #0000}@property --tw-outline-style{syntax:"*";inherits:false;initial-value:solid}@property --tw-blur{syntax:"*";inherits:false}@property --tw-brightness{syntax:"*";inherits:false}@property --tw-contrast{syntax:"*";inherits:false}@property --tw-grayscale{syntax:"*";inherits:false}@property --tw-hue-rotate{syntax:"*";inherits:false}@property --tw-invert{syntax:"*";inherits:false}@property --tw-opacity{syntax:"*";inherits:false}@property --tw-saturate{syntax:"*";inherits:false}@property --tw-sepia{syntax:"*";inherits:false}@property --tw-drop-shadow{syntax:"*";inherits:false}@property --tw-drop-shadow-color{syntax:"*";inherits:false}@property --tw-drop-shadow-alpha{syntax:"<percentage>";inherits:false;initial-value:100%}@property --tw-drop-shadow-size{syntax:"*";inherits:false}@property --tw-backdrop-blur{syntax:"*";inherits:false}@property --tw-backdrop-brightness{syntax:"*";inherits:false}@property --tw-backdrop-contrast{syntax:"*";inherits:false}@property --tw-backdrop-grayscale{syntax:"*";inherits:false}@property --tw-backdrop-hue-rotate{syntax:"*";inherits:false}@property --tw-backdrop-invert{syntax:"*";inherits:false}@property --tw-backdrop-opacity{syntax:"*";inherits:false}@property --tw-backdrop-saturate{syntax:"*";inherits:false}@property --tw-backdrop-sepia{syntax:"*";inherits:false}@property --tw-duration{syntax:"*";inherits:false}@property --tw-ease{syntax:"*";inherits:false}@property --tw-translate-x{syntax:"*";inherits:false;initial-value:0}@property --tw-translate-y{syntax:"*";inherits:false;initial-value:0}@property --tw-translate-z{syntax:"*";inherits:false;initial-value:0}@keyframes spin{to{transform:rotate(360deg)}}
frontend/dist/index.html ADDED
@@ -0,0 +1,14 @@
+ <!doctype html>
+ <html lang="en">
+   <head>
+     <meta charset="UTF-8" />
+     <link rel="icon" type="image/svg+xml" href="/vite.svg" />
+     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+     <title>LightDiffusion Next</title>
+     <script type="module" crossorigin src="/assets/index-7kNA4Hm-.js"></script>
+     <link rel="stylesheet" crossorigin href="/assets/index-CAwyaxYh.css">
+   </head>
+   <body>
+     <div id="root"></div>
+   </body>
+ </html>
frontend/dist/vite.svg ADDED
frontend/eslint.config.js ADDED
@@ -0,0 +1,23 @@
+ import js from '@eslint/js'
+ import globals from 'globals'
+ import reactHooks from 'eslint-plugin-react-hooks'
+ import reactRefresh from 'eslint-plugin-react-refresh'
+ import tseslint from 'typescript-eslint'
+ import { defineConfig, globalIgnores } from 'eslint/config'
+
+ export default defineConfig([
+   globalIgnores(['dist']),
+   {
+     files: ['**/*.{ts,tsx}'],
+     extends: [
+       js.configs.recommended,
+       tseslint.configs.recommended,
+       reactHooks.configs.flat.recommended,
+       reactRefresh.configs.vite,
+     ],
+     languageOptions: {
+       ecmaVersion: 2020,
+       globals: globals.browser,
+     },
+   },
+ ])
frontend/index.html ADDED
@@ -0,0 +1,13 @@
+ <!doctype html>
+ <html lang="en">
+   <head>
+     <meta charset="UTF-8" />
+     <link rel="icon" type="image/svg+xml" href="/vite.svg" />
+     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+     <title>LightDiffusion Next</title>
+   </head>
+   <body>
+     <div id="root"></div>
+     <script type="module" src="/src/main.tsx"></script>
+   </body>
+ </html>
frontend/package-lock.json ADDED
The diff for this file is too large to render. See raw diff
 
frontend/package.json ADDED
@@ -0,0 +1,49 @@
+ {
+   "name": "frontend",
+   "private": true,
+   "version": "0.0.0",
+   "type": "module",
+   "scripts": {
+     "dev": "vite",
+     "build": "tsc -b && vite build",
+     "lint": "eslint .",
+     "preview": "vite preview"
+   },
+   "dependencies": {
+     "@radix-ui/react-accordion": "^1.2.12",
+     "@radix-ui/react-collapsible": "^1.1.12",
+     "@radix-ui/react-dialog": "^1.1.15",
+     "@radix-ui/react-label": "^2.1.8",
+     "@radix-ui/react-scroll-area": "^1.2.10",
+     "@radix-ui/react-select": "^2.2.6",
+     "@radix-ui/react-separator": "^1.1.8",
+     "@radix-ui/react-slot": "^1.2.4",
+     "@radix-ui/react-switch": "^1.2.6",
+     "axios": "^1.13.4",
+     "class-variance-authority": "^0.7.1",
+     "clsx": "^2.1.1",
+     "lucide-react": "^1.8.0",
+     "react": "^19.2.0",
+     "react-dom": "^19.2.0",
+     "react-dropzone": "^14.4.0",
+     "react-use-websocket": "^4.13.0",
+     "tailwind-merge": "^3.5.0",
+     "zustand": "^5.0.11"
+   },
+   "devDependencies": {
+     "@eslint/js": "^9.39.1",
+     "@tailwindcss/vite": "^4.2.2",
+     "@types/node": "^24.10.1",
+     "@types/react": "^19.2.5",
+     "@types/react-dom": "^19.2.3",
+     "@vitejs/plugin-react": "^5.1.1",
+     "eslint": "^9.39.1",
+     "eslint-plugin-react-hooks": "^7.0.1",
+     "eslint-plugin-react-refresh": "^0.4.24",
+     "globals": "^16.5.0",
+     "tailwindcss": "^4.2.2",
+     "typescript": "~5.9.3",
+     "typescript-eslint": "^8.46.4",
+     "vite": "^7.2.4"
+   }
+ }
frontend/public/vite.svg ADDED
frontend/src/App.tsx ADDED
@@ -0,0 +1,57 @@
+ import { useState } from 'react';
+ import { GenerationComposer } from './components/GenerationComposer';
+ import { GenerationSettings } from './components/GenerationSettings';
+ import { Gallery } from './components/Gallery';
+ import { ImagePreview } from './components/ImagePreview';
+ import {
+   Sheet,
+   SheetContent,
+   SheetDescription,
+   SheetHeader,
+   SheetTitle,
+ } from './components/ui/sheet';
+ import { useGenerationBootstrap } from './hooks/use-generation-bootstrap';
+ import { useMediaQuery } from './hooks/use-media-query';
+
+ export default function App() {
+   useGenerationBootstrap();
+
+   const [controlsOpen, setControlsOpen] = useState(false);
+   const isDesktop = useMediaQuery('(min-width: 1024px)');
+   const controlSide = isDesktop ? 'right' : 'bottom';
+
+   return (
+     <div className="min-h-screen bg-canvas text-ink">
+       <div className="page-halo pointer-events-none absolute inset-x-0 top-0 h-96" />
+
+       <main className="page-fade relative mx-auto flex min-h-screen w-full max-w-[1320px] flex-col px-4 pb-10 pt-4 sm:px-6">
+         <section className="mx-auto min-h-0 w-full max-w-[1200px] space-y-4 sm:space-y-5">
+           <GenerationComposer onOpenAdvanced={() => setControlsOpen(true)} />
+           <ImagePreview />
+           <Gallery />
+         </section>
+       </main>
+
+       <Sheet open={controlsOpen} onOpenChange={setControlsOpen}>
+         <SheetContent
+           side={controlSide}
+           className={
+             isDesktop
+               ? 'h-[calc(100vh-2rem)] w-[26rem] overflow-hidden sm:max-w-none'
+               : 'h-[min(88vh,860px)] overflow-hidden'
+           }
+         >
+           <SheetHeader>
+             <SheetTitle>Advanced controls</SheetTitle>
+             <SheetDescription>
+               Sampling, conditioning, optimization, and history for the next run.
+             </SheetDescription>
+           </SheetHeader>
+           <div className="mt-4 h-[calc(100%-4rem)] min-h-0">
+             <GenerationSettings />
+           </div>
+         </SheetContent>
+       </Sheet>
+     </div>
+   );
+ }
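App.tsx consumes `useGenerationBootstrap` and `useMediaQuery` from `./hooks/`, neither of which appears in this excerpt. For orientation only, a minimal sketch of what a `useMediaQuery` hook with this call signature typically looks like — an assumption about the implementation, not the repository's actual hook:

// use-media-query.ts — hypothetical sketch of the hook App.tsx imports.
import { useEffect, useState } from 'react';

export function useMediaQuery(query: string): boolean {
  // Initialize from the current match state; guard for non-browser environments.
  const [matches, setMatches] = useState(
    () => typeof window !== 'undefined' && window.matchMedia(query).matches,
  );

  useEffect(() => {
    const mql = window.matchMedia(query);
    const onChange = (event: MediaQueryListEvent) => setMatches(event.matches);
    mql.addEventListener('change', onChange);
    setMatches(mql.matches); // Re-sync if the query string changed between renders.
    return () => mql.removeEventListener('change', onChange);
  }, [query]);

  return matches;
}

With `'(min-width: 1024px)'`, this returns `true` on desktop-width viewports, which App.tsx uses to dock the controls sheet to the right instead of the bottom.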
frontend/src/api/client.ts ADDED
@@ -0,0 +1,70 @@
+ import axios from 'axios';
+ import type {
+   GenerationSettings,
+   GenerationResponse,
+   ImageMetadata,
+   ModelInfo,
+   SettingsPreferences,
+   SettingsSnapshot,
+ } from '../types';
+
+ const api = axios.create({
+   baseURL: '/api', // Proxy handles redirection to localhost:7861
+ });
+
+ export const listModels = async (): Promise<ModelInfo[]> => {
+   const res = await api.get<ModelInfo[]>('/models');
+   return res.data;
+ };
+
+ export const listControlNets = async (): Promise<{ models: string[] }> => {
+   const res = await api.get<{ models: string[] }>('/controlnets');
+   return res.data;
+ };
+
+ export const generateImage = async (settings: GenerationSettings): Promise<GenerationResponse> => {
+   const res = await api.post<GenerationResponse>('/generate', settings);
+   console.log("Generation response:", res.data);
+   return res.data;
+ };
+
+ export const interruptGeneration = async (): Promise<void> => {
+   await api.post('/interrupt');
+ };
+
+ export const getLastSeed = async (): Promise<{ seed: number | null }> => {
+   const res = await api.get('/settings/last');
+   return res.data;
+ };
+
+ export const getSettingsHistory = async (): Promise<{ history: SettingsSnapshot[] }> => {
+   const res = await api.get('/settings/history');
+   return res.data;
+ };
+
+ export const getSettingsPreferences = async (): Promise<SettingsPreferences> => {
+   const res = await api.get('/settings/preferences');
+   return res.data;
+ };
+
+ export const postSettingsPreferences = async (preferences: SettingsPreferences): Promise<SettingsPreferences> => {
+   const res = await api.post('/settings/preferences', preferences);
+   return res.data;
+ };
+
+ export const postSettingsSnapshot = async (settings: GenerationSettings, include_prompt: boolean = false): Promise<{ snapshot: SettingsSnapshot }> => {
+   const res = await api.post('/settings/history', { settings, include_prompt });
+   return res.data;
+ };
+
+ export const getImageMetadata = async (imageB64: string): Promise<{ metadata: ImageMetadata }> => {
+   const res = await api.post('/images/metadata', { image: imageB64 });
+   return res.data;
+ };
+
+ export const getTelemetry = async (): Promise<Record<string, unknown>> => {
+   const res = await api.get('/telemetry');
+   return res.data;
+ };
+
+ export default api;
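The client's `baseURL: '/api'` comment says a proxy forwards these requests to the backend on port 7861, but the proxy configuration itself is not part of this excerpt. A minimal sketch of what such an entry could look like in a hypothetical `vite.config.ts` (the `server.proxy` mapping shown is an assumption, not the repository's actual config):

// vite.config.ts — hypothetical sketch; the real file is not shown in this snapshot.
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

export default defineConfig({
  plugins: [react()],
  server: {
    proxy: {
      // Forward /api/* from the dev server to the backend the client comment references.
      '/api': 'http://localhost:7861',
    },
  },
})

This keeps the frontend origin-agnostic in development: components call relative `/api` paths and the dev server handles the cross-port hop, so no CORS setup is needed on the backend.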
frontend/src/assets/react.svg ADDED
frontend/src/components/Gallery.tsx ADDED
@@ -0,0 +1,62 @@
+ import { ScrollArea, ScrollBar } from './ui/scroll-area';
+ import { useStore } from '../store/useStore';
+ import { cn } from '../lib/utils';
+ import { useShallow } from 'zustand/react/shallow';
+
+ export function Gallery() {
+   const { currentImage, gallery, setCurrentImage } = useStore(useShallow((state) => ({
+     currentImage: state.currentImage,
+     gallery: state.gallery,
+     setCurrentImage: state.setCurrentImage,
+   })));
+
+   return (
+     <section className="-mt-2 overflow-hidden rounded-b-[2rem] border border-line border-t-0 bg-paper/62 px-4 pb-3 pt-2 sm:px-5">
+       <div className="flex items-center justify-between gap-3">
+         <h2 className="font-serif text-[1.05rem] tracking-[-0.025em] text-ink">Recent</h2>
+         <p className="text-xs text-muted">
+           {gallery.length === 0 ? 'No saved frames yet' : `${gallery.length} saved`}
+         </p>
+       </div>
+
+       {gallery.length === 0 ? (
+         <div className="mt-4 rounded-[1.4rem] border border-dashed border-line bg-oat/45 px-4 py-6 text-sm text-muted">
+           Generated images will collect here for quick comparison.
+         </div>
+       ) : (
+         <ScrollArea className="mt-2.5 w-full whitespace-nowrap">
+           <div className="flex gap-2 pb-2">
+             {gallery.map((image, index) => {
+               const isSelected = image === currentImage;
+
+               return (
+                 <button
+                   key={`${index}-${image.slice(0, 28)}`}
+                   type="button"
+                   onClick={() => setCurrentImage(image)}
+                   className={cn(
+                     'group relative w-[4.25rem] shrink-0 overflow-hidden rounded-[1rem] border bg-paper text-left transition sm:w-[4.9rem]',
+                     isSelected
+                       ? 'border-clay shadow-[0_10px_20px_-18px_color-mix(in_oklab,var(--color-clay)_28%,transparent)]'
+                       : 'border-line hover:-translate-y-0.5 hover:border-clay/35',
+                   )}
+                   aria-label={`Open image ${index + 1}`}
+                 >
+                   <img
+                     src={image}
+                     alt={`Generated frame ${index + 1}`}
+                     loading="lazy"
+                     decoding="async"
+                     className="h-[4.25rem] w-full object-cover sm:h-[4.9rem]"
+                   />
+                   {isSelected ? <div className="absolute right-3 top-3 h-2.5 w-2.5 rounded-full bg-clay" /> : null}
+                 </button>
+               );
+             })}
+           </div>
+           <ScrollBar orientation="horizontal" />
+         </ScrollArea>
+       )}
+     </section>
+   );
+ }
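Gallery reads `currentImage`, `gallery`, and `setCurrentImage` from the zustand store through `useShallow` (so the component only re-renders when one of those three fields changes, not on every store update). The store module (`../store/useStore`) is not included in this excerpt; a minimal sketch of the slice those selectors imply — field names taken from the component, everything else an assumption:

// store/useStore.ts — hypothetical sketch covering only the fields Gallery.tsx uses.
import { create } from 'zustand';

interface GalleryState {
  currentImage: string | null; // Image currently shown in the preview, or null before the first run.
  gallery: string[];           // Previously generated images (data URLs or paths; ordering assumed).
  setCurrentImage: (image: string) => void;
}

export const useStore = create<GalleryState>((set) => ({
  currentImage: null,
  gallery: [],
  setCurrentImage: (image) => set({ currentImage: image }),
}));

The real store presumably carries the rest of the generation state (settings, progress, websocket status) alongside this slice; the sketch only shows enough to type-check Gallery's selector.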