Aatricks committed
Commit b701455 · 0 Parent(s):

Deploy ZeroGPU Gradio Space snapshot

.dockerignore ADDED
@@ -0,0 +1,97 @@
+ # Python cache files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ share/python-wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # Exception: Keep SageAttention and SpargeAttn build directories for Docker
+ !SageAttention/
+ !SpargeAttn/
+ !docker/
+
+ # But exclude their build artifacts
+ SageAttention/build/
+ SageAttention/*.egg-info/
+ SageAttention/**/__pycache__/
+ SpargeAttn/build/
+ SpargeAttn/*.egg-info/
+ SpargeAttn/**/__pycache__/
+
+ # Virtual environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # IDE files
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS files
+ .DS_Store
+ .DS_Store?
+ ._*
+ .Spotlight-V100
+ .Trashes
+ ehthumbs.db
+ Thumbs.db
+
+ # Git
+ .git/
+ .gitignore
+
+ # Docker files (not needed in the image, but Dockerfile itself is needed for the build context)
+ .dockerignore
+
+ # Documentation (not needed at runtime, but docker/ scripts are needed for build)
+ *.md
+ !docker/
+ docs/
+ !frontend/dist/
+ !frontend/dist/**
+
+ # Large model files (these should be downloaded at runtime)
+ *.safetensors
+ *.ckpt
+ *.pt
+ *.pth
+ *.bin
+ *.gguf
+
+ # Logs
+ *.log
+ logs/
+
+ # Temporary files
+ tmp/
+ temp/
+ *.tmp
+
+ # Generated images (these will be created at runtime)
+ output/
+
+ # Large dependencies that will be installed via pip
+ stable_fast-*.whl
.gitattributes ADDED
@@ -0,0 +1,2 @@
+ # Auto detect text files and perform LF normalization
+ * text=auto
.github/instructions/memory.instruction.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ applyTo: '**'
+ ---
+
+ # User Memory
+
+ ## User Preferences
+ - Programming languages:
+ - Code style preferences:
+ - Development environment:
+ - Communication style:
+
+ ## Project Context
+ - Current project type:
+ - Tech stack:
+ - Architecture patterns:
+ - Key requirements:
+
+ ## Coding Patterns
+ - Preferred patterns and practices
+ - Code organization preferences
+ - Testing approaches
+ - Documentation style
+
+ ## Context7 Research History
+ - Libraries researched on Context7
+ - Best practices discovered
+ - Implementation patterns used
+ - Version-specific findings
+
+ - 2026-02-11: Searched Context7 for pytest; no libraries found. Reviewed Context7 MCP docs (all-clients, adding-libraries, troubleshooting, api-guide, developer guide) to satisfy research requirements for this task.
+
+ ## Conversation History
+ - 2026-02-11: Requested DifferentialDiffusion class excerpt with line numbers from src/AutoDetailer/ADetailer.py.
+ - 2026-02-11: Fixing ADetailer SDXL mask behavior by applying denoise_mask blending in KSamplerX0Inpaint; will add tests and validate with manual image generation.
+ - 2026-02-11: Added denoise_mask resizing to latent resolution to avoid shape mismatch; generated SDXL baseline and ADetailer outputs for manual verification.
+ - 2026-02-11: Normalized ADetailer noise masks to [0,1], aligned SDXL crop conditioning to crop-local sizes, and added unit tests plus manual SDXL ADetailer generation and image stats verification.
+ - 2026-02-11: Began implementation of mask-aware regression test for ADetailer SDXL noise masking.
+ - 2026-02-11: Added deterministic unit test that stubs sampling and verifies noise is localized to resized mask region in enhance_detail.
+ - Important decisions made
+ - Recurring questions or topics
+ - Solutions that worked well
+ - Things to avoid or that didn't work
+
+ ## Notes
+ - 2026-02-11: pytest -q tests/unit/test_adetailer_noise_mask.py passed (4 tests).
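The ADetailer notes above describe two concrete steps: normalizing the denoise mask to [0, 1] and resizing it to the latent resolution before blending, so the added noise stays localized to the masked region. A minimal PyTorch sketch of that idea follows; the helper name `prepare_denoise_mask` and the shapes are illustrative assumptions, not the actual `enhance_detail` code.

```python
import torch
import torch.nn.functional as F


def prepare_denoise_mask(mask: torch.Tensor, latent_hw: tuple) -> torch.Tensor:
    """Illustrative sketch: normalize a noise mask to [0, 1] and resize it to
    the latent resolution to avoid shape mismatches when blending."""
    # Promote (H, W) or (C, H, W) to (N, C, H, W), as F.interpolate expects.
    while mask.dim() < 4:
        mask = mask.unsqueeze(0)
    # Normalize to [0, 1]; a constant mask maps to all zeros instead of NaN.
    lo, hi = mask.amin(), mask.amax()
    mask = (mask - lo) / (hi - lo) if hi > lo else torch.zeros_like(mask)
    # Resize to the latent spatial size (typically image size // 8 for SD/SDXL).
    return F.interpolate(mask, size=latent_hw, mode="bilinear", align_corners=False)


# Example: a 512x512 pixel-space mask mapped onto a 64x64 latent grid.
pixel_mask = torch.zeros(512, 512)
pixel_mask[128:384, 128:384] = 255.0
latent_mask = prepare_denoise_mask(pixel_mask, (64, 64))
assert latent_mask.min() >= 0.0 and latent_mask.max() <= 1.0
```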
.github/workflows/ci.yml ADDED
@@ -0,0 +1,73 @@
+ name: CI
+
+ on:
+   push:
+     branches: [main]
+   pull_request:
+     branches: [main]
+
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         python-version: ['3.10', '3.14']
+       fail-fast: false
+
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Set up Python ${{ matrix.python-version }}
+         uses: actions/setup-python@v5
+         with:
+           python-version: ${{ matrix.python-version }}
+
+       - name: Cache pip dependencies
+         uses: actions/cache@v4
+         with:
+           path: ~/.cache/pip
+           key: ${{ runner.os }}-pip-${{ matrix.python-version }}-${{ hashFiles('requirements.txt') }}
+           restore-keys: |
+             ${{ runner.os }}-pip-${{ matrix.python-version }}-
+             ${{ runner.os }}-pip-
+
+       - name: Install dependencies
+         run: |
+           python -m pip install --upgrade pip
+           pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
+           pip install "numpy<2.0.0"
+           pip install pytest pytest-cov
+           pip install -r requirements.txt
+
+       - name: Run tests
+         run: |
+           # Run tests file-by-file for isolation to prevent mock leaks between suites.
+           # We handle exit code 5 (no tests collected) which happens when a file
+           # only contains 'slow' or 'gpu' tests that are filtered out.
+           failed=0
+           for f in $(find tests -type f -name "test_*.py"); do
+             echo "Running tests in $f..."
+             if pytest -v -m "not gpu and not slow" --tb=short "$f"; then
+               echo "Successfully ran tests in $f"
+             else
+               status=$?
+               if [ $status -eq 5 ]; then
+                 echo "No tests matching filter in $f (exit code 5), continuing..."
+               else
+                 echo "Error: tests in $f failed with exit code $status"
+                 failed=1
+               fi
+             fi
+           done
+           if [ $failed -ne 0 ]; then
+             echo "One or more test suites failed."
+             exit 1
+           fi
+
+       - name: Upload coverage report
+         if: matrix.python-version == '3.10'
+         uses: actions/upload-artifact@v4
+         with:
+           name: coverage-report
+           path: htmlcov/
+           if-no-files-found: ignore
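The run-tests step above executes each test file in its own pytest process so mocks patched in one suite cannot leak into the next, and treats pytest's documented exit code 5 (no tests collected) as success. The same loop can be reproduced locally; a sketch in Python, assuming it runs from the repository root:

```python
import pathlib
import subprocess
import sys

# Mirror of the CI loop: one pytest process per file for test isolation.
failed = False
for test_file in sorted(pathlib.Path("tests").rglob("test_*.py")):
    print(f"Running tests in {test_file}...")
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-v",
         "-m", "not gpu and not slow", "--tb=short", str(test_file)]
    )
    if result.returncode == 5:
        # Exit code 5 = no tests collected, e.g. a file containing only
        # 'gpu' or 'slow' tests that the marker filter deselected.
        print(f"No tests matching filter in {test_file}, continuing...")
    elif result.returncode != 0:
        print(f"Error: tests in {test_file} failed with exit code {result.returncode}")
        failed = True

sys.exit(1 if failed else 0)
```

The trade-off of per-file processes is slower startup (one interpreter and collection pass per file) in exchange for clean global state between suites.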
.gitignore ADDED
@@ -0,0 +1,14 @@
+ *.pyc
+ *.pth
+ *.pt
+ *.safetensors
+ *.png
+ stable_fast-*.whl
+ .venv
+ node_modules/
+ frontend/node_modules/
+ *.log
+ .history_backups
+ include/last_seed.txt
+ include/settings_store.json
+ docs/ai/
.python-version ADDED
@@ -0,0 +1 @@
+ 3.12
Dockerfile ADDED
@@ -0,0 +1,218 @@
+ FROM node:22-bookworm-slim AS frontend-builder
+
+ WORKDIR /frontend
+
+ COPY frontend/package.json frontend/package-lock.json ./
+ RUN npm ci
+
+ COPY frontend/ ./
+ RUN npm run build
+
+
+ FROM nvidia/cuda:12.8.0-devel-ubuntu22.04
+
+ ENV DEBIAN_FRONTEND=noninteractive
+ ENV PYTHONUNBUFFERED=1
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV CUDA_HOME=/usr/local/cuda
+ ENV PATH=${CUDA_HOME}/bin:${PATH}
+ ENV LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}
+ ENV TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0;12.0"
+
+ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+     --mount=type=cache,target=/var/lib/apt,sharing=locked \
+     apt-get update && apt-get install -y \
+     python3.10 \
+     python3.10-dev \
+     python3.10-venv \
+     python3-pip \
+     python3-tk \
+     git \
+     wget \
+     curl \
+     build-essential \
+     libgl1-mesa-glx \
+     libglib2.0-0 \
+     libsm6 \
+     libxext6 \
+     libxrender-dev \
+     libgomp1 \
+     software-properties-common \
+     ninja-build \
+     && rm -rf /var/lib/apt/lists/*
+
+ RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
+
+ WORKDIR /app
+
+ COPY requirements.txt ./
+
+ RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install --upgrade pip
+ RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install uv
+
+ RUN --mount=type=cache,target=/root/.cache/uv /bin/sh -c 'set -e; \
+     python3 -m uv pip install --system --index-url https://download.pytorch.org/whl/cu128 \
+         torch torchvision "triton>=2.1.0"; \
+     if echo "${TORCH_CUDA_ARCH_LIST}" | grep -q "12\.0"; then \
+         echo "Detected compute capability 12.0 (RTX 50 series). Skipping xformers install."; \
+     else \
+         python3 -m uv pip install --system xformers; \
+     fi'
+
+ RUN --mount=type=cache,target=/root/.cache/uv python3 -m uv pip install --system "numpy<2.0.0"
+ RUN --mount=type=cache,target=/root/.cache/uv python3 -m uv pip install --system -r requirements.txt
+
+ ARG TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0;12.0"
+ ENV TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}
+
+ ARG INSTALL_STABLE_FAST=0
+ ENV INSTALL_STABLE_FAST=${INSTALL_STABLE_FAST}
+
+ ARG INSTALL_OLLAMA=0
+ ENV INSTALL_OLLAMA=${INSTALL_OLLAMA}
+
+ ARG INSTALL_SAGEATTENTION=0
+ ENV INSTALL_SAGEATTENTION=${INSTALL_SAGEATTENTION}
+
+ ARG INSTALL_SPARGEATTN=0
+ ENV INSTALL_SPARGEATTN=${INSTALL_SPARGEATTN}
+
+ RUN --mount=type=cache,target=/root/.cache/pip \
+     --mount=type=cache,target=/build-cache/stablefast,sharing=locked /bin/sh -c ' \
+     if [ "${INSTALL_STABLE_FAST}" = "1" ]; then \
+         echo "Installing stable-fast for CUDA architectures: ${TORCH_CUDA_ARCH_LIST}"; \
+         export TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}"; \
+         export FORCE_CUDA=1; \
+         mkdir -p /build-cache/stablefast; \
+         python3 -m pip wheel --no-build-isolation --wheel-dir /build-cache/stablefast \
+             git+https://github.com/chengzeyi/stable-fast.git@main#egg=stable-fast; \
+         python3 -m pip install --no-build-isolation --no-index --find-links /build-cache/stablefast stable-fast; \
+     else \
+         echo "Skipping stable-fast installation (INSTALL_STABLE_FAST=${INSTALL_STABLE_FAST})"; \
+     fi'
+
+ RUN --mount=type=cache,target=/build-cache/ollama,sharing=locked /bin/sh -c ' \
+     if [ "${INSTALL_OLLAMA}" = "1" ]; then \
+         echo "Installing Ollama and pulling qwen3:0.6b"; \
+         mkdir -p /build-cache/ollama; \
+         curl -fsSL https://ollama.com/install.sh -o /build-cache/ollama/install.sh; \
+         sh /build-cache/ollama/install.sh; \
+         export OLLAMA_HOME=/build-cache/ollama; \
+         ollama serve >/tmp/ollama.log 2>&1 & \
+         OLLAMA_PID=$!; \
+         attempts=0; \
+         until curl -fsS http://127.0.0.1:11434/api/version >/dev/null 2>&1; do \
+             attempts=$((attempts + 1)); \
+             if [ ${attempts} -gt 20 ]; then \
+                 echo "Ollama failed to start"; \
+                 kill ${OLLAMA_PID} >/dev/null 2>&1 || true; \
+                 exit 1; \
+             fi; \
+             sleep 1; \
+         done; \
+         ollama pull qwen3:0.6b; \
+         kill ${OLLAMA_PID} >/dev/null 2>&1 || true; \
+         wait ${OLLAMA_PID} 2>/dev/null || true; \
+     else \
+         echo "Skipping Ollama installation (INSTALL_OLLAMA=${INSTALL_OLLAMA})"; \
+     fi'
+
+ COPY . .
+ COPY --from=frontend-builder /frontend/dist ./frontend/dist
+
+ RUN --mount=type=cache,target=/root/.cache/torch_extensions,sharing=locked \
+     --mount=type=cache,target=/build-cache/sageattention,sharing=locked /bin/sh -c ' \
+     if [ "${INSTALL_SAGEATTENTION}" = "1" ]; then \
+         if [ -d "SageAttention" ]; then \
+             echo "Found SageAttention - applying patch"; \
+             cd SageAttention; \
+             python3 ../docker/patch_sageattention.py; \
+             python3 -m pip wheel --no-build-isolation --wheel-dir /build-cache/sageattention .; \
+             python3 -m pip install --no-index /build-cache/sageattention/*.whl; \
+             cd ..; \
+             rm -rf SageAttention/build SageAttention/*.egg-info; \
+         else \
+             echo "SageAttention directory not found - cloning and applying patch"; \
+             git clone --depth 1 https://github.com/thu-ml/SageAttention /tmp/SageAttention; \
+             cd /tmp/SageAttention; \
+             python3 /app/docker/patch_sageattention.py; \
+             python3 -m pip wheel --no-build-isolation --wheel-dir /build-cache/sageattention .; \
+             python3 -m pip install --no-index /build-cache/sageattention/*.whl; \
+             rm -rf /tmp/SageAttention/build /tmp/SageAttention/*.egg-info; \
+             rm -rf /tmp/SageAttention; \
+         fi; \
+     else \
+         echo "Skipping SageAttention installation (INSTALL_SAGEATTENTION=${INSTALL_SAGEATTENTION})"; \
+     fi'
+
+ RUN --mount=type=cache,target=/root/.cache/torch_extensions,sharing=locked \
+     --mount=type=cache,target=/build-cache/spargeattn,sharing=locked /bin/sh -c ' \
+     if [ "${INSTALL_SPARGEATTN}" = "1" ]; then \
+         if [ -d "SpargeAttn" ]; then \
+             cd SpargeAttn; \
+             if echo "${TORCH_CUDA_ARCH_LIST}" | grep -qE "(8\.0|8\.6|8\.7|8\.9|9\.0)"; then \
+                 echo "Building SpargeAttn for supported architectures: ${TORCH_CUDA_ARCH_LIST}"; \
+                 python3 -m pip wheel --no-build-isolation --wheel-dir /build-cache/spargeattn .; \
+                 python3 -m pip install --no-index /build-cache/spargeattn/*.whl; \
+                 rm -rf build *.egg-info; \
+             else \
+                 echo "Skipping SpargeAttn - architecture ${TORCH_CUDA_ARCH_LIST} not supported (requires 8.0-9.0)"; \
+             fi; \
+             cd ..; \
+         else \
+             echo "SpargeAttn directory not found - cloning and attempting build if supported"; \
+             git clone --depth 1 https://github.com/thu-ml/SpargeAttn /tmp/SpargeAttn; \
+             cd /tmp/SpargeAttn; \
+             if echo "${TORCH_CUDA_ARCH_LIST}" | grep -qE "(8\.0|8\.6|8\.7|8\.9|9\.0)"; then \
+                 echo "Building cloned SpargeAttn for supported architectures: ${TORCH_CUDA_ARCH_LIST}"; \
+                 python3 -m pip wheel --no-build-isolation --wheel-dir /build-cache/spargeattn .; \
+                 python3 -m pip install --no-index /build-cache/spargeattn/*.whl; \
+                 rm -rf build *.egg-info; \
+             else \
+                 echo "Skipping cloned SpargeAttn - architecture ${TORCH_CUDA_ARCH_LIST} not supported (requires 8.0-9.0)"; \
+             fi; \
+             cd /app; \
+             rm -rf /tmp/SpargeAttn; \
+         fi; \
+     else \
+         echo "Skipping SpargeAttn installation (INSTALL_SPARGEATTN=${INSTALL_SPARGEATTN})"; \
+     fi'
+
+ RUN mkdir -p ./output/classic \
+     ./output/Flux \
+     ./output/HiresFix \
+     ./output/Img2Img \
+     ./output/Adetailer \
+     ./include/checkpoints \
+     ./include/clip \
+     ./include/embeddings \
+     ./include/ESRGAN \
+     ./include/loras \
+     ./include/sd1_tokenizer \
+     ./include/text_encoder \
+     ./include/unet \
+     ./include/vae \
+     ./include/vae_approx \
+     ./include/yolos
+
+ RUN echo "42" > ./include/last_seed.txt
+ RUN echo "A beautiful landscape" > ./include/prompt.txt
+
+ EXPOSE 7860
+
+ ENV PORT=7860
+
+ HEALTHCHECK --interval=30s --timeout=30s --start-period=60s --retries=3 \
+     CMD curl -f http://localhost:${PORT}/health || exit 1
+
+ CMD if [ "${INSTALL_OLLAMA}" = "1" ]; then \
+         echo "Starting Ollama server"; \
+         ollama serve >/tmp/ollama_runtime.log 2>&1 & \
+         for attempt in $(seq 1 20); do \
+             if curl -fsS http://127.0.0.1:11434/api/version >/dev/null 2>&1; then \
+                 break; \
+             fi; \
+             sleep 1; \
+         done; \
+     fi; \
+     exec python3 server.py --host 0.0.0.0 --port "${PORT}"
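Both the build-time Ollama bootstrap and the runtime CMD above poll http://127.0.0.1:11434/api/version until the server answers, rather than sleeping for a fixed interval. Below is a standard-library Python sketch of the same readiness probe; the helper name is illustrative, while the URL and 20-attempt budget are taken from the shell loops above.

```python
import time
import urllib.error
import urllib.request


def wait_for_ollama(url: str = "http://127.0.0.1:11434/api/version",
                    attempts: int = 20, delay: float = 1.0) -> bool:
    """Illustrative readiness probe mirroring the Dockerfile's retry loops:
    poll the Ollama version endpoint until it responds or the budget runs out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not listening yet; retry after a short pause.
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("Ollama ready" if wait_for_ollama() else "Ollama failed to start")
```

Polling a cheap endpoint keeps startup fast when the server comes up early and bounds the wait when it never does.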
LICENSE ADDED
@@ -0,0 +1,674 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+ software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+ to take away your freedom to share and change the works. By contrast,
+ the GNU General Public License is intended to guarantee your freedom to
+ share and change all versions of a program--to make sure it remains free
+ software for all its users. We, the Free Software Foundation, use the
+ GNU General Public License for most of our software; it applies also to
+ any other work released this way by its authors. You can apply it to
+ your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+ price. Our General Public Licenses are designed to make sure that you
+ have the freedom to distribute copies of free software (and charge for
+ them if you wish), that you receive source code or can get it if you
+ want it, that you can change the software or use pieces of it in new
+ free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+ these rights or asking you to surrender the rights. Therefore, you have
+ certain responsibilities if you distribute copies of the software, or if
+ you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+ gratis or for a fee, you must pass on to the recipients the same
+ freedoms that you received. You must make sure that they, too, receive
+ or can get the source code. And you must show them these terms so they
+ know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+ (1) assert copyright on the software, and (2) offer you this License
+ giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+ that there is no warranty for this free software. For both users' and
+ authors' sake, the GPL requires that modified versions be marked as
+ changed, so that their problems will not be attributed erroneously to
+ authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+ modified versions of the software inside them, although the manufacturer
+ can do so. This is fundamentally incompatible with the aim of
+ protecting users' freedom to change the software. The systematic
+ pattern of such abuse occurs in the area of products for individuals to
+ use, which is precisely where it is most unacceptable. Therefore, we
+ have designed this version of the GPL to prohibit the practice for those
+ products. If such problems arise substantially in other domains, we
+ stand ready to extend this provision to those domains in future versions
+ of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+ States should not allow patents to restrict development and use of
+ software on general-purpose computers, but in those that do, we wish to
+ avoid the special danger that patents applied to a free program could
+ make it effectively proprietary. To prevent this, the GPL assures that
+ patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+ modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+ works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+ License. Each licensee is addressed as "you". "Licensees" and
+ "recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+ in a fashion requiring copyright permission, other than the making of an
+ exact copy. The resulting work is called a "modified version" of the
+ earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+ on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+ permission, would make you directly or secondarily liable for
+ infringement under applicable copyright law, except executing it on a
+ computer or modifying a private copy. Propagation includes copying,
+ distribution (with or without modification), making available to the
+ public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+ parties to make or receive copies. Mere interaction with a user through
+ a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+ to the extent that it includes a convenient and prominently visible
+ feature that (1) displays an appropriate copyright notice, and (2)
+ tells the user that there is no warranty for the work (except to the
+ extent that warranties are provided), that licensees may convey the
+ work under this License, and how to view a copy of this License. If
+ the interface presents a list of user commands or options, such as a
+ menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+ for making modifications to it. "Object code" means any non-source
+ form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+ standard defined by a recognized standards body, or, in the case of
+ interfaces specified for a particular programming language, one that
+ is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+ than the work as a whole, that (a) is included in the normal form of
+ packaging a Major Component, but which is not part of that Major
+ Component, and (b) serves only to enable use of the work with that
+ Major Component, or to implement a Standard Interface for which an
+ implementation is available to the public in source code form. A
+ "Major Component", in this context, means a major essential component
+ (kernel, window system, and so on) of the specific operating system
+ (if any) on which the executable work runs, or a compiler used to
+ produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+ the source code needed to generate, install, and (for an executable
+ work) run the object code and to modify the work, including scripts to
+ control those activities. However, it does not include the work's
+ System Libraries, or general-purpose tools or generally available free
+ programs which are used unmodified in performing those activities but
+ which are not part of the work. For example, Corresponding Source
+ includes interface definition files associated with source files for
+ the work, and the source code for shared libraries and dynamically
+ linked subprograms that the work is specifically designed to require,
+ such as by intimate data communication or control flow between those
+ subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+ can regenerate automatically from other parts of the Corresponding
+ Source.
+
+ The Corresponding Source for a work in source code form is that
+ same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+ copyright on the Program, and are irrevocable provided the stated
+ conditions are met. This License explicitly affirms your unlimited
+ permission to run the unmodified Program. The output from running a
+ covered work is covered by this License only if the output, given its
+ content, constitutes a covered work. This License acknowledges your
+ rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+ convey, without conditions so long as your license otherwise remains
+ in force. You may convey covered works to others for the sole purpose
+ of having them make modifications exclusively for you, or provide you
+ with facilities for running those works, provided that you comply with
+ the terms of this License in conveying all material for which you do
+ not control copyright. Those thus making or running the covered works
+ for you must do so exclusively on your behalf, under your direction
+ and control, on terms that prohibit them from making any copies of
+ your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+ the conditions stated below. Sublicensing is not allowed; section 10
+ makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+ measure under any applicable law fulfilling obligations under article
+ 11 of the WIPO copyright treaty adopted on 20 December 1996, or
+ similar laws prohibiting or restricting circumvention of such
+ measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+ circumvention of technological measures to the extent such circumvention
+ is effected by exercising rights under this License with respect to
+ the covered work, and you disclaim any intention to limit operation or
+ modification of the work as a means of enforcing, against the work's
+ users, your or third parties' legal rights to forbid circumvention of
+ technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+ receive it, in any medium, provided that you conspicuously and
+ appropriately publish on each copy an appropriate copyright notice;
+ keep intact all notices stating that this License and any
+ non-permissive terms added in accord with section 7 apply to the code;
+ keep intact all notices of the absence of any warranty; and give all
+ recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+ and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+ produce it from the Program, in the form of source code under the
+ terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+ works, which are not by their nature extensions of the covered work,
+ and which are not combined with it such as to form a larger program,
+ in or on a volume of a storage or distribution medium, is called an
+ "aggregate" if the compilation and its resulting copyright are not
+ used to limit the access or legal rights of the compilation's users
+ beyond what the individual works permit. Inclusion of a covered work
+ in an aggregate does not cause this License to apply to the other
+ parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+ of sections 4 and 5, provided that you also convey the
+ machine-readable Corresponding Source under the terms of this License,
+ in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+ from the Corresponding Source as a System Library, need not be
+ included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+ tangible personal property which is normally used for personal, family,
+ or household purposes, or (2) anything designed or sold for incorporation
+ into a dwelling. In determining whether a product is a consumer product,
+ doubtful cases shall be resolved in favor of coverage. For a particular
+ product received by a particular user, "normally used" refers to a
+ typical or common use of that class of product, regardless of the status
+ of the particular user or of the way in which the particular user
+ actually uses, or expects or is expected to use, the product. A product
+ is a consumer product regardless of whether the product has substantial
+ commercial, industrial or non-consumer uses, unless such uses represent
+ the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+ procedures, authorization keys, or other information required to install
+ and execute modified versions of a covered work in that User Product from
+ a modified version of its Corresponding Source. The information must
+ suffice to ensure that the continued functioning of the modified object
+ code is in no case prevented or interfered with solely because
+ modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+ specifically for use in, a User Product, and the conveying occurs as
+ part of a transaction in which the right of possession and use of the
+ User Product is transferred to the recipient in perpetuity or for a
+ fixed term (regardless of how the transaction is characterized), the
+ Corresponding Source conveyed under this section must be accompanied
+ by the Installation Information. But this requirement does not apply
+ if neither you nor any third party retains the ability to install
+ modified object code on the User Product (for example, the work has
+ been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+ requirement to continue to provide support service, warranty, or updates
+ for a work that has been modified or installed by the recipient, or for
+ the User Product in which it has been modified or installed. Access to a
+ network may be denied when the modification itself materially and
+ adversely affects the operation of the network or violates the rules and
+ protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+ in accord with this section must be in a format that is publicly
+ documented (and with an implementation available to the public in
+ source code form), and must require no special password or key for
+ unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+ License by making exceptions from one or more of its conditions.
+ Additional permissions that are applicable to the entire Program shall
+ be treated as though they were included in this License, to the extent
+ that they are valid under applicable law. If additional permissions
+ apply only to part of the Program, that part may be used separately
+ under those permissions, but the entire Program remains governed by
+ this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+ remove any additional permissions from that copy, or from any part of
+ it. (Additional permissions may be written to require their own
+ removal in certain cases when you modify the work.) You may place
+ additional permissions on material, added by you to a covered work,
+ for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+ add to a covered work, you may (if authorized by the copyright holders of
+ that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+ restrictions" within the meaning of section 10. If the Program as you
+ received it, or any part of it, contains a notice stating that it is
+ governed by this License along with a term that is a further
+ restriction, you may remove that term. If a license document contains
+ a further restriction but permits relicensing or conveying under this
+ License, you may add to a covered work material governed by the terms
+ of that license document, provided that the further restriction does
+ not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+ must place, in the relevant source files, a statement of the
+ additional terms that apply to those files, or a notice indicating
+ where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+ form of a separately written license, or stated as exceptions;
+ the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+ provided under this License. Any attempt otherwise to propagate or
+ modify it is void, and will automatically terminate your rights under
+ this License (including any patent licenses granted under the third
+ paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+ license from a particular copyright holder is reinstated (a)
+ provisionally, unless and until the copyright holder explicitly and
+ finally terminates your license, and (b) permanently, if the copyright
+ holder fails to notify you of the violation by some reasonable means
+ prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+ reinstated permanently if the copyright holder notifies you of the
+ violation by some reasonable means, this is the first time you have
+ received notice of violation of this License (for any work) from that
+ copyright holder, and you cure the violation prior to 30 days after
+ your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+ licenses of parties who have received copies or rights from you under
+ this License. If your rights have been terminated and not permanently
+ reinstated, you do not qualify to receive new licenses for the same
+ material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+ run a copy of the Program. Ancillary propagation of a covered work
+ occurring solely as a consequence of using peer-to-peer transmission
+ to receive a copy likewise does not require acceptance. However,
+ nothing other than this License grants you permission to propagate or
+ modify any covered work. These actions infringe copyright if you do
+ not accept this License. Therefore, by modifying or propagating a
+ covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+ receives a license from the original licensors, to run, modify and
+ propagate that work, subject to this License. You are not responsible
+ for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+ organization, or substantially all assets of one, or subdividing an
+ organization, or merging organizations. If propagation of a covered
+ work results from an entity transaction, each party to that
+ transaction who receives a copy of the work also receives whatever
+ licenses to the work the party's predecessor in interest had or could
+ give under the previous paragraph, plus a right to possession of the
+ Corresponding Source of the work from the predecessor in interest, if
+ the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+ rights granted or affirmed under this License. For example, you may
+ not impose a license fee, royalty, or other charge for exercise of
+ rights granted under this License, and you may not initiate litigation
+ (including a cross-claim or counterclaim in a lawsuit) alleging that
+ any patent claim is infringed by making, using, selling, offering for
+ sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+ License of the Program or a work on which the Program is based. The
+ work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+ owned or controlled by the contributor, whether already acquired or
+ hereafter acquired, that would be infringed by some manner, permitted
+ by this License, of making, using, or selling its contributor version,
+ but do not include claims that would be infringed only as a
+ consequence of further modification of the contributor version. For
+ purposes of this definition, "control" includes the right to grant
+ patent sublicenses in a manner consistent with the requirements of
+ this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+ patent license under the contributor's essential patent claims, to
+ make, use, sell, offer for sale, import and otherwise run, modify and
+ propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+ agreement or commitment, however denominated, not to enforce a patent
+ (such as an express permission to practice a patent or covenant not to
+ sue for patent infringement). To "grant" such a patent license to a
+ party means to make such an agreement or commitment not to enforce a
+ patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+ and the Corresponding Source of the work is not available for anyone
+ to copy, free of charge and under the terms of this License, through a
+ publicly available network server or other readily accessible means,
+ then you must either (1) cause the Corresponding Source to be so
+ available, or (2) arrange to deprive yourself of the benefit of the
+ patent license for this particular work, or (3) arrange, in a manner
+ consistent with the requirements of this License, to extend the patent
+ license to downstream recipients. "Knowingly relying" means you have
+ actual knowledge that, but for the patent license, your conveying the
+ covered work in a country, or your recipient's use of the covered work
+ in a country, would infringe one or more identifiable patents in that
+ country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+ arrangement, you convey, or propagate by procuring conveyance of, a
+ covered work, and grant a patent license to some of the parties
+ receiving the covered work authorizing them to use, propagate, modify
+ or convey a specific copy of the covered work, then the patent license
+ you grant is automatically extended to all recipients of the covered
+ work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+ the scope of its coverage, prohibits the exercise of, or is
+ conditioned on the non-exercise of one or more of the rights that are
+ specifically granted under this License. You may not convey a covered
+ work if you are a party to an arrangement with a third party that is
+ in the business of distributing software, under which you make payment
+ to the third party based on the extent of your activity of conveying
+ the work, and under which the third party grants, to any of the
+ parties who would receive the covered work from you, a discriminatory
+ patent license (a) in connection with copies of the covered work
+ conveyed by you (or copies made from those copies), or (b) primarily
+ for and in connection with specific products or compilations that
+ contain the covered work, unless you entered into that arrangement,
+ or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+ any implied license or other defenses to infringement that may
+ otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement or
+ otherwise) that contradict the conditions of this License, they do not
+ excuse you from the conditions of this License. If you cannot convey a
+ covered work so as to satisfy simultaneously your obligations under this
+ License and any other pertinent obligations, then as a consequence you may
+ not convey it at all. For example, if you agree to terms that obligate you
+ to collect a royalty for further conveying from those to whom you convey
+ the Program, the only way you could satisfy both those terms and this
+ License would be to refrain entirely from conveying the Program.
+
+ 13. Use with the GNU Affero General Public License.
+
+ Notwithstanding any other provision of this License, you have
+ permission to link or combine any covered work with a work licensed
+ under version 3 of the GNU Affero General Public License into a single
+ combined work, and to convey the resulting work. The terms of this
+ License will continue to apply to the part which is the covered work,
+ but the special requirements of the GNU Affero General Public License,
+ section 13, concerning interaction through a network will apply to the
+ combination as such.
+
+ 14. Revised Versions of this License.
+
+ The Free Software Foundation may publish revised and/or new versions of
+ the GNU General Public License from time to time. Such new versions will
+ be similar in spirit to the present version, but may differ in detail to
+ address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+ Program specifies that a certain numbered version of the GNU General
+ Public License "or any later version" applies to it, you have the
+ option of following the terms and conditions either of that numbered
+ version or of any later version published by the Free Software
+ Foundation. If the Program does not specify a version number of the
+ GNU General Public License, you may choose any version ever published
+ by the Free Software Foundation.
+
+ If the Program specifies that a proxy can decide which future
+ versions of the GNU General Public License can be used, that proxy's
+ public statement of acceptance of a version permanently authorizes you
+ to choose that version for the Program.
+
+ Later license versions may give you additional or different
+ permissions. However, no additional obligations are imposed on any
+ author or copyright holder as a result of your choosing to follow a
+ later version.
+
+ 15. Disclaimer of Warranty.
+
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+ APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+ HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+ OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+ THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+ IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 16. Limitation of Liability.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+ WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+ THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+ GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+ USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+ DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+ PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+ EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+ SUCH DAMAGES.
+
+ 17. Interpretation of Sections 15 and 16.
+
+ If the disclaimer of warranty and limitation of liability provided
+ above cannot be given local legal effect according to their terms,
+ reviewing courts shall apply local law that most closely approximates
+ an absolute waiver of all civil liability in connection with the
+ Program, unless a warranty or assumption of liability accompanies a
+ copy of the Program in return for a fee.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+ possible use to the public, the best way to achieve this is to make it
+ free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+ to attach them to the start of each source file to most effectively
+ state the exclusion of warranty; and each file should have at least
+ the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+ Also add information on how to contact you by electronic and paper mail.
+
+ If the program does terminal interaction, make it output a short
+ notice like this when it starts in an interactive mode:
+
+ <program> Copyright (C) <year> <name of author>
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+ The hypothetical commands `show w' and `show c' should show the appropriate
+ parts of the General Public License. Of course, your program's commands
+ might be different; for a GUI interface, you would use an "about box".
+
+ You should also get your employer (if you work as a programmer) or school,
+ if any, to sign a "copyright disclaimer" for the program, if necessary.
+ For more information on this, and how to apply and follow the GNU GPL, see
+ <https://www.gnu.org/licenses/>.
+
+ The GNU General Public License does not permit incorporating your program
+ into proprietary programs. If your program is a subroutine library, you
+ may consider it more useful to permit linking proprietary applications with
+ the library. If this is what you want to do, use the GNU Lesser General
+ Public License instead of this License. But first, please read
+ <https://www.gnu.org/licenses/why-not-lgpl.html>.
README.md ADDED
@@ -0,0 +1,297 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ ---
+ title: LightDiffusion-Next
+ emoji: 🚀
+ colorFrom: blue
+ colorTo: gray
+ sdk: gradio
+ sdk_version: 5.33.2
+ app_file: app.py
+ python_version: 3.10.13
+ ---
+
+ <div align="center">
+
+ # Say hi to LightDiffusion-Next 👋
+
+ [![demo platform](https://img.shields.io/badge/Play%20with%20LightDiffusion%21-LightDiffusion%20demo%20platform-lightblue)](https://huggingface.co/spaces/Aatricks/LightDiffusion-Next)&nbsp;
+
+ **LightDiffusion-Next** is the fastest AI-powered image generation WebUI, combining speed, precision, and flexibility in one cohesive tool.
+ <br/>
+ <br/>
+ <a href="https://github.com/LightDiffusion/LightDiffusion-Next">
+ <img src="https://github.com/user-attachments/assets/b994fe0d-3a2e-44ff-93a4-46919cf865e3" alt="Logo">
+
+ </a>
+ <br/>
+ </div>
+
+ ---
+
+ As a refactored and improved version of the original [LightDiffusion repository](https://github.com/Aatrick/LightDiffusion), this project enhances usability, maintainability, and functionality while introducing a host of new features to streamline your creative workflows.
+
+ ## Motivation
+
+ **LightDiffusion** was originally meant to be written in Rust, but given the limited support for Rust in the AI ecosystem, it was built in Python instead, with the goal of being the simplest and fastest AI image generation tool.
+
+ That's when the first version of LightDiffusion was born, weighing in at just [3,000 lines of code](https://github.com/LightDiffusion/LightDiffusion-original) and relying on PyTorch alone. Over time, the [project](https://github.com/Aatrick/LightDiffusion) grew more complex, and the need for a refactor became evident. This is where **LightDiffusion-Next** comes in, with a more modular and maintainable codebase and a plethora of new features and optimizations.
+
+ 📚 Learn more in the [official documentation](https://aatricks.github.io/LightDiffusion-Next/)
+
+ For a source-based breakdown of the optimization stack, see the [Implemented Optimizations Report](https://aatricks.github.io/LightDiffusion-Next/implemented-optimizations-report/).
+
+ ---
+
+ ## 🌟 Highlights
+
+ ![image](https://github.com/user-attachments/assets/b994fe0d-3a2e-44ff-93a4-46919cf865e3)
+
+ **LightDiffusion-Next** offers a powerful suite of tools to cater to creators at every level. At its core, it supports **Text-to-Image** (Txt2Img) and **Image-to-Image** (Img2Img) generation with a variety of upscale methods and samplers, making it easy to create stunning images with minimal effort.
+
+ Advanced users can take advantage of features like **attention syntax**, **Hires-Fix**, and **ADetailer**. These tools provide better quality and flexibility for generating complex and high-resolution outputs.
+
+ **LightDiffusion-Next** is fine-tuned for **performance**. Features such as **Xformers** acceleration, **BFloat16** precision support, **WaveSpeed** dynamic caching, **Multi-scale diffusion**, and **Stable-Fast** model compilation (which offers up to a 70% speed boost) ensure smooth and efficient operation, even on demanding workloads.
+
+ ---
+
+ ## ✨ Feature Showcase
+
+ Here’s what makes LightDiffusion-Next stand out:
+
+ - **Speed and Efficiency**:
+ Enjoy industry-leading performance with built-in Xformers, PyTorch, WaveSpeed, and Stable-Fast optimizations, Multi-scale diffusion, DeepCache, the AYS (Align Your Steps) scheduler, and automatic prompt caching, achieving 30% to 200% faster speeds than other AI image generation backends on SD1.5 and Flux.
+
+ - **Automatic Detailing**:
+ Effortlessly enhance faces and body details with AI-driven tools based on the [Impact Pack](https://github.com/ltdrdata/ComfyUI-Impact-Pack).
+
+ - **State Preservation**:
+ Save and resume your progress with saved states, ensuring seamless transitions between sessions.
+
+ - **Integration-Ready**:
+ Collaborate and create directly in Discord with [Boubou](https://github.com/Aatrick/Boubou), or preview images dynamically with the optional **TAESD preview mode**.
+
+ - **Image Previewing**:
+ Get a real-time preview of your generated images with TAESD, allowing for user-friendly and interactive workflows (see the sketch after this list).
+
+ - **Image Upscaling**:
+ Enhance your images with advanced upscaling options like UltimateSDUpscaling, ensuring high-quality results every time.
+
+ - **Prompt Refinement**:
+ Use the optional Ollama-powered prompt enhancer (defaults to `qwen3:0.6b`) to refine your prompts and generate more accurate and detailed outputs.
+
+ - **LoRA and Textual Inversion Embeddings**:
+ Leverage LoRA and textual inversion embeddings for highly customized and nuanced results, adding a new dimension to your creative process.
+
+ - **Low-End Device Support**:
+ Run LightDiffusion-Next on low-end devices with as little as 2GB of VRAM or even no GPU, ensuring accessibility for all users.
+
+ - **CFG++**:
+ Uses samplers modified to support CFG++ for better-quality results than standard CFG guidance.
+
+ - **Newelle Extension**:
+ LightDiffusion-Next is also available as a backend for the [Newelle LightDiffusion extension](https://github.com/Aatricks/Newelle-Light-Diffusion), allowing images to be generated inline during conversations with LLMs.
+
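+ The preview path is built on TAESD, a tiny autoencoder that decodes latents far more cheaply than the full VAE. As a rough sketch of the underlying technique (not LightDiffusion-Next's actual preview hook; the latent shape and value ranges are assumptions), latents can be decoded with `diffusers`' `AutoencoderTiny`:
+
+ ```python
+ # Hedged sketch: turn an in-progress SD1.5 latent into a cheap RGB preview.
+ import torch
+ from diffusers import AutoencoderTiny
+
+ taesd = AutoencoderTiny.from_pretrained(
+     "madebyollin/taesd", torch_dtype=torch.float16
+ ).to("cuda")
+
+ @torch.no_grad()
+ def preview_rgb(latents: torch.Tensor) -> torch.Tensor:
+     # One lightweight pass instead of a full VAE decode.
+     image = taesd.decode(latents.to("cuda", torch.float16)).sample
+     # Map from roughly [-1, 1] to displayable uint8 RGB.
+     return ((image / 2 + 0.5).clamp(0, 1) * 255).to(torch.uint8)
+ ```
+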
+ ---
+
+ ## ⚡ Performance Benchmarks
+
+ **LightDiffusion-Next** dominates in performance:
+
+ | **Tool** | **Speed (it/s)** |
+ |------------------------------------|------------------|
+ | **LightDiffusion with Stable-Fast** | 2.8 |
+ | **LightDiffusion** | 1.9 |
+ | **ComfyUI** | 1.4 |
+ | **SDForge** | 1.3 |
+ | **SDWebUI** | 0.9 |
+
+ (All benchmarks were run at 1024x1024 resolution with a batch size of 1 using BFloat16 precision, without tweaking the installations, on a mobile RTX 3060 GPU with SD1.5.)
+
+ With its unmatched speed and efficiency, LightDiffusion-Next sets the benchmark for AI image generation tools.
+
+ ---
+
+ ## 🛠 Installation
+
+ > [!NOTE]
+ > **Platform Support:** LightDiffusion-Next supports NVIDIA GPUs (CUDA), AMD GPUs (ROCm), and Apple Silicon (Metal/MPS). For AMD and Apple Silicon setup instructions, see the [ROCm and Metal/MPS Support Guide](https://aatrick.github.io/LightDiffusion/rocm-metal-support/).
+
+ > [!WARNING]
+ > **Disclaimer:** On Linux, the fastest way to get started is with the Docker setup below. Windows users often encounter an `EOF` build error when using Docker; if that happens, set up a local virtual environment instead and install SageAttention inside it.
+
+ > [!NOTE]
+ > You will need to download the [flux vae](https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/main/ae.safetensors) separately, given its gated repo on Hugging Face. Drop it in the `/include/vae` folder.
+
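+ If you prefer to script that download, here is a minimal sketch using `huggingface_hub` (the `HF_TOKEN` environment variable is an assumption — use whatever token setup you already have for gated repos):
+
+ ```python
+ # Hypothetical helper: fetch the gated Flux VAE into include/vae.
+ import os
+ from huggingface_hub import hf_hub_download
+
+ hf_hub_download(
+     repo_id="black-forest-labs/FLUX.1-schnell",  # gated: accept the terms on the Hub first
+     filename="ae.safetensors",
+     local_dir="include/vae",
+     token=os.environ.get("HF_TOKEN"),  # a Hugging Face token with access to the repo
+ )
+ ```
+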
+ ### Quick Start
+
+ 1. Download a release or clone this repository.
+ 2. Run `run.bat` in a terminal.
+ 3. The modern React frontend will launch automatically at `http://localhost:5173` (proxied to the FastAPI backend at `http://localhost:7861`).
+
+ **Recommended Launch Command:**
+ ```bash
+ # Start both backend and frontend development server
+ python server.py --frontend
+ ```
+
+ **Production-style local run:**
+ ```bash
+ # Serve the built React UI from FastAPI on a single port
+ python server.py --port 7860
+ ```
+
+ **ZeroGPU / Gradio launch:**
+ ```bash
+ # Launch the Hugging Face ZeroGPU-compatible Gradio UI
+ python app.py
+ ```
+
+ ### 🌌 Flux Support
+
+ LightDiffusion-Next now features first-class support for **Flux2 Klein**. To get started, you need to download the required model components (Diffusion Model, Text Encoder, and VAE).
+
+ We provide a convenient script to handle this automatically:
+ ```bash
+ python download_flux.py
+ ```
+ This will download approximately 16GB of weights into the `include/` directory.
+
+ ### 🤗 ZeroGPU / Gradio Space
+
+ This repository now includes a Gradio `app.py` entrypoint for Hugging Face
+ **ZeroGPU**. ZeroGPU is only supported for Gradio SDK Spaces, and the
+ GPU-bound generation function is wrapped with `@spaces.GPU`.
+
+ Recommended defaults for ZeroGPU:
+ - keep `Keep Models Loaded` disabled
+ - use 512x512 or 768x768 resolutions
+ - generate 1 image at a time
+ - prefer 10-25 steps with `ays`
+
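+ For readers unfamiliar with the pattern, here is a minimal sketch of how a ZeroGPU entrypoint is typically wired up (the `generate` signature and its placeholder body are illustrative, not the actual `app.py`):
+
+ ```python
+ # Minimal ZeroGPU-style Gradio entrypoint: the decorated function only holds
+ # a GPU while it runs, which is the contract ZeroGPU Spaces rely on.
+ import gradio as gr
+ import spaces
+ from PIL import Image
+
+ @spaces.GPU(duration=60)  # request a GPU slot for up to 60 seconds per call
+ def generate(prompt: str) -> Image.Image:
+     # Placeholder body: the real app runs the diffusion pipeline on CUDA here.
+     return Image.new("RGB", (512, 512))
+
+ if __name__ == "__main__":
+     gr.Interface(fn=generate, inputs="text", outputs="image").launch()
+ ```
+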
+ ### 🐳 Docker Setup
+
+ Run LightDiffusion-Next in a containerized environment with GPU acceleration.
+ The Docker path remains available for local or dedicated GPU deployments and
+ serves the built React frontend from the FastAPI backend on port `7860`.
+
+ > [!IMPORTANT]
+ > Confirm you have Docker Desktop configured with the NVIDIA Container Toolkit and at least 12-16GB of memory. Builds expect an NVIDIA GPU with compute capability 8.0 or higher and CUDA 12.0+ support for SageAttention/SpargeAttn.
+
+ **Quick Start with Docker:**
+ ```bash
+ # Build and run with docker-compose
+ docker-compose up --build
+
+ # Or build and run manually
+ docker build -t lightdiffusion-next .
+ docker run --gpus all -p 7860:7860 -e PORT=7860 -v ./output:/app/output lightdiffusion-next
+ ```
+
+ **Custom GPU Architecture (Optional):**
+ ```bash
+ # For faster builds, specify your GPU architecture (e.g., RTX 5060 = 12.0)
+ docker-compose build --build-arg TORCH_CUDA_ARCH_LIST="12.0"
+
+ # Default builds for: 8.0 (A100), 8.6 (RTX 30xx), 8.9 (RTX 40xx), 9.0 (H100), 12.0 (RTX 50xx)
+ ```
+
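+ Not sure which value to pass? If PyTorch is installed locally, this snippet reports your GPU's compute capability (for example, `(8, 6)` maps to `TORCH_CUDA_ARCH_LIST="8.6"`):
+
+ ```python
+ # Print the compute capability of the first visible CUDA device.
+ import torch
+
+ print(torch.cuda.get_device_capability(0))  # e.g. (8, 6) on an RTX 30xx card
+ ```
+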
+ **Built-in Optimizations:**
+ The Docker image can optionally build the following acceleration paths:
+ - ✨ **SageAttention** - 15% speedup with INT8 quantization (all supported GPUs)
+ - 🚀 **SpargeAttn** - 40-60% speedup with sparse attention (compute 8.0-9.0 only)
+ - ⚡ **Stable-Fast** - Optional UNet compilation for up to 70% faster SD1.5 inference
+
+ Control them through build arguments (defaults shown below):
+
+ ```bash
+ docker-compose build \
+ --build-arg TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0;12.0" \
+ --build-arg INSTALL_SAGEATTENTION=0 \
+ --build-arg INSTALL_SPARGEATTN=0 \
+ --build-arg INSTALL_STABLE_FAST=1 \
+ --build-arg INSTALL_OLLAMA=0
+ ```
+
+ Set `INSTALL_STABLE_FAST=1` to enable stable-fast, `INSTALL_SAGEATTENTION=1`
+ or `INSTALL_SPARGEATTN=1` to opt into the heavier attention-kernel builds, and
+ `INSTALL_OLLAMA=1` to bake in the prompt enhancer runtime.
+
+ > [!NOTE]
+ > RTX 50 series (compute 12.0) GPUs currently use SageAttention when the SageAttention kernel is installed. SpargeAttn remains limited to earlier supported architectures.
+
+ **Access the Web Interface:**
+ - **FastAPI + React UI**: `http://localhost:7860`
+
+ **Volume Mounts:**
+ - `./output:/app/output` - Persist generated images
+ - `./checkpoints:/app/include/checkpoints` - Store model files
+ - `./loras:/app/include/loras` - Store LoRA files
+ - `./embeddings:/app/include/embeddings` - Store embeddings
+
+ ### Advanced Setup
+
+ - **Install from Source**:
+ Install dependencies via:
+ ```bash
+ pip install -r requirements.txt
+ ```
+ Add your SD1/1.5 safetensors model to the `checkpoints` directory, then launch the application.
+
+ - **⚡ Stable-Fast Optimization**:
+ Follow [this guide](https://github.com/chengzeyi/stable-fast?tab=readme-ov-file#installation) to enable Stable-Fast mode for optimal performance.
+ In Docker environments, set `INSTALL_STABLE_FAST=1` to compile it during the image build or `INSTALL_STABLE_FAST=0` (default) to skip.
+
+ - **🚀 SageAttention & SpargeAttn Acceleration**:
+ Boost inference speed by up to 60% with advanced attention backends:
+
+ **Prerequisites:**
+ - [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit-archive) installed, with a version compatible with your PyTorch installation
+
+ **SageAttention (15% speedup, Windows compatible):**
+ ```bash
+ cd SageAttention
+ pip install -e . --no-build-isolation
+ ```
+
+ **SpargeAttn (40-60% total speedup, requires WSL2/Linux):**
+ > [!CAUTION]
+ > SpargeAttn cannot be built with the default Windows linker. Use WSL2 or a native Linux environment and set the correct `TORCH_CUDA_ARCH_LIST` before installation.
+ ```bash
+ # On WSL2 or Linux only (Windows linker has path length limitations)
+ cd SpargeAttn
+ export TORCH_CUDA_ARCH_LIST="9.0" # Or your GPU architecture (8.0, 8.6, 8.9, 9.0)
+ pip install -e . --no-build-isolation
+ ```
+
+ **Priority System:** SpargeAttn > SageAttention > PyTorch SDPA (see the sketch after this list)
+ - Both backends are automatically detected and used when available
+ - Graceful fallback for unsupported head dimensions
+
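+ Conceptually, the priority chain is a series of guarded imports with a per-call fallback. A simplified sketch follows — `sageattention.sageattn` is SageAttention's documented entry point, while the `spas_sage_attn` module name and the head-dimension whitelist are assumptions to verify against the vendored sources:
+
+ ```python
+ # Simplified sketch of the priority chain: SpargeAttn > SageAttention > SDPA.
+ import torch
+ import torch.nn.functional as F
+
+ try:
+     import spas_sage_attn  # noqa: F401  (module name is an assumption)
+     BACKEND = "spargeattn"
+ except ImportError:
+     try:
+         from sageattention import sageattn  # noqa: F401
+         BACKEND = "sageattention"
+     except ImportError:
+         BACKEND = "sdpa"
+
+ def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
+     # Head-dim whitelist is illustrative: real code falls back gracefully
+     # whenever a kernel does not support the requested shape.
+     if BACKEND == "sageattention" and q.shape[-1] in (64, 96, 128):
+         from sageattention import sageattn
+         return sageattn(q, k, v, tensor_layout="HND")
+     # SpargeAttn's call signature is omitted here; default to PyTorch SDPA.
+     return F.scaled_dot_product_attention(q, k, v)
+ ```
+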
+ - **🦙 Prompt Enhancer**:
+ Turn on the Ollama-backed enhancer to automatically restructure prompts. By default the app targets `qwen3:0.6b`:
+ ```bash
+ # Local install
+ pip install ollama
+ curl -fsSL https://ollama.com/install.sh | sh
+
+ # Start the Ollama daemon (keep this terminal open)
+ ollama serve
+
+ # New terminal: pull the default prompt enhancer model
+ ollama pull qwen3:0.6b
+ export PROMPT_ENHANCER_MODEL=qwen3:0.6b
+ ```
+ In Docker builds, set `--build-arg INSTALL_OLLAMA=1` (or update `docker-compose.yml`) to install Ollama and pre-pull the model automatically. You can override the runtime model/prefix with the `PROMPT_ENHANCER_MODEL` and `PROMPT_ENHANCER_PREFIX` environment variables. See the [Ollama guide](https://github.com/ollama/ollama?tab=readme-ov-file) for details.
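+ Under the hood, an enhancer call through the `ollama` Python client boils down to something like this sketch (the system prompt is illustrative, not the one the app ships, and `PROMPT_ENHANCER_PREFIX` handling is omitted):
+ ```python
+ # Minimal sketch of an Ollama-backed prompt enhancer call.
+ import os
+ import ollama
+
+ MODEL = os.environ.get("PROMPT_ENHANCER_MODEL", "qwen3:0.6b")
+
+ def enhance(prompt: str) -> str:
+     response = ollama.chat(
+         model=MODEL,
+         messages=[
+             {"role": "system", "content": "Rewrite the prompt with rich visual detail."},
+             {"role": "user", "content": prompt},
+         ],
+     )
+     return response["message"]["content"]
+
+ print(enhance("a cat on a windowsill"))
+ ```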
285
+
286
+ - **🤖 Discord Integration**:
287
+ Set up the Discord bot by following the [Boubou installation guide](https://github.com/Aatrick/Boubou).
288
+
289
+ ### Third-Party Licenses
290
+ - This project distributes builds that depend on third-party open source components. For attribution details and the full license text, refer to `THIRD_PARTY_LICENSES.md`.
291
+
292
+ ---
293
+
294
+ 🎨 Enjoy exploring the powerful features of LightDiffusion-Next!
295
+
296
+ > [!TIP]
297
+ > ⭐ If this project helps you, please give it a star! It helps others discover it too.
THIRD_PARTY_LICENSES.md ADDED
@@ -0,0 +1,948 @@
+ # Third-Party Notices
+
+ This project depends on the following third-party components. The notices below satisfy the attribution requirements of their respective licenses.
+
+ ## SageAttention (thu-ml/SageAttention)
+ - Source: https://github.com/thu-ml/SageAttention
+ - License: Apache License 2.0 (see full text below)
+ - Notes: LightDiffusion-Next applies a build-time patch (`docker/sageattention_setup.patch`) to SageAttention's `setup.py` to honor the `TORCH_CUDA_ARCH_LIST` environment variable during compilation.
+
+ ## SpargeAttn (thu-ml/SpargeAttn)
+ - Source: https://github.com/thu-ml/SpargeAttn
+ - License: Apache License 2.0 (see full text below)
+ - Notes: Used as provided, without local source modifications.
+
+ ## ComfyUI (comfyanonymous/ComfyUI)
+ - Source: https://github.com/comfyanonymous/ComfyUI
+ - License: GNU General Public License v3.0 (full text distributed in the repository root `LICENSE`)
+ - Notes: Provides the node-graph runtime and execution engine extended by LightDiffusion-Next.
+
+ ## ComfyUI Ultimate SD Upscale (ssitu/ComfyUI_UltimateSDUpscale)
+ - Source: https://github.com/ssitu/ComfyUI_UltimateSDUpscale
+ - License: GNU General Public License v3.0 (full text distributed in the repository root `LICENSE`)
+ - Notes: LightDiffusion-Next adapts the Ultimate SD Upscale script to integrate with its sampler interface.
+
+ ## ADetailer (Bing-su/adetailer)
+ - Source: https://github.com/Bing-su/adetailer
+ - License: GNU Affero General Public License v3.0 (see full text below)
+ - Notes: Supplies detector-driven post-processing for face, hand, and subject refinements. No local source code changes are applied.
+
+ ## Stable Fast (chengzeyi/stable-fast)
+ - Source: https://github.com/chengzeyi/stable-fast
+ - License: MIT License (see full text below)
+ - Notes: Imported as a wheel distribution to enable graph compilation speedups for Stable Diffusion pipelines.
+
+ ## ComfyUI-GGUF (city96/comfyui-gguf)
+ - Source: https://github.com/city96/comfyui-gguf
+ - License: Apache License 2.0 (see full text below)
+ - Notes: Provides GGUF model loader nodes used by LightDiffusion-Next without modification.
+
+ ## WaveSpeed (ComfyUI-WaveSpeed)
+ - Source: https://github.com/Fannovel16/ComfyUI-WaveSpeed (original project reference)
+ - License: MIT License (see full text below)
+ - Notes: LightDiffusion-Next vendors the WaveSpeed caching utilities as-is for first-block cache optimisations.
+
+ ---
+
47
+ ## Apache License
48
+
49
+ ```
50
+ Apache License
51
+ Version 2.0, January 2004
52
+ http://www.apache.org/licenses/
53
+
54
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
55
+
56
+ 1. Definitions.
57
+
58
+ "License" shall mean the terms and conditions for use, reproduction,
59
+ and distribution as defined by Sections 1 through 9 of this document.
60
+
61
+ "Licensor" shall mean the copyright owner or entity authorized by
62
+ the copyright owner that is granting the License.
63
+
64
+ "Legal Entity" shall mean the union of the acting entity and all
65
+ other entities that control, are controlled by, or are under common
66
+ control with that entity. For the purposes of this definition,
67
+ "control" means (i) the power, direct or indirect, to cause the
68
+ direction or management of such entity, whether by contract or
69
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
70
+ outstanding shares, or (iii) beneficial ownership of such entity.
71
+
72
+ "You" (or "Your") shall mean an individual or Legal Entity
73
+ exercising permissions granted by this License.
74
+
75
+ "Source" form shall mean the preferred form for making modifications,
76
+ including but not limited to software source code, documentation
77
+ source, and configuration files.
78
+
79
+ "Object" form shall mean any form resulting from mechanical
80
+ transformation or translation of a Source form, including but
81
+ not limited to compiled object code, generated documentation,
82
+ and conversions to other media types.
83
+
84
+ "Work" shall mean the work of authorship, whether in Source or
85
+ Object form, made available under the License, as indicated by a
86
+ copyright notice that is included in or attached to the work
87
+ (an example is provided in the Appendix below).
88
+
89
+ "Derivative Works" shall mean any work, whether in Source or Object
90
+ form, that is based on (or derived from) the Work and for which the
91
+ editorial revisions, annotations, elaborations, or other modifications
92
+ represent, as a whole, an original work of authorship. For the purposes
93
+ of this License, Derivative Works shall not include works that remain
94
+ separable from, or merely link (or bind by name) to the interfaces of,
95
+ the Work and Derivative Works thereof.
96
+
97
+ "Contribution" shall mean any work of authorship, including
98
+ the original version of the Work and any modifications or additions
99
+ to that Work or Derivative Works thereof, that is intentionally
100
+ submitted to Licensor for inclusion in the Work by the copyright owner
101
+ or by an individual or Legal Entity authorized to submit on behalf of
102
+ the copyright owner. For the purposes of this definition, "submitted"
103
+ means any form of electronic, verbal, or written communication sent
104
+ to the Licensor or its representatives, including but not limited to
105
+ communication on electronic mailing lists, source code control systems,
106
+ and issue tracking systems that are managed by, or on behalf of, the
107
+ Licensor for the purpose of discussing and improving the Work, but
108
+ excluding communication that is conspicuously marked or otherwise
109
+ designated in writing by the copyright owner as "Not a Contribution."
110
+
111
+ "Contributor" shall mean Licensor and any individual or Legal Entity
112
+ on behalf of whom a Contribution has been received by Licensor and
113
+ subsequently incorporated within the Work.
114
+
115
+ 2. Grant of Copyright License. Subject to the terms and conditions of
116
+ this License, each Contributor hereby grants to You a perpetual,
117
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
118
+ copyright license to reproduce, prepare Derivative Works of,
119
+ publicly display, publicly perform, sublicense, and distribute the
120
+ Work and such Derivative Works in Source or Object form.
121
+
122
+ 3. Grant of Patent License. Subject to the terms and conditions of
123
+ this License, each Contributor hereby grants to You a perpetual,
124
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
125
+ (except as stated in this section) patent license to make, have made,
126
+ use, offer to sell, sell, import, and otherwise transfer the Work,
127
+ where such license applies only to those patent claims licensable
128
+ by such Contributor that are necessarily infringed by their
129
+ Contribution(s) alone or by combination of their Contribution(s)
130
+ with the Work to which such Contribution(s) was submitted. If You
131
+ institute patent litigation against any entity (including a
132
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
133
+ or a Contribution incorporated within the Work constitutes direct
134
+ or contributory patent infringement, then any patent licenses
135
+ granted to You under this License for that Work shall terminate
136
+ as of the date such litigation is filed.
137
+
138
+ 4. Redistribution. You may reproduce and distribute copies of the
139
+ Work or Derivative Works thereof in any medium, with or without
140
+ modifications, and in Source or Object form, provided that You
141
+ meet the following conditions:
142
+
143
+ (a) You must give any other recipients of the Work or
144
+ Derivative Works a copy of this License; and
145
+
146
+ (b) You must cause any modified files to carry prominent notices
147
+ stating that You changed the files; and
148
+
149
+ (c) You must retain, in the Source form of any Derivative Works
150
+ that You distribute, all copyright, patent, trademark, and
151
+ attribution notices from the Source form of the Work,
152
+ excluding those notices that do not pertain to any part of
153
+ the Derivative Works; and
154
+
155
+ (d) If the Work includes a "NOTICE" text file as part of its
156
+ distribution, then any Derivative Works that You distribute must
157
+ include a readable copy of the attribution notices contained
158
+ within such NOTICE file, excluding those notices that do not
159
+ pertain to any part of the Derivative Works, in at least one
160
+ of the following places: within a NOTICE text file distributed
161
+ as part of the Derivative Works; within the Source form or
162
+ documentation, if provided along with the Derivative Works; or,
163
+ within a display generated by the Derivative Works, if and
164
+ wherever such third-party notices normally appear. The contents
165
+ of the NOTICE file are for informational purposes only and
166
+ do not modify the License. You may add Your own attribution
167
+ notices within Derivative Works that You distribute, alongside
168
+ or as an addendum to the NOTICE text from the Work, provided
169
+ that such additional attribution notices cannot be construed
170
+ as modifying the License.
171
+
172
+ You may add Your own copyright statement to Your modifications and
173
+ may provide additional or different license terms and conditions
174
+ for use, reproduction, or distribution of Your modifications, or
175
+ for any such Derivative Works as a whole, provided Your use,
176
+ reproduction, and distribution of the Work otherwise complies with
177
+ the conditions stated in this License.
178
+
179
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
180
+ any Contribution intentionally submitted for inclusion in the Work
181
+ by You to the Licensor shall be under the terms and conditions of
182
+ this License, without any additional terms or conditions.
183
+ Notwithstanding the above, nothing herein shall supersede or modify
184
+ the terms of any separate license agreement you may have executed
185
+ with Licensor regarding such Contributions.
186
+
187
+ 6. Trademarks. This License does not grant permission to use the trade
188
+ names, trademarks, service marks, or product names of the Licensor,
189
+ except as required for reasonable and customary use in describing the
190
+ origin of the Work and reproducing the content of the NOTICE file.
191
+
192
+ 7. Disclaimer of Warranty. Unless required by applicable law or
193
+ agreed to in writing, Licensor provides the Work (and each
194
+ Contributor provides its Contributions) on an "AS IS" BASIS,
195
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
196
+ implied, including, without limitation, any warranties or conditions
197
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
198
+ PARTICULAR PURPOSE. You are solely responsible for determining the
199
+ appropriateness of using or redistributing the Work and assume any
200
+ risks associated with Your exercise of permissions under this License.
201
+
202
+ 8. Limitation of Liability. In no event and under no legal theory,
203
+ whether in tort (including negligence), contract, or otherwise,
204
+ unless required by applicable law (such as deliberate and grossly
205
+ negligent acts) or agreed to in writing, shall any Contributor be
206
+ liable to You for damages, including any direct, indirect, special,
207
+ incidental, or consequential damages of any character arising as a
208
+ result of this License or out of the use or inability to use the
209
+ Work (including but not limited to damages for loss of goodwill,
210
+ work stoppage, computer failure or malfunction, or any and all
211
+ other commercial damages or losses), even if such Contributor
212
+ has been advised of the possibility of such damages.
213
+
214
+ 9. Accepting Warranty or Additional Liability. While redistributing
215
+ the Work or Derivative Works thereof, You may choose to offer,
216
+ and charge a fee for, acceptance of support, warranty, indemnity,
217
+ or other liability obligations and/or rights consistent with this
218
+ License. However, in accepting such obligations, You may act only
219
+ on Your own behalf and on Your sole responsibility, not on behalf
220
+ of any other Contributor, and only if You agree to indemnify,
221
+ defend, and hold each Contributor harmless for any liability
222
+ incurred by, or claims asserted against, such Contributor by reason
223
+ of your accepting any such warranty or additional liability.
224
+
225
+ END OF TERMS AND CONDITIONS
226
+
227
+ APPENDIX: How to apply the Apache License to your work.
228
+
229
+ To apply the Apache License to your work, attach the following
230
+ boilerplate notice, with the fields enclosed by brackets "[]"
231
+ replaced with your own identifying information. (Don't include
232
+ the brackets!) The text should be enclosed in the appropriate
233
+ comment syntax for the file format. We also recommend that a
234
+ file or class name and description of purpose be included on the
235
+ same "printed page" as the copyright notice for easier
236
+ identification within third-party archives.
237
+
238
+ Copyright [yyyy] [name of copyright owner]
239
+
240
+ Licensed under the Apache License, Version 2.0 (the "License");
241
+ you may not use this file except in compliance with the License.
242
+ You may obtain a copy of the License at
243
+
244
+ http://www.apache.org/licenses/LICENSE-2.0
245
+
246
+ Unless required by applicable law or agreed to in writing, software
247
+ distributed under the License is distributed on an "AS IS" BASIS,
248
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
249
+ See the License for the specific language governing permissions and
250
+ limitations under the License.
251
+ ```
252
+
253
+ ---
254
+
255
+ ## MIT License (Stable Fast)
256
+
257
+ ```
258
+ MIT License
259
+
260
+ Copyright (c) 2023 C
261
+
262
+ Permission is hereby granted, free of charge, to any person obtaining a copy
263
+ of this software and associated documentation files (the "Software"), to deal
264
+ in the Software without restriction, including without limitation the rights
265
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
266
+ copies of the Software, and to permit persons to whom the Software is
267
+ furnished to do so, subject to the following conditions:
268
+
269
+ The above copyright notice and this permission notice shall be included in all
270
+ copies or substantial portions of the Software.
271
+
272
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
273
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
274
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
275
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
276
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
277
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
278
+ SOFTWARE.
279
+ ```
280
+
281
+ ---
282
+
283
+ ## GNU Affero General Public License v3.0 (ADetailer)
284
+
285
+ ```
286
+ GNU AFFERO GENERAL PUBLIC LICENSE
287
+ Version 3, 19 November 2007
288
+
289
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
290
+ Everyone is permitted to copy and distribute verbatim copies
291
+ of this license document, but changing it is not allowed.
292
+
293
+ Preamble
294
+
295
+ The GNU Affero General Public License is a free, copyleft license for
296
+ software and other kinds of works, specifically designed to ensure
297
+ cooperation with the community in the case of network server software.
298
+
299
+ The licenses for most software and other practical works are designed
300
+ to take away your freedom to share and change the works. By contrast,
301
+ our General Public Licenses are intended to guarantee your freedom to
302
+ share and change all versions of a program--to make sure it remains free
303
+ software for all its users.
304
+
305
+ When we speak of free software, we are referring to freedom, not
306
+ price. Our General Public Licenses are designed to make sure that you
307
+ have the freedom to distribute copies of free software (and charge for
308
+ them if you wish), that you receive source code or can get it if you
309
+ want it, that you can change the software or use pieces of it in new
310
+ free programs, and that you know you can do these things.
311
+
312
+ Developers that use our General Public Licenses protect your rights
313
+ with two steps: (1) assert copyright on the software, and (2) offer
314
+ you this License which gives you legal permission to copy, distribute
315
+ and/or modify the software.
316
+
317
+ A secondary benefit of defending all users' freedom is that
318
+ improvements made in alternate versions of the program, if they
319
+ receive widespread use, become available for other developers to
320
+ incorporate. Many developers of free software are heartened and
321
+ encouraged by the resulting cooperation. However, in the case of
322
+ software used on network servers, this result may fail to come about.
323
+ The GNU General Public License permits making a modified version and
324
+ letting the public access it on a server without ever releasing its
325
+ source code to the public.
326
+
327
+ The GNU Affero General Public License is designed specifically to
328
+ ensure that, in such cases, the modified source code becomes available
329
+ to the community. It requires the operator of a network server to
330
+ provide the source code of the modified version running there to the
331
+ users of that server. Therefore, public use of a modified version, on
332
+ a publicly accessible server, gives the public access to the source
333
+ code of the modified version.
334
+
335
+ An older license, called the Affero General Public License and
336
+ published by Affero, was designed to accomplish similar goals. This is
337
+ a different license, not a version of the Affero GPL, but Affero has
338
+ released a new version of the Affero GPL which permits relicensing under
339
+ this license.
340
+
341
+ The precise terms and conditions for copying, distribution and
342
+ modification follow.
343
+
344
+ TERMS AND CONDITIONS
345
+
346
+ 0. Definitions.
347
+
348
+ "This License" refers to version 3 of the GNU Affero General Public License.
349
+
350
+ "Copyright" also means copyright-like laws that apply to other kinds of
351
+ works, such as semiconductor masks.
352
+
353
+ "The Program" refers to any copyrightable work licensed under this
354
+ License. Each licensee is addressed as "you". "Licensees" and
355
+ "recipients" may be individuals or organizations.
356
+
357
+ To "modify" a work means to copy from or adapt all or part of the work
358
+ in a fashion requiring copyright permission, other than the making of an
359
+ exact copy. The resulting work is called a "modified version" of the
360
+ earlier work or a work "based on" the earlier work.
361
+
362
+ A "covered work" means either the unmodified Program or a work based
363
+ on the Program.
364
+
365
+ To "propagate" a work means to do anything with it that, without
366
+ permission, would make you directly or secondarily liable for
367
+ infringement under applicable copyright law, except executing it on a
368
+ computer or modifying a private copy. Propagation includes copying,
369
+ distribution (with or without modification), making available to the
370
+ public, and in some countries other activities as well.
371
+
372
+ To "convey" a work means any kind of propagation that enables other
373
+ parties to make or receive copies. Mere interaction with a user through
374
+ a computer network, with no transfer of a copy, is not conveying.
375
+
376
+ An interactive user interface displays "Appropriate Legal Notices"
377
+ to the extent that it includes a convenient and prominently visible
378
+ feature that (1) displays an appropriate copyright notice, and (2)
379
+ tells the user that there is no warranty for the work (except to the
380
+ extent that warranties are provided), that licensees may convey the
381
+ work under this License, and how to view a copy of this License. If
382
+ the interface presents a list of user commands or options, such as a
383
+ menu, a prominent item in the list meets this criterion.
384
+
385
+ 1. Source Code.
386
+
387
+ The "source code" for a work means the preferred form of the work
388
+ for making modifications to it. "Object code" means any non-source
389
+ form of a work.
390
+
391
+ A "Standard Interface" means an interface that either is an official
392
+ standard defined by a recognized standards body, or, in the case of
393
+ interfaces specified for a particular programming language, one that
394
+ is widely used among developers working in that language.
395
+
396
+ The "System Libraries" of an executable work include anything, other
397
+ than the work as a whole, that (a) is included in the normal form of
398
+ packaging a Major Component, but which is not part of that Major
399
+ Component, and (b) serves only to enable use of the work with that
400
+ Major Component, or to implement a Standard Interface for which an
401
+ implementation is available to the public in source code form. A
402
+ "Major Component", in this context, means a major essential component
403
+ (kernel, window system, and so on) of the specific operating system
404
+ (if any) on which the executable work runs, or a compiler used to
405
+ produce the work, or an object code interpreter used to run it.
406
+
407
+ The "Corresponding Source" for a work in object code form means all
408
+ the source code needed to generate, install, and (for an executable
409
+ work) run the object code and to modify the work, including scripts to
410
+ control those activities. However, it does not include the work's
411
+ System Libraries, or general-purpose tools or generally available free
412
+ programs which are used unmodified in performing those activities but
413
+ which are not part of the work. For example, Corresponding Source
414
+ includes interface definition files associated with source files for
415
+ the work, and the source code for shared libraries and dynamically
416
+ linked subprograms that the work is specifically designed to require,
417
+ such as by intimate data communication or control flow between those
418
+ subprograms and other parts of the work.
419
+
420
+ The Corresponding Source need not include anything that users
421
+ can regenerate automatically from other parts of the Corresponding
422
+ Source.
423
+
424
+ The Corresponding Source for a work in source code form is that
425
+ same work.
426
+
427
+ 2. Basic Permissions.
428
+
429
+ All rights granted under this License are granted for the term of
430
+ copyright on the Program, and are irrevocable provided the stated
431
+ conditions are met. This License explicitly affirms your unlimited
432
+ permission to run the unmodified Program. The output from running a
433
+ covered work is covered by this License only if the output, given its
434
+ content, constitutes a covered work. This License acknowledges your
435
+ rights of fair use or other equivalent, as provided by copyright law.
436
+
437
+ You may make, run and propagate covered works that you do not
438
+ convey, without conditions so long as your license otherwise remains
439
+ in force. You may convey covered works to others for the sole purpose
440
+ of having them make modifications exclusively for you, or provide you
441
+ with facilities for running those works, provided that you comply with
442
+ the terms of this License in conveying all material for which you do
443
+ not control copyright. Those thus making or running the covered works
444
+ for you must do so exclusively on your behalf, under your direction
445
+ and control, on terms that prohibit them from making any copies of
446
+ your copyrighted material outside their relationship with you.
447
+
448
+ Conveying under any other circumstances is permitted solely under
449
+ the conditions stated below. Sublicensing is not allowed; section 10
450
+ makes it unnecessary.
451
+
452
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
453
+
454
+ No covered work shall be deemed part of an effective technological
455
+ measure under any applicable law fulfilling obligations under article
456
+ 11 of the WIPO copyright treaty adopted on 20 December 1996, or
457
+ similar laws prohibiting or restricting circumvention of such
458
+ measures.
459
+
460
+ When you convey a covered work, you waive any legal power to forbid
461
+ circumvention of technological measures to the extent such circumvention
462
+ is effected by exercising rights under this License with respect to
463
+ the covered work, and you disclaim any intention to limit operation or
464
+ modification of the work as a means of enforcing, against the work's
465
+ users, your or third parties' legal rights to forbid circumvention of
466
+ technological measures.
467
+
468
+ 4. Conveying Verbatim Copies.
469
+
470
+ You may convey verbatim copies of the Program's source code as you
471
+ receive it, in any medium, provided that you conspicuously and
472
+ appropriately publish on each copy an appropriate copyright notice;
473
+ keep intact all notices stating that this License and any
474
+ non-permissive terms added in accord with section 7 apply to the code;
475
+ keep intact all notices of the absence of any warranty; and give all
476
+ recipients a copy of this License along with the Program.
477
+
478
+ You may charge any price or no price for each copy that you convey,
479
+ and you may offer support or warranty protection for a fee.
480
+
481
+ 5. Conveying Modified Source Versions.
482
+
483
+ You may convey a work based on the Program, or the modifications to
484
+ produce it from the Program, in the form of source code under the
485
+ terms of section 4, provided that you also meet all of these conditions:
486
+
487
+ a) The work must carry prominent notices stating that you modified
488
+ it, and giving a relevant date.
489
+
490
+ b) The work must carry prominent notices stating that it is
491
+ released under this License and any conditions added under section
492
+ 7. This requirement modifies the requirement in section 4 to
493
+ "keep intact all notices".
494
+
495
+ c) You must license the entire work, as a whole, under this
496
+ License to anyone who comes into possession of a copy. This
497
+ License will therefore apply, along with any applicable section 7
498
+ additional terms, to the whole of the work, and all its parts,
499
+ regardless of how they are packaged. This License gives no
500
+ permission to license the work in any other way, but it does not
501
+ invalidate such permission if you have separately received it.
502
+
503
+ d) If the work has interactive user interfaces, each must display
504
+ Appropriate Legal Notices; however, if the Program has interactive
505
+ interfaces that do not display Appropriate Legal Notices, your
506
+ work need not make them do so.
507
+
508
+ A compilation of a covered work with other separate and independent
509
+ works, which are not by their nature extensions of the covered work,
510
+ and which are not combined with it such as to form a larger program,
511
+ in or on a volume of a storage or distribution medium, is called an
512
+ "aggregate" if the compilation and its resulting copyright are not
513
+ used to limit the access or legal rights of the compilation's users
514
+ beyond what the individual works permit. Inclusion of a covered work
515
+ in an aggregate does not cause this License to apply to the other
516
+ parts of the aggregate.
517
+
518
+ 6. Conveying Non-Source Forms.
519
+
520
+ You may convey a covered work in object code form under the terms
521
+ of sections 4 and 5, provided that you also convey the
522
+ machine-readable Corresponding Source under the terms of this License,
523
+ in one of these ways:
524
+
525
+ a) Convey the object code in, or embodied in, a physical product
526
+ (including a physical distribution medium), accompanied by the
527
+ Corresponding Source fixed on a durable physical medium
528
+ customarily used for software interchange.
529
+
530
+ b) Convey the object code in, or embodied in, a physical product
531
+ (including a physical distribution medium), accompanied by a
532
+ written offer, valid for at least three years and valid for as
533
+ long as you offer spare parts or customer support for that product
534
+ model, to give anyone who possesses the object code either (1) a
535
+ copy of the Corresponding Source for all the software in the
536
+ product that is covered by this License, on a durable physical
537
+ medium customarily used for software interchange, for a price no
538
+ more than your reasonable cost of physically performing this
539
+ conveying of source, or (2) access to copy the
540
+ Corresponding Source from a network server at no charge.
541
+
542
+ c) Convey individual copies of the object code with a copy of the
543
+ written offer to provide the Corresponding Source. This
544
+ alternative is allowed only occasionally and noncommercially, and
545
+ only if you received the object code with such an offer, in accord
546
+ with subsection 6b.
547
+
548
+ d) Convey the object code by offering access from a designated
549
+ place (gratis or for a charge), and offer equivalent access to the
550
+ Corresponding Source in the same way through the same place at no
551
+ further charge. You need not require recipients to copy the
552
+ Corresponding Source along with the object code. If the place to
553
+ copy the object code is a network server, the Corresponding Source
554
+ may be on a different server (operated by you or a third party)
555
+ that supports equivalent copying facilities, provided you maintain
556
+ clear directions next to the object code saying where to find the
557
+ Corresponding Source. Regardless of what server hosts the
558
+ Corresponding Source, you remain obligated to ensure that it is
559
+ available for as long as needed to satisfy these requirements.
560
+
561
+ e) Convey the object code using peer-to-peer transmission, provided
562
+ you inform other peers where the object code and Corresponding
563
+ Source of the work are being offered to the general public at no
564
+ charge under subsection 6d.
565
+
566
+ A separable portion of the object code, whose source code is excluded
567
+ from the Corresponding Source as a System Library, need not be
568
+ included in conveying the object code work.
569
+
570
+ A "User Product" is either (1) a "consumer product", which means any
571
+ tangible personal property which is normally used for personal, family,
572
+ or household purposes, or (2) anything designed or sold for incorporation
573
+ into a dwelling. In determining whether a product is a consumer product,
574
+ doubtful cases shall be resolved in favor of coverage. For a particular
575
+ product received by a particular user, "normally used" refers to a
576
+ typical or common use of that class of product, regardless of the status
577
+ of the particular user or of the way in which the particular user
578
+ actually uses, or expects or is expected to use, the product. A product
579
+ is a consumer product regardless of whether the product has substantial
580
+ commercial, industrial or non-consumer uses, unless such uses represent
581
+ the only significant mode of use of the product.
582
+
583
+ "Installation Information" for a User Product means any methods,
584
+ procedures, authorization keys, or other information required to install
585
+ and execute modified versions of a covered work in that User Product from
586
+ a modified version of its Corresponding Source. The information must
587
+ suffice to ensure that the continued functioning of the modified object
588
+ code is in no case prevented or interfered with solely because
589
+ modification has been made.
590
+
591
+ If you convey an object code work under this section in, or with, or
592
+ specifically for use in, a User Product, and the conveying occurs as
593
+ part of a transaction in which the right of possession and use of the
594
+ User Product is transferred to the recipient in perpetuity or for a
595
+ fixed term (regardless of how the transaction is characterized), the
596
+ Corresponding Source conveyed under this section must be accompanied
597
+ by the Installation Information. But this requirement does not apply
598
+ if neither you nor any third party retains the ability to install
599
+ modified object code on the User Product (for example, the work has
600
+ been installed in ROM).
601
+
602
+ The requirement to provide Installation Information does not include a
603
+ requirement to continue to provide support service, warranty, or updates
604
+ for a work that has been modified or installed by the recipient, or for
605
+ the User Product in which it has been modified or installed. Access to a
606
+ network may be denied when the modification itself materially and
607
+ adversely affects the operation of the network or violates the rules and
608
+ protocols for communication across the network.
609
+
610
+ Corresponding Source conveyed, and Installation Information provided,
611
+ in accord with this section must be in a format that is publicly
612
+ documented (and with an implementation available to the public in
613
+ source code form), and must require no special password or key for
614
+ unpacking, reading or copying.
615
+
616
+ 7. Additional Terms.
617
+
618
+ "Additional permissions" are terms that supplement the terms of this
619
+ License by making exceptions from one or more of its conditions.
620
+ Additional permissions that are applicable to the entire Program shall
621
+ be treated as though they were included in this License, to the extent
622
+ that they are valid under applicable law. If additional permissions
623
+ apply only to part of the Program, that part may be used separately
624
+ under those permissions, but the entire Program remains governed by
625
+ this License without regard to the additional permissions.
626
+
627
+ When you convey a copy of a covered work, you may at your option
628
+ remove any additional permissions from that copy, or from any part of
629
+ it. (Additional permissions may be written to require their own
630
+ removal in certain cases when you modify the work.) You may place
631
+ additional permissions on material, added by you to a covered work,
632
+ for which you have or can give appropriate copyright permission.
633
+
634
+ Notwithstanding any other provision of this License, for material you
635
+ add to a covered work, you may (if authorized by the copyright holders of
636
+ that material) supplement the terms of this License with terms:
637
+
638
+ a) Disclaiming warranty or limiting liability differently from the
639
+ terms of sections 15 and 16 of this License; or
640
+
641
+ b) Requiring preservation of specified reasonable legal notices or
642
+ author attributions in that material or in the Appropriate Legal
643
+ Notices displayed by works containing it; or
644
+
645
+ c) Prohibiting misrepresentation of the origin of that material, or
646
+ requiring that modified versions of such material be marked in
647
+ reasonable ways as different from the original version; or
648
+
649
+ d) Limiting the use for publicity purposes of names of licensors or
650
+ authors of the material; or
651
+
652
+ e) Declining to grant rights under trademark law for use of some
653
+ trade names, trademarks, or service marks; or
654
+
655
+ f) Requiring indemnification of licensors and authors of that
656
+ material by anyone who conveys the material (or modified versions of
657
+ it) with contractual assumptions of liability to the recipient, for
658
+ any liability that these contractual assumptions directly impose on
659
+ those licensors and authors.
660
+
661
+ All other non-permissive additional terms are considered "further
662
+ restrictions" within the meaning of section 10. If the Program as you
663
+ received it, or any part of it, contains a notice stating that it is
664
+ governed by this License along with a term that is a further
665
+ restriction, you may remove that term. If a license document contains
666
+ a further restriction but permits relicensing or conveying under this
667
+ License, you may add to a covered work material governed by the terms
668
+ of that license document, provided that the further restriction does
669
+ not survive such relicensing or conveying.
670
+
671
+ If you add terms to a covered work in accord with this section, you
672
+ must place, in the relevant source files, a statement of the
673
+ additional terms that apply to those files, or a notice indicating
674
+ where to find the applicable terms.
675
+
676
+ Additional terms, permissive or non-permissive, may be stated in the
677
+ form of a separately written license, or stated as exceptions;
678
+ the above requirements apply either way.
679
+
680
+ 8. Termination.
681
+
682
+ You may not propagate or modify a covered work except as expressly
683
+ provided under this License. Any attempt otherwise to propagate or
684
+ modify it is void, and will automatically terminate your rights under
685
+ this License (including any patent licenses granted under the third
686
+ paragraph of section 11).
687
+
688
+ However, if you cease all violation of this License, then your
689
+ license from a particular copyright holder is reinstated (a)
690
+ provisionally, unless and until the copyright holder explicitly and
691
+ finally terminates your license, and (b) permanently, if the copyright
692
+ holder fails to notify you of the violation by some reasonable means
693
+ prior to 60 days after the cessation.
694
+
695
+ Moreover, your license from a particular copyright holder is
696
+ reinstated permanently if the copyright holder notifies you of the
697
+ violation by some reasonable means, this is the first time you have
698
+ received notice of violation of this License (for any work) from that
699
+ copyright holder, and you cure the violation prior to 30 days after
700
+ your receipt of the notice.
701
+
702
+ Termination of your rights under this section does not terminate the
703
+ licenses of parties who have received copies or rights from you under
704
+ this License. If your rights have been terminated and not permanently
705
+ reinstated, you do not qualify to receive new licenses for the same
706
+ material under section 10.
707
+
708
+ 9. Acceptance Not Required for Having Copies.
709
+
710
+ You are not required to accept this License in order to receive or
711
+ run a copy of the Program. Ancillary propagation of a covered work
712
+ occurring solely as a consequence of using peer-to-peer transmission
713
+ to receive a copy likewise does not require acceptance. However,
714
+ nothing other than this License grants you permission to propagate or
715
+ modify any covered work. These actions infringe copyright if you do
716
+ not accept this License. Therefore, by modifying or propagating a
717
+ covered work, you indicate your acceptance of this License to do so.
718
+
719
+ 10. Automatic Licensing of Downstream Recipients.
720
+
721
+ Each time you convey a covered work, the recipient automatically
722
+ receives a license from the original licensors, to run, modify and
723
+ propagate that work, subject to this License. You are not responsible
724
+ for enforcing compliance by third parties with this License.
725
+
726
+ An "entity transaction" is a transaction transferring control of an
727
+ organization, or substantially all assets of one, or subdividing an
728
+ organization, or merging organizations. If propagation of a covered
729
+ work results from an entity transaction, each party to that
730
+ transaction who receives a copy of the work also receives whatever
731
+ licenses to the work the party's predecessor in interest had or could
732
+ give under the previous paragraph, plus a right to possession of the
733
+ Corresponding Source of the work from the predecessor in interest, if
734
+ the predecessor has it or can get it with reasonable efforts.
735
+
736
+ You may not impose any further restrictions on the exercise of the
737
+ rights granted or affirmed under this License. For example, you may
738
+ not impose a license fee, royalty, or other charge for exercise of
739
+ rights granted under this License, and you may not initiate litigation
740
+ (including a cross-claim or counterclaim in a lawsuit) alleging that
741
+ any patent claim is infringed by making, using, selling, offering for
742
+ sale, or importing the Program or any portion of it.
743
+
744
+ 11. Patents.
745
+
746
+ A "contributor" is a copyright holder who authorizes use under this
747
+ License of the Program or a work on which the Program is based. The
748
+ work thus licensed is called the contributor's "contributor version".
749
+
750
+ A contributor's "essential patent claims" are all patent claims
751
+ owned or controlled by the contributor, whether already acquired or
752
+ hereafter acquired, that would be infringed by some manner, permitted
753
+ by this License, of making, using, or selling its contributor version,
754
+ but do not include claims that would be infringed only as a
755
+ consequence of further modification of the contributor version. For
756
+ purposes of this definition, "control" includes the right to grant
757
+ patent sublicenses in a manner consistent with the requirements of
758
+ this License.
759
+
760
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
761
+ patent license under the contributor's essential patent claims, to
762
+ make, use, sell, offer for sale, import and otherwise run, modify and
763
+ propagate the contents of its contributor version.
764
+
765
+ In the following three paragraphs, a "patent license" is any express
766
+ agreement or commitment, however denominated, not to enforce a patent
767
+ (such as an express permission to practice a patent or covenant not to
768
+ sue for patent infringement). To "grant" such a patent license to a
769
+ party means to make such an agreement or commitment not to enforce a
770
+ patent against the party.
771
+
772
+ If you convey a covered work, knowingly relying on a patent license,
773
+ and the Corresponding Source of the work is not available for anyone
774
+ to copy, free of charge and under the terms of this License, through a
775
+ publicly available network server or other readily accessible means,
776
+ then you must either (1) cause the Corresponding Source to be so
777
+ available, or (2) arrange to deprive yourself of the benefit of the
778
+ patent license for this particular work, or (3) arrange, in a manner
779
+ consistent with the requirements of this License, to extend the patent
780
+ license to downstream recipients. "Knowingly relying" means you have
781
+ actual knowledge that, but for the patent license, your conveying the
782
+ covered work in a country, or your recipient's use of the covered work
783
+ in a country, would infringe one or more identifiable patents in that
784
+ country that you have reason to believe are valid.
785
+
786
+ If, pursuant to or in connection with a single transaction or
787
+ arrangement, you convey, or propagate by procuring conveyance of, a
788
+ covered work, and grant a patent license to some of the parties
789
+ receiving the covered work authorizing them to use, propagate, modify
790
+ or convey a specific copy of the covered work, then the patent license
791
+ you grant is automatically extended to all recipients of the covered
792
+ work and works based on it.
793
+
794
+ A patent license is "discriminatory" if it does not include within
795
+ the scope of its coverage, prohibits the exercise of, or is
796
+ conditioned on the non-exercise of one or more of the rights that are
797
+ specifically granted under this License. You may not convey a covered
798
+ work if you are a party to an arrangement with a third party that is
799
+ in the business of distributing software, under which you make payment
800
+ to the third party based on the extent of your activity of conveying
801
+ the work, and under which the third party grants, to any of the
802
+ parties who would receive the covered work from you, a discriminatory
803
+ patent license (a) in connection with copies of the covered work
804
+ conveyed by you (or copies made from those copies), or (b) primarily
805
+ for and in connection with specific products or compilations that
806
+ contain the covered work, unless you entered into that arrangement,
807
+ or that patent license was granted, prior to 28 March 2007.
808
+
809
+ Nothing in this License shall be construed as excluding or limiting
810
+ any implied license or other defenses to infringement that may
811
+ otherwise be available to you under applicable patent law.
812
+
813
+ 12. No Surrender of Others' Freedom.
814
+
815
+ If conditions are imposed on you (whether by court order, agreement or
816
+ otherwise) that contradict the conditions of this License, they do not
817
+ excuse you from the conditions of this License. If you cannot convey a
818
+ covered work so as to satisfy simultaneously your obligations under this
819
+ License and any other pertinent obligations, then as a consequence you may
820
+ not convey it at all. For example, if you agree to terms that obligate you
821
+ to collect a royalty for further conveying from those to whom you convey
822
+ the Program, the only way you could satisfy both those terms and this
823
+ License would be to refrain entirely from conveying the Program.
824
+
825
+ 13. Remote Network Interaction; Use with the GNU General Public License.
826
+
827
+ Notwithstanding any other provision of this License, if you modify the
828
+ Program, your modified version must prominently offer all users
829
+ interacting with it remotely through a computer network (if your version
830
+ supports such interaction) an opportunity to receive the Corresponding
831
+ Source of your version by providing access to the Corresponding Source
832
+ from a network server at no charge, through some standard or customary
833
+ means of facilitating copying of software. This Corresponding Source
834
+ shall include the Corresponding Source for any work covered by version 3
835
+ of the GNU General Public License that is incorporated pursuant to the
836
+ following paragraph.
837
+
838
+ Notwithstanding any other provision of this License, you have
839
+ permission to link or combine any covered work with a work licensed
840
+ under version 3 of the GNU General Public License into a single
841
+ combined work, and to convey the resulting work. The terms of this
842
+ License will continue to apply to the part which is the covered work,
843
+ but the work with which it is combined will remain governed by version
844
+ 3 of the GNU General Public License.
845
+
846
+ 14. Revised Versions of this License.
847
+
848
+ The Free Software Foundation may publish revised and/or new versions of
849
+ the GNU Affero General Public License from time to time. Such new versions
850
+ will be similar in spirit to the present version, but may differ in detail to
851
+ address new problems or concerns.
852
+
853
+ Each version is given a distinguishing version number. If the
854
+ Program specifies that a certain numbered version of the GNU Affero General
855
+ Public License "or any later version" applies to it, you have the
856
+ option of following the terms and conditions either of that numbered
857
+ version or of any later version published by the Free Software
858
+ Foundation. If the Program does not specify a version number of the
859
+ GNU Affero General Public License, you may choose any version ever published
860
+ by the Free Software Foundation.
861
+
862
+ If the Program specifies that a proxy can decide which future
863
+ versions of the GNU Affero General Public License can be used, that proxy's
864
+ public statement of acceptance of a version permanently authorizes you
865
+ to choose that version for the Program.
866
+
867
+ Later license versions may give you additional or different
868
+ permissions. However, no additional obligations are imposed on any
869
+ author or copyright holder as a result of your choosing to follow a
870
+ later version.
871
+
872
+ 15. Disclaimer of Warranty.
873
+
874
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
875
+ APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
876
+ HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
877
+ OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
878
+ THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
879
+ PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
880
+ IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
881
+ ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
882
+
883
+ 16. Limitation of Liability.
884
+
885
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
886
+ WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
887
+ THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
888
+ GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
889
+ USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
890
+ DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
891
+ PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
892
+ EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
893
+ SUCH DAMAGES.
894
+
895
+ 17. Interpretation of Sections 15 and 16.
896
+
897
+ If the disclaimer of warranty and limitation of liability provided
898
+ above cannot be given local legal effect according to their terms,
899
+ reviewing courts shall apply local law that most closely approximates
900
+ an absolute waiver of all civil liability in connection with the
901
+ Program, unless a warranty or assumption of liability accompanies a
902
+ copy of the Program in return for a fee.
903
+
904
+ END OF TERMS AND CONDITIONS
905
+
906
+ How to Apply These Terms to Your New Programs
907
+
908
+ If you develop a new program, and you want it to be of the greatest
909
+ possible use to the public, the best way to achieve this is to make it
910
+ free software which everyone can redistribute and change under these terms.
911
+
912
+ To do so, attach the following notices to the program. It is safest
913
+ to attach them to the start of each source file to most effectively
914
+ state the exclusion of warranty; and each file should have at least
915
+ the "copyright" line and a pointer to where the full notice is found.
916
+
917
+ <one line to give the program's name and a brief idea of what it does.>
918
+ Copyright (C) <year> <name of author>
919
+
920
+ This program is free software: you can redistribute it and/or modify
921
+ it under the terms of the GNU Affero General Public License as published by
922
+ the Free Software Foundation, either version 3 of the License, or
923
+ (at your option) any later version.
924
+
925
+ This program is distributed in the hope that it will be useful,
926
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
927
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
928
+ GNU Affero General Public License for more details.
929
+
930
+ You should have received a copy of the GNU Affero General Public License
931
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
932
+
933
+ Also add information on how to contact you by electronic and paper mail.
934
+
935
+ If your software can interact with users remotely through a computer
936
+ network, you should also make sure that it provides a way for users to
937
+ get its source. For example, if your program is a web application, its
938
+ interface could display a "Source" link that leads users to an archive
939
+ of the code. There are many ways you could offer source, and different
940
+ solutions will be better for different programs; see section 13 for the
941
+ specific requirements.
942
+
943
+ You should also get your employer (if you work as a programmer) or school,
944
+ if any, to sign a "copyright disclaimer" for the program, if necessary.
945
+ For more information on this, and how to apply and follow the GNU AGPL, see
946
+ <https://www.gnu.org/licenses/>.
947
+
948
+ ```
app.py ADDED
@@ -0,0 +1,367 @@
1
+ from __future__ import annotations
2
+
3
+ import glob
4
+ import os
5
+ import time
6
+ import uuid
7
+ from typing import Any, Optional
8
+
9
+ import gradio as gr
10
+ import spaces
11
+ from PIL import Image
12
+
13
+ from src.Core.Models.ModelFactory import list_available_models
14
+ from src.Device.ModelCache import get_model_cache
15
+ from src.user import app_instance
16
+ from src.user.pipeline import pipeline
17
+
18
+
19
+ SCHEDULER_CHOICES = [
20
+ "ays",
21
+ "ays_sd15",
22
+ "ays_sdxl",
23
+ "karras",
24
+ "normal",
25
+ "simple",
26
+ "beta",
27
+ ]
28
+
29
+ SAMPLER_CHOICES = [
30
+ "dpmpp_sde_cfgpp",
31
+ "dpmpp_2m_cfgpp",
32
+ "euler",
33
+ "euler_ancestral",
34
+ "dpmpp_sde",
35
+ "dpmpp_2m",
36
+ "euler_cfgpp",
37
+ "euler_ancestral_cfgpp",
38
+ ]
39
+
40
+
41
+ def _list_model_mapping() -> list[tuple[str, str]]:
42
+ return list_available_models(return_mapping=True)
43
+
44
+
45
+ def _model_choices() -> list[str]:
46
+ return [name for name, _ in _list_model_mapping()]
47
+
48
+
49
+ def _resolve_model_path(display_name: Optional[str]) -> Optional[str]:
50
+ if not display_name:
51
+ return None
52
+
53
+ for name, path in _list_model_mapping():
54
+ if name == display_name:
55
+ return path
56
+ return None
57
+
58
+
59
+ def _load_recent_images(
60
+ prefix: Optional[str] = None,
61
+ started_at: Optional[float] = None,
62
+ limit: int = 12,
63
+ ) -> list[Image.Image]:
64
+ files: list[str] = []
65
+ for ext in ("*.png", "*.jpg", "*.jpeg", "*.webp"):
66
+ files.extend(glob.glob(os.path.join(".", "output", "**", ext), recursive=True))
67
+
68
+ filtered: list[str] = []
69
+ for path in files:
70
+ basename = os.path.basename(path)
71
+ if prefix and prefix not in basename:
72
+ continue
73
+ if started_at is not None:
74
+ try:
75
+ if os.path.getmtime(path) < (started_at - 1.0):
76
+ continue
77
+ except OSError:
78
+ continue
79
+ filtered.append(path)
80
+
81
+ filtered.sort(key=lambda p: os.path.getmtime(p), reverse=True)
82
+
83
+ images: list[Image.Image] = []
84
+ for path in filtered[:limit]:
85
+ try:
86
+ with Image.open(path) as img:
87
+ images.append(img.copy())
88
+ except Exception:
89
+ continue
90
+ return images
91
+
92
+
93
+ def _refresh_history() -> tuple[list[Image.Image], str]:
94
+ images = _load_recent_images(limit=48)
95
+ if not images:
96
+ return [], "No generated images found yet."
97
+ return images, f"Loaded {len(images)} recent images from `output/`."
98
+
99
+
100
+ def _interrupt_generation() -> str:
101
+ app_instance.app.request_interrupt()
102
+ return "Interrupt requested. The current generation will stop at the next safe check."
103
+
104
+
105
+ @spaces.GPU(duration=240)
106
+ def _run_generation(
107
+ prompt: str,
108
+ negative_prompt: str,
109
+ width: int,
110
+ height: int,
111
+ num_images: int,
112
+ batch_size: int,
113
+ scheduler: str,
114
+ sampler: str,
115
+ steps: int,
116
+ guidance_scale: float,
117
+ model_name: Optional[str],
118
+ hires_fix: bool,
119
+ adetailer: bool,
120
+ enhance_prompt: bool,
121
+ img2img_enabled: bool,
122
+ img2img_image: Optional[str],
123
+ img2img_denoise: float,
124
+ stable_fast: bool,
125
+ reuse_seed: bool,
126
+ enable_multiscale: bool,
127
+ multiscale_intermittent: bool,
128
+ multiscale_factor: float,
129
+ multiscale_fullres_start: int,
130
+ multiscale_fullres_end: int,
131
+ keep_models_loaded: bool,
132
+ progress: gr.Progress = gr.Progress(track_tqdm=False),
133
+ ) -> tuple[list[Image.Image], str, dict[str, Any], list[Image.Image]]:
134
+ if not prompt.strip():
135
+ raise gr.Error("Prompt is required.")
136
+
137
+ if img2img_enabled and not img2img_image:
138
+ raise gr.Error("Upload an input image or disable Img2Img.")
139
+
140
+ request_prefix = f"LD-GRADIO-{uuid.uuid4().hex[:8]}"
141
+ started_at = time.time()
142
+
143
+ app = app_instance.app
144
+ app.clear_interrupt()
145
+ app.cleanup_all_previews()
146
+ app.previewer_var.set(True)
147
+
148
+ try:
149
+ try:
150
+ get_model_cache().set_keep_models_loaded(bool(keep_models_loaded))
151
+ except Exception:
152
+ pass
153
+
154
+ model_path = _resolve_model_path(model_name)
155
+
156
+ def _progress_callback(args: dict[str, Any]) -> None:
157
+ step = int(args.get("i", 0))
158
+ total = int(args.get("total_steps", steps))
159
+ if total > 0:
160
+ progress(
161
+ min((step + 1) / total, 1.0),
162
+ desc=f"Sampling step {step + 1}/{total}",
163
+ )
164
+
165
+ progress(0, desc="Preparing generation")
166
+
167
+ result = pipeline(
168
+ prompt=prompt,
169
+ negative_prompt=negative_prompt,
170
+ w=int(width),
171
+ h=int(height),
172
+ number=int(num_images),
173
+ batch=int(batch_size),
174
+ scheduler=scheduler,
175
+ sampler=sampler,
176
+ steps=int(steps),
177
+ cfg_scale=float(guidance_scale),
178
+ hires_fix=bool(hires_fix),
179
+ adetailer=bool(adetailer),
180
+ enhance_prompt=bool(enhance_prompt),
181
+ img2img=bool(img2img_enabled),
182
+ img2img_image=img2img_image if img2img_enabled else None,
183
+ img2img_denoise=float(img2img_denoise),
184
+ stable_fast=bool(stable_fast),
185
+ reuse_seed=bool(reuse_seed),
186
+ autohdr=True,
187
+ realistic_model=False,
188
+ model_path=model_path,
189
+ enable_multiscale=bool(enable_multiscale),
190
+ multiscale_intermittent_fullres=bool(multiscale_intermittent),
191
+ multiscale_factor=float(multiscale_factor),
192
+ multiscale_fullres_start=int(multiscale_fullres_start),
193
+ multiscale_fullres_end=int(multiscale_fullres_end),
194
+ request_filename_prefix=request_prefix,
195
+ callback=_progress_callback,
196
+ )
197
+
198
+ progress(1, desc="Generation complete")
199
+
200
+ final_images = _load_recent_images(
201
+ prefix=request_prefix,
202
+ started_at=started_at,
203
+ limit=max(1, int(num_images)),
204
+ )
205
+ if not final_images and adetailer:
206
+ final_images = _load_recent_images(
207
+ started_at=started_at,
208
+ limit=max(1, int(num_images)),
209
+ )
210
+
211
+ preview_images = list(app.preview_images[:4]) if app.preview_images else []
212
+
213
+ if not final_images:
214
+ raise gr.Error("Generation completed but no output images were found in `output/`.")
215
+
216
+ used_prompt = result.get("used_prompt", prompt) if isinstance(result, dict) else prompt
217
+ metadata = {
218
+ "request_prefix": request_prefix,
219
+ "model_name": model_name or "auto/default",
220
+ "used_prompt": used_prompt,
221
+ "enhancement_applied": bool(result.get("enhancement_applied")) if isinstance(result, dict) else False,
222
+ "img2img_enabled": bool(img2img_enabled),
223
+ "adetailer": bool(adetailer),
224
+ "hires_fix": bool(hires_fix),
225
+ }
226
+ status = f"Generated {len(final_images)} image(s) using `{sampler}` + `{scheduler}`."
227
+ return final_images, status, metadata, preview_images
228
+ finally:
229
+ app.clear_interrupt()
230
+
231
+
232
+ def _build_demo() -> gr.Blocks:
233
+ default_models = _model_choices()
234
+ default_model = default_models[0] if default_models else None
235
+
236
+ with gr.Blocks(title="LightDiffusion-Next ZeroGPU") as demo:
237
+ gr.Markdown(
238
+ """
239
+ # LightDiffusion-Next
240
+ ZeroGPU-compatible Gradio UI. The generation function is wrapped with `@spaces.GPU`
241
+ so Hugging Face can allocate a GPU only while inference is running.
242
+ """
243
+ )
244
+
245
+ with gr.Row():
246
+ with gr.Column(scale=2):
247
+ prompt = gr.Textbox(label="Prompt", lines=5, placeholder="Describe the image you want to generate")
248
+ negative_prompt = gr.Textbox(
249
+ label="Negative Prompt",
250
+ lines=3,
251
+ value="(worst quality, low quality:1.4), (zombie, sketch, interlocked fingers, comic), (embedding:EasyNegative), (embedding:badhandv4)",
252
+ )
253
+
254
+ with gr.Row():
255
+ width = gr.Slider(256, 1536, value=512, step=64, label="Width")
256
+ height = gr.Slider(256, 1536, value=512, step=64, label="Height")
257
+
258
+ with gr.Row():
259
+ num_images = gr.Slider(1, 4, value=1, step=1, label="Images")
260
+ batch_size = gr.Slider(1, 4, value=1, step=1, label="Batch Size")
261
+
262
+ with gr.Row():
263
+ scheduler = gr.Dropdown(SCHEDULER_CHOICES, value="ays", label="Scheduler")
264
+ sampler = gr.Dropdown(SAMPLER_CHOICES, value="dpmpp_sde_cfgpp", label="Sampler")
265
+
266
+ with gr.Row():
267
+ steps = gr.Slider(1, 50, value=20, step=1, label="Steps")
268
+ guidance_scale = gr.Slider(1.0, 15.0, value=7.0, step=0.1, label="CFG")
269
+
270
+ model_name = gr.Dropdown(
271
+ choices=default_models,
272
+ value=default_model,
273
+ allow_custom_value=False,
274
+ label="Model",
275
+ )
276
+
277
+ with gr.Accordion("Advanced", open=False):
278
+ with gr.Row():
279
+ hires_fix = gr.Checkbox(label="HiresFix", value=False)
280
+ adetailer = gr.Checkbox(label="ADetailer", value=False)
281
+ enhance_prompt = gr.Checkbox(label="Enhance Prompt", value=False)
282
+ stable_fast = gr.Checkbox(label="Stable-Fast", value=False)
283
+ with gr.Row():
284
+ reuse_seed = gr.Checkbox(label="Reuse Last Seed", value=False)
285
+ enable_multiscale = gr.Checkbox(label="Multiscale", value=False)
286
+ multiscale_intermittent = gr.Checkbox(label="Intermittent Fullres", value=True)
287
+ keep_models_loaded = gr.Checkbox(label="Keep Models Loaded", value=False)
288
+ with gr.Row():
289
+ multiscale_factor = gr.Slider(0.25, 1.0, value=0.5, step=0.05, label="Multiscale Factor")
290
+ multiscale_fullres_start = gr.Slider(1, 20, value=10, step=1, label="Fullres Start")
291
+ multiscale_fullres_end = gr.Slider(1, 20, value=8, step=1, label="Fullres End")
292
+
293
+ with gr.Accordion("Img2Img", open=False):
294
+ img2img_enabled = gr.Checkbox(label="Enable Img2Img", value=False)
295
+ img2img_image = gr.Image(label="Input Image", type="filepath")
296
+ img2img_denoise = gr.Slider(0.0, 1.0, value=0.75, step=0.01, label="Denoise Strength")
297
+
298
+ with gr.Row():
299
+ generate_button = gr.Button("Generate", variant="primary")
300
+ interrupt_button = gr.Button("Interrupt", variant="stop")
301
+ refresh_models_button = gr.Button("Refresh Models")
302
+
303
+ with gr.Column(scale=3):
304
+ status = gr.Markdown("Ready.")
305
+ gallery = gr.Gallery(label="Generated Images", columns=2, height="auto")
306
+ metadata = gr.JSON(label="Generation Metadata")
307
+ preview_gallery = gr.Gallery(label="Last Preview Frames", columns=4, height="auto")
308
+
309
+ with gr.Tab("History"):
310
+ history_status = gr.Markdown("No generated images loaded yet.")
311
+ history_gallery = gr.Gallery(label="Recent Output Images", columns=4, height="auto")
312
+ refresh_history = gr.Button("Refresh History")
313
+
314
+ refresh_models_button.click(
315
+ fn=lambda: gr.update(
316
+ choices=_model_choices(),
317
+ value=(_model_choices()[0] if _model_choices() else None),
318
+ ),
319
+ outputs=model_name,
320
+ queue=False,
321
+ )
322
+
323
+ interrupt_button.click(_interrupt_generation, outputs=status, queue=False)
324
+ refresh_history.click(_refresh_history, outputs=[history_gallery, history_status], queue=False)
325
+ demo.load(_refresh_history, outputs=[history_gallery, history_status], queue=False)
326
+
327
+ generate_button.click(
328
+ _run_generation,
329
+ inputs=[
330
+ prompt,
331
+ negative_prompt,
332
+ width,
333
+ height,
334
+ num_images,
335
+ batch_size,
336
+ scheduler,
337
+ sampler,
338
+ steps,
339
+ guidance_scale,
340
+ model_name,
341
+ hires_fix,
342
+ adetailer,
343
+ enhance_prompt,
344
+ img2img_enabled,
345
+ img2img_image,
346
+ img2img_denoise,
347
+ stable_fast,
348
+ reuse_seed,
349
+ enable_multiscale,
350
+ multiscale_intermittent,
351
+ multiscale_factor,
352
+ multiscale_fullres_start,
353
+ multiscale_fullres_end,
354
+ keep_models_loaded,
355
+ ],
356
+ outputs=[gallery, status, metadata, preview_gallery],
357
+ )
358
+
359
+ return demo
360
+
361
+
362
+ demo = _build_demo()
363
+ demo.queue(default_concurrency_limit=1)
364
+
365
+
366
+ if __name__ == "__main__":
367
+ demo.launch()
docker-compose.yml ADDED
@@ -0,0 +1,41 @@
1
+ services:
2
+ lightdiffusion:
3
+ build:
4
+ context: .
5
+ dockerfile: Dockerfile
6
+ args:
7
+ # Specify target GPU architectures for CUDA extension builds
8
+ # 8.0: A100, 8.6: RTX 30xx, 8.9: RTX 40xx, 9.0: H100, 12.0: RTX 50xx (Blackwell)
9
+ # Customize based on your GPU: TORCH_CUDA_ARCH_LIST: "12.0" for RTX 50xx only
10
+ TORCH_CUDA_ARCH_LIST: "8.0;8.6;8.9;9.0;12.0"
11
+ INSTALL_STABLE_FAST: "0"
12
+ INSTALL_OLLAMA: "0"
13
+ INSTALL_SAGEATTENTION: "0"
14
+ INSTALL_SPARGEATTN: "0"
15
+ ports:
16
+ - "7860:7860" # FastAPI backend serving the built React UI
17
+ volumes:
18
+ # Mount output directory to persist generated images
19
+ - ./output:/app/output
20
+ # Mount checkpoints directory for model files
21
+ - ./include/checkpoints:/app/include/checkpoints
22
+ # Mount other model directories
23
+ - ./include/loras:/app/include/loras
24
+ - ./include/embeddings:/app/include/embeddings
25
+ - ./include/ESRGAN:/app/include/ESRGAN
26
+ - ./include/yolos:/app/include/yolos
27
+ environment:
28
+ - PORT=7860
29
+ - CUDA_VISIBLE_DEVICES=0
30
+ - CUDA_HOME=/usr/local/cuda
31
+ - PROMPT_ENHANCER_MODEL=qwen3:0.6b
32
+ deploy:
33
+ resources:
34
+ reservations:
35
+ devices:
36
+ - driver: nvidia
37
+ count: 1
38
+ capabilities: [ gpu ]
39
+ restart: unless-stopped
40
+ stdin_open: true
41
+ tty: true
docker/README.md ADDED
@@ -0,0 +1,46 @@
1
+ # Docker Build Scripts
2
+
3
+ This directory contains helper scripts used during the Docker image build process.
4
+
5
+ ## Files
6
+
7
+ ### patch_sageattention.py
8
+ **Purpose**: Patches the SageAttention setup.py to support building without GPU present.
9
+
10
+ **What it does**:
11
+ - Adds support for the `TORCH_CUDA_ARCH_LIST` environment variable to SageAttention
12
+ - Allows specifying target GPU architectures via environment variable
13
+ - Enables building Docker images on machines without NVIDIA GPUs
14
+
15
+ **Usage** (automatically called during Docker build):
16
+ ```bash
17
+ cd SageAttention
18
+ python3 ../docker/patch_sageattention.py
19
+ ```
20
+
21
+ **Why it's needed**:
22
+ SageAttention's original setup.py tries to detect GPU hardware during build time using `torch.cuda.device_count()`. This fails in Docker builds because:
23
+ 1. Docker builds don't have GPU access by default (even with `--gpus all`)
24
+ 2. GPU access during build is not guaranteed across all Docker configurations
25
+ 3. Build machines may not have the same GPU as the target runtime machine
26
+
27
+ The patch adds a check for `TORCH_CUDA_ARCH_LIST` environment variable before attempting hardware detection, allowing explicit specification of target architectures.
28
+
29
+ ### sageattention_setup.patch (not used)
30
+ Legacy patch file - kept for reference. The Python script approach is preferred.
31
+
32
+ ## How the Build Process Works
33
+
34
+ 1. **Environment Setup**: `TORCH_CUDA_ARCH_LIST` is set in the Dockerfile via ARG/ENV and can be overridden at build time (see the example after this list)
35
+ 2. **Patch Application**: `patch_sageattention.py` modifies SageAttention's setup.py
36
+ 3. **Extension Build**: Modified setup.py reads `TORCH_CUDA_ARCH_LIST` and compiles for specified architectures
37
+ 4. **SpargeAttn Build**: Already supports `TORCH_CUDA_ARCH_LIST` natively, no patch needed
38
+
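+ For example, a CPU-only build machine can target a single GPU architecture like this (`--build-arg` overrides the Dockerfile default; the image tag is arbitrary):
+
+ ```bash
+ docker build --build-arg TORCH_CUDA_ARCH_LIST="8.6" -t lightdiffusion .
+ ```
+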
39
+ ## Maintenance
40
+
41
+ If SageAttention is updated, you may need to:
42
+ 1. Check if the patch still applies correctly
43
+ 2. Update the target line in `patch_sageattention.py` if the setup.py structure changes
44
+ 3. Test the build process with the new version
45
+
46
+ The patch is designed to be non-intrusive and should work across most SageAttention versions that follow the same setup.py structure.
docker/patch_sageattention.py ADDED
@@ -0,0 +1,49 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Patch for SageAttention setup.py to support TORCH_CUDA_ARCH_LIST environment variable.
4
+ This allows building without GPUs present during build time.
5
+ """
6
+
7
+ import sys
8
+
9
+ setup_py_path = "setup.py"
10
+
11
+ # Read the original setup.py
12
+ with open(setup_py_path, 'r') as f:
13
+ content = f.read()
14
+
15
+ # Find the line where compute_capabilities is initialized
16
+ target_line = "compute_capabilities = set()"
17
+
18
+ if target_line not in content:
19
+ print("ERROR: Could not find target line in setup.py")
20
+ sys.exit(1)
21
+
22
+ # Add our patch right after compute_capabilities initialization
23
+ patch_code = '''
24
+ # Check for TORCH_CUDA_ARCH_LIST environment variable first (Docker build support)
+ import os  # make the injected block self-contained even if setup.py never imports os
25
+ env_arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST", None)
26
+ if env_arch_list:
27
+ print(f"Using TORCH_CUDA_ARCH_LIST from environment: {env_arch_list}")
28
+ arch_list = env_arch_list.replace(" ", ";").split(";")
29
+ for arch in arch_list:
30
+ arch = arch.strip()
31
+ if not arch:
32
+ continue
33
+ if arch.endswith("+PTX"):
34
+ arch = arch[:-4].strip()
35
+ if arch:
36
+ compute_capabilities.add(arch)
37
+ '''
38
+
39
+ # Insert the patch
40
+ content = content.replace(
41
+ target_line,
42
+ target_line + patch_code
43
+ )
44
+
45
+ # Write back
46
+ with open(setup_py_path, 'w') as f:
47
+ f.write(content)
48
+
49
+ print("✓ Successfully patched setup.py to support TORCH_CUDA_ARCH_LIST")
docker/sageattention_setup.patch ADDED
@@ -0,0 +1,24 @@
1
+ --- setup.py.orig 2024-10-02 00:00:00.000000000 +0000
2
+ +++ setup.py 2024-10-02 00:00:00.000000000 +0000
3
+ @@ -66,6 +66,17 @@
4
+ nvcc_cuda_version = parse(output[release_idx].split(",")[0])
5
+ return nvcc_cuda_version
6
+
7
+ +# Check for TORCH_CUDA_ARCH_LIST environment variable first
8
+ +import os
9
+ +env_arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST", None)
10
+ +if env_arch_list:
11
+ + print(f"Using TORCH_CUDA_ARCH_LIST from environment: {env_arch_list}")
12
+ + arch_list = env_arch_list.replace(" ", ";").split(";")
13
+ + for arch in arch_list:
14
+ + arch = arch.strip()
15
+ + if not arch:
16
+ + continue
17
+ + if arch.endswith("+PTX"):
18
+ + arch = arch[:-4].strip()
19
+ + if arch:
20
+ + compute_capabilities.add(arch)
21
+ +
22
+ # Iterate over all GPUs on the current machine. Also you can modify this part to specify the architecture if you want to build for specific GPU architectures.
23
+ compute_capabilities = set()
24
+ device_count = torch.cuda.device_count()
docs/advanced-cfg-optimizations.md ADDED
@@ -0,0 +1,262 @@
1
+ # Advanced CFG Optimizations
2
+
3
+ ## Overview
4
+
5
+ This document describes three advanced optimizations for Classifier-Free Guidance (CFG) that improve both quality and performance in LightDiffusion-Next:
6
+
7
+ 1. **Batched CFG Computation** - Speed optimization
8
+ 2. **Dynamic CFG Rescaling** - Quality optimization
9
+ 3. **Adaptive Noise Scheduling** - Quality & speed optimization
10
+
11
+ ## 1. Batched CFG Computation
12
+
13
+ ### What It Does
14
+
15
+ Instead of running two separate forward passes for conditional and unconditional predictions, this optimization can combine them into a single batched forward pass.
16
+
17
+ **Before:**
18
+ ```python
19
+ # Two separate forward passes
20
+ cond_pred = model(x, timestep, cond) # Pass 1
21
+ uncond_pred = model(x, timestep, uncond) # Pass 2
22
+ result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
23
+ ```
24
+
25
+ **After:**
26
+ ```python
27
+ # Single batched forward pass
28
+ both_preds = model(x, timestep, [cond, uncond]) # Single pass
29
+ cond_pred, uncond_pred = both_preds[0], both_preds[1]
30
+ result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
31
+ ```
32
+
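+ For a concrete picture, here is a minimal torch sketch of the same trick using plain tensor concatenation. It is illustrative only: the `model(x, t, c)` call signature and the `batched_cfg_step` helper are assumptions, not the project's packing code, which additionally handles conditioning chunking and memory limits.
+
+ ```python
+ import torch
+
+ def batched_cfg_step(model, x, timestep, cond, uncond, cfg_scale):
+     # Run cond and uncond through one forward pass by doubling the batch.
+     x2 = torch.cat([x, x], dim=0)
+     t2 = torch.cat([timestep, timestep], dim=0)
+     c2 = torch.cat([cond, uncond], dim=0)
+     cond_pred, uncond_pred = model(x2, t2, c2).chunk(2, dim=0)
+     return uncond_pred + cfg_scale * (cond_pred - uncond_pred)
+ ```
+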
33
+ ### Performance Impact
34
+
35
+ - **Speed**: ~1.8-2x faster CFG computation
36
+ - **Memory**: Same or slightly less (batch processing)
37
+ - **Quality**: Identical to baseline
38
+
39
+ ### Usage
40
+
41
+ ```python
42
+ from src.sample import sampling
43
+
44
+ samples = sampling.sample1(
45
+ model=model,
46
+ noise=noise,
47
+ steps=20,
48
+ cfg=7.5,
49
+ # ... other params ...
50
+ batched_cfg=True, # Joint cond/uncond batching (default: True)
51
+ )
52
+ ```
53
+
54
+ In the current implementation, the heavy lifting still happens in the central conditioning packing path. `batched_cfg` controls whether conditional and unconditional branches are packed together into the same forward pass when possible. Conditioning chunks within each branch are still packed by the shared batching logic.
55
+
56
+ ### When to Use
57
+
58
+ - **Usually recommended** - This reduces duplicate cond/uncond forward passes when memory allows
59
+ - Particularly beneficial for high-resolution images or batch generation
60
+ - Compatible with all samplers and schedulers
61
+
62
+ ---
63
+
64
+ ## 2. Dynamic CFG Rescaling
65
+
66
+ ### What It Does
67
+
68
+ Dynamically adjusts the CFG scale based on prediction statistics to prevent over-saturation while maintaining prompt adherence.
69
+
70
+ ### The Problem
71
+
72
+ High CFG values (7-12) improve prompt following but can cause:
73
+ - Over-saturated colors
74
+ - Over-sharpened edges ("halo effect")
75
+ - Loss of fine details
76
+ - Unnatural, "CG-like" appearance
77
+
78
+ ### The Solution
79
+
80
+ Dynamic CFG rescaling analyzes the guidance vector (difference between conditional and unconditional predictions) and adjusts the CFG scale to keep it within an optimal range.
81
+
82
+ **Two Methods:**
83
+
84
+ #### Variance Method (Recommended)
85
+ ```python
86
+ guidance_std = (cond_pred - uncond_pred).std()
87
+ adjusted_cfg = cfg_scale * (target_scale / (1 + guidance_std))
88
+ ```
89
+
90
+ Best for: General use, prevents over-saturation
91
+
92
+ #### Range Method
93
+ ```python
94
+ guidance_range = torch.quantile(guidance, 0.95) - torch.quantile(guidance, 0.05)
95
+ adjusted_cfg = cfg_scale * (target_scale / guidance_range)
96
+ ```
97
+
98
+ Best for: Extreme cases, outlier filtering
99
+
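+ As a self-contained sketch (parameter names mirror the `sample1()` options below, but this is an illustration of the idea, not the library's implementation):
+
+ ```python
+ import torch
+
+ def rescale_cfg(cond_pred, uncond_pred, cfg_scale,
+                 method="variance", target_scale=1.0, percentile=95):
+     guidance = cond_pred - uncond_pred
+     if method == "variance":
+         # Large spread in the guidance vector -> dial the scale down.
+         return cfg_scale * (target_scale / (1.0 + guidance.std()))
+     # "range": inter-percentile spread, which ignores extreme outliers.
+     q = percentile / 100.0
+     spread = torch.quantile(guidance, q) - torch.quantile(guidance, 1.0 - q)
+     return cfg_scale * (target_scale / spread.clamp_min(1e-6))
+ ```
+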
100
+ ### Performance Impact
101
+
102
+ - **Speed**: Minimal overhead (~2-5%)
103
+ - **Quality**: Improved color balance, reduced artifacts
104
+ - **Prompt Adherence**: Maintained or improved
105
+
106
+ ### Usage
107
+
108
+ ```python
109
+ samples = sampling.sample1(
110
+ model=model,
111
+ # ... other params ...
112
+ dynamic_cfg_rescaling=True, # Enable dynamic rescaling
113
+ dynamic_cfg_method="variance", # Method: "variance" or "range"
114
+ dynamic_cfg_percentile=95, # Percentile for range method
115
+ dynamic_cfg_target_scale=1.0, # Target normalization scale
116
+ )
117
+ ```
118
+
119
+ ### When to Use
120
+
121
+ - High CFG values (>7.5)
122
+ - Detailed prompts that might cause over-saturation
123
+ - Photorealistic generations
124
+ - Portraits and faces
125
+
126
+ ### When to Avoid
127
+
128
+ - Very low CFG (<3.0) - minimal benefit
129
+ - Artistic/stylized generations where saturation is desired
130
+ - When using CFG-free sampling (already handles this differently)
131
+
132
+ ---
133
+
134
+ ## 3. Adaptive Noise Scheduling
135
+
136
+ ### What It Does
137
+
138
+ Dynamically adjusts the noise schedule based on content complexity during generation.
139
+
140
+ ### The Problem
141
+
142
+ Traditional fixed noise schedules apply the same denoising steps to all regions:
143
+ - Complex scenes (detailed textures) may need more steps in certain regions
144
+ - Simple scenes (smooth gradients) can use fewer steps
145
+ - This wastes computation or undersamples complexity
146
+
147
+ ### The Solution
148
+
149
+ Analyzes the complexity of intermediate predictions and adjusts subsequent noise levels accordingly.
150
+
151
+ **Two Methods:**
152
+
153
+ #### Complexity Method (Recommended)
154
+ ```python
155
+ complexity = denoised.var(dim=(-2, -1))  # spatial variance per sample
156
+ # High variance = complex details = maintain fine noise steps
157
+ # Low variance = simple areas = can skip intermediate steps
158
+ ```
159
+
160
+ Best for: General content-aware optimization
161
+
162
+ #### Attention Method
163
+ ```python
164
+ complexity = sum(g.abs().mean() for g in torch.gradient(denoised, dim=(-2, -1)))
165
+ # High gradients = edges/details = need more precision
166
+ # Low gradients = smooth areas = can denoise faster
167
+ ```
168
+
169
+ Best for: Edge-focused content (architecture, technical drawings)
170
+
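+ A compact sketch of both estimators (illustrative only; the actual scheduler folds these scores into sigma selection):
+
+ ```python
+ import torch
+
+ def content_complexity(denoised, method="complexity"):
+     if method == "complexity":
+         # Spatial variance per sample: high values mean detailed regions.
+         return denoised.var(dim=(-2, -1)).mean()
+     # "attention": mean gradient magnitude over the two spatial dims.
+     grads = torch.gradient(denoised, dim=(-2, -1))
+     return sum(g.abs().mean() for g in grads)
+ ```
+
+ A scheduler can then keep fine-grained sigmas while the score stays high and merge steps once it drops.
+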
171
+ ### Performance Impact
172
+
173
+ - **Speed**: 10-20% faster for simple scenes, same for complex
174
+ - **Quality**: Adaptive - maintains quality where needed
175
+ - **Prompt Adherence**: Unchanged
176
+
177
+ ### Usage
178
+
179
+ ```python
180
+ samples = sampling.sample1(
181
+ model=model,
182
+ # ... other params ...
183
+ adaptive_noise_enabled=True, # Enable adaptive scheduling
184
+ adaptive_noise_method="complexity", # Method: "complexity" or "attention"
185
+ )
186
+ ```
187
+
188
+ ### When to Use
189
+
190
+ - Mixed complexity scenes (e.g., detailed subject + simple background)
191
+ - Long sampling runs (50+ steps) - more opportunity to optimize
192
+ - Batch generation with varying prompt complexity
193
+
194
+ ### When to Avoid
195
+
196
+ - Very short sampling runs (<10 steps) - overhead > benefit
197
+ - Uniformly complex scenes - no simplification possible
198
+ - When exact step-by-step reproducibility is critical
199
+
200
+ ---
201
+
202
+ ## Combining Optimizations
203
+
204
+ All three optimizations can be used together:
205
+
206
+ ```python
207
+ samples = sampling.sample1(
208
+ model=model,
209
+ noise=noise,
210
+ steps=20,
211
+ cfg=7.5,
212
+ sampler_name="dpmpp_sde_cfgpp",
213
+ scheduler="ays",
214
+ positive=positive_cond,
215
+ negative=negative_cond,
216
+ latent_image=latent,
217
+ # All optimizations enabled
218
+ batched_cfg=True,
219
+ dynamic_cfg_rescaling=True,
220
+ dynamic_cfg_method="variance",
221
+ dynamic_cfg_target_scale=1.0,
222
+ adaptive_noise_enabled=True,
223
+ adaptive_noise_method="complexity",
224
+ )
225
+ ```
226
+
227
+ **Expected Results:**
228
+ - Better color balance and detail preservation
229
+ - Reduced over-saturation artifacts
230
+ - Maintained or improved prompt adherence
231
+
232
+ ## Troubleshooting
233
+
234
+ ### Batched CFG Issues
235
+
236
+ **Problem**: Memory errors with batched CFG
237
+ **Solution**: Your system may not have enough VRAM for joint cond/uncond batching. Disable it with `batched_cfg=False`, which keeps the conditioning path active but runs the two branches separately.
238
+
239
+ ### Dynamic CFG Issues
240
+
241
+ **Problem**: Images too flat/desaturated
242
+ **Solution**: Increase `dynamic_cfg_target_scale` (try 1.5 or 2.0)
243
+
244
+ **Problem**: Still over-saturated
245
+ **Solution**: Switch to `dynamic_cfg_method="range"` and lower `dynamic_cfg_percentile`
246
+
247
+ ### Adaptive Noise Issues
248
+
249
+ **Problem**: Inconsistent results
250
+ **Solution**: Adaptive scheduling makes slight changes based on content. Disable for exact reproducibility.
251
+
252
+ **Problem**: No speed improvement
253
+ **Solution**: Works best with simple scenes. Complex scenes won't see speedup (but won't be slower either).
254
+
255
+ ---
256
+
257
+ ## Credits
258
+
259
+ Implemented for LightDiffusion-Next by combining insights from:
260
+ - CFG++ dynamic rescaling techniques
261
+ - ComfyUI batched computation patterns
262
+ - Stable Diffusion WebUI adaptive scheduling
docs/api.md ADDED
@@ -0,0 +1,152 @@
1
+ # REST API & Automation (Quick Reference)
2
+
3
+ LightDiffusion-Next ships with a FastAPI service (`server.py`) that sits in front of the shared pipeline. It batches compatible requests, streams telemetry and exposes health probes so you can plug the system into automation workflows, bots or orchestrators.
4
+
5
+ ## Common endpoints
6
+
7
+ | Method | Path | Description |
8
+ | --- | --- | --- |
9
+ | `GET` | `/health` | Lightweight readiness probe. Returns `{ "status": "ok" }` when the server is reachable. |
10
+ | `GET` | `/api/telemetry` | Queue and VRAM telemetry: batching stats, pending requests, cache state, uptime. |
11
+ | `POST` | `/api/generate` | Submit a generation job. Requests are buffered, batched when signatures match and resolved asynchronously. |
12
+
13
+ The service listens on port `7861` by default. Launch it with:
14
+
15
+ ```fish
16
+ uvicorn server:app --host 0.0.0.0 --port 7861
17
+ ```
18
+
19
+ ## Payload schema (`/api/generate`)
20
+
21
+ ```json
22
+ {
23
+ "prompt": "string",
24
+ "negative_prompt": "string",
25
+ "width": 512,
26
+ "height": 512,
27
+ "num_images": 1,
28
+ "batch_size": 1,
29
+ "scheduler": "ays",
30
+ "sampler": "dpmpp_sde_cfgpp",
31
+ "steps": 20,
32
+ "hires_fix": false,
33
+ "adetailer": false,
34
+ "enhance_prompt": false,
35
+ "img2img_enabled": false,
36
+ "img2img_image": null,
37
+ "stable_fast": false,
38
+ "reuse_seed": false,
39
+ "flux_enabled": false,
40
+ "realistic_model": false,
41
+ "multiscale_enabled": true,
42
+ "multiscale_intermittent": true,
43
+ "multiscale_factor": 0.5,
44
+ "multiscale_fullres_start": 10,
45
+ "multiscale_fullres_end": 8,
46
+ "keep_models_loaded": true,
47
+ "enable_preview": false,
48
+ "preview_fidelity": "balanced",
49
+ "guidance_scale": null,
50
+ "seed": null
51
+ }
52
+ ```
53
+
54
+ Not all fields are required—only `prompt`, `width`, `height` and `num_images` are strictly necessary. Any unknown keys are ignored, making the endpoint forward-compatible with UI features.
55
+
56
+ ### Response format
57
+
58
+ Successful requests return either:
59
+
60
+ ```json
61
+ { "image": "<base64-png>" }
62
+ ```
63
+
64
+ or, if multiple images were requested:
65
+
66
+ ```json
67
+ { "images": ["<base64-png>", "<base64-png>"] }
68
+ ```
69
+
70
+ Base64 strings represent PNG files with embedded metadata identical to the Streamlit UI output. Decode them and write the bytes to disk, as in the sketch below.
71
+
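+ For example, a minimal Python client (assuming the third-party `requests` package; endpoint and fields as documented above) handles either response shape like this:
+
+ ```python
+ import base64
+ import requests
+
+ resp = requests.post(
+     "http://localhost:7861/api/generate",
+     json={"prompt": "painted nebula", "width": 512, "height": 512, "num_images": 1},
+     timeout=600,
+ )
+ resp.raise_for_status()
+ payload = resp.json()
+ images = payload.get("images") or [payload["image"]]
+ for i, b64 in enumerate(images):
+     with open(f"result_{i}.png", "wb") as f:
+         f.write(base64.b64decode(b64))
+ ```
+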
72
+ ### Img2Img uploads
73
+
74
+ When `img2img_enabled` is `true`, `img2img_image` may be provided as any of the following:
75
+
76
+ - A local file path (e.g., `"tests/test.png"`)
77
+ - A data URL (e.g., `"data:image/png;base64,<...>"`)
78
+ - A raw Base64-encoded PNG string
79
+
80
+ The server will decode data URLs and raw Base64 strings and save them to the system temporary directory before processing (default max upload size: 10 MB). Keep payloads under a few megabytes to avoid HTTP timeouts.
81
+
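+ As a sketch, building a data-URL payload from a local file looks like this (the path reuses the repository's example image; field names follow the payload schema above):
+
+ ```python
+ import base64
+
+ with open("tests/test.png", "rb") as f:
+     encoded = base64.b64encode(f.read()).decode("ascii")
+
+ payload = {
+     "prompt": "restyled portrait, soft lighting",
+     "width": 512,
+     "height": 512,
+     "num_images": 1,
+     "img2img_enabled": True,
+     "img2img_image": f"data:image/png;base64,{encoded}",
+ }
+ ```
+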
82
+ ## Telemetry shape (`/api/telemetry`)
83
+
84
+ The telemetry endpoint returns operational stats that help with autoscaling or queue dashboards. Example snippet:
85
+
86
+ ```json
87
+ {
88
+ "uptime_seconds": 1234.56,
89
+ "pending_count": 2,
90
+ "pending_by_signature": {
91
+ "(False, 512, 512, True, False, False, True, True, 0.5, 10, 8, False, True, False)": 2
92
+ },
93
+ "pending_preview": [
94
+ {"request_id": "a1b2c3d4", "waiting_s": 0.42, "prompt_preview": "a cinematic robot..."}
95
+ ],
96
+ "max_batch_size": 4,
97
+ "max_images_per_group": 256,
98
+ "batch_timeout": 0.5,
99
+ "batches_processed": 12,
100
+ "items_processed": 24,
101
+ "requests_processed": 12,
102
+ "avg_processed_wait_s": 0.31,
103
+ "pending_avg_wait_s": 0.12,
104
+ "memory_info": {
105
+ "vram_allocated_mb": 5623,
106
+ "vram_reserved_mb": 6144,
107
+ "system_ram_mb": 12345
108
+ },
109
+ "loaded_models_count": 2,
110
+ "loaded_models": ["SD15 UNet", "SD15 VAE"],
111
+ "pipeline_import_ok": true,
112
+ "pipeline_import_error": null
113
+ }
114
+ ```
115
+
116
+ Use this data to spot batching mismatches (different signatures cannot be coalesced), monitor VRAM usage or expose metrics to Prometheus/Grafana.
117
+
118
+ ## Queue tuning knobs
119
+
120
+ The queue accepts a few environment variables that influence behaviour:
121
+
122
+ | Variable | Default | Effect |
123
+ | --- | --- | --- |
124
+ | `LD_MAX_BATCH_SIZE` | `4` | Maximum items processed together when signatures match. |
125
+ | `LD_BATCH_TIMEOUT` | `0.5` | Seconds to wait before flushing a batch. |
126
+ | `LD_BATCH_WAIT_SINGLETONS` | `0` | If `1`, single jobs wait the timeout hoping for companions. Set to `0` to process singletons immediately. |
127
+ | `LD_MAX_IMAGES_PER_GROUP` | `256` | Maximum combined images processed in a single pipeline run when coalescing multiple requests. Groups larger than this are processed sequentially in smaller chunks to avoid memory and disk pressure. |
128
+ | `LD_MAX_IMAGES_PER_SAVE` | `16` | Maximum images allowed in a single `save_images` call. If exceeded, the save is aborted to avoid creating many tile files; raise this limit if you genuinely need larger saves. |
129
+ | `LD_SERVER_LOGLEVEL` | `DEBUG` | Logging verbosity for `logs/server.log`. |
130
+
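+ For example, to flush single jobs immediately while allowing larger matched batches, export the variables before launching the server (values are illustrative):
+
+ ```fish
+ set -x LD_MAX_BATCH_SIZE 8
+ set -x LD_BATCH_WAIT_SINGLETONS 0
+ uvicorn server:app --host 0.0.0.0 --port 7861
+ ```
+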
131
+ ## Deploying behind a reverse proxy
132
+
133
+ When hosting remotely:
134
+
135
+ - Front the FastAPI app with Nginx/Caddy and increase client body size if you accept Img2Img uploads.
136
+ - Expose `/health` for liveness checks and `/api/telemetry` for readiness/autoscaling gates.
137
+ - Mount `./include`, `./output` and `~/.cache/torch_extensions` as volumes so workers share models, outputs and compiled kernels.
138
+
139
+ ## Testing the service quickly
140
+
141
+ ```fish
142
+ # Send a simple generation job
143
+ curl -X POST http://localhost:7861/api/generate \
144
+ -H "Content-Type: application/json" \
145
+ -d '{"prompt": "painted nebula over distant mountains", "width": 512, "height": 512, "num_images": 1}' \
146
+ | jq -r '.image' | base64 -d > nebula.png
147
+
148
+ # Inspect queue state
149
+ curl http://localhost:7861/api/telemetry | jq
150
+ ```
151
+
152
+ That’s it! Check the [Troubleshooting guide](quirks.md) if the service reports missing models or the queue appears stalled.
docs/architecture.md ADDED
@@ -0,0 +1,73 @@
1
+ # Architecture
2
+
3
+ LightDiffusion-Next is split into three cooperating layers: UX surfaces, a FastAPI gateway and a modular inference core. Requests move through these layers, picking up metadata and transformations before image tensors ever touch the GPU. This page decomposes the system and highlights the extension points you are most likely to touch.
4
+
5
+ ## Layers in detail
6
+
7
+ ### UX layer (`streamlit_app.py`, `app.py`, `ui/*`)
8
+
9
+ - Streamlit exposes rich controls, preset management and history in `ui/settings.py` and `ui/history.py`.
10
+ - Gradio powers Spaces deployments (`app.py`). It streams previews via generators and mirrors the Streamlit control surface.
11
+ - Both UIs instantiate a shared `AppInstance` which holds the pipeline, preview queues and cached settings.
12
+
13
+ ### FastAPI gateway (`server.py`)
14
+
15
+ - Implements `/api/generate`, `/api/telemetry`, `/api/interrogate` and health probes.
16
+ - `GenerationBuffer` batches jobs with compatible shapes, models and LoRA overlays to maximize GPU utilization.
17
+ - Telemetry exposes queue lengths, average latency, VRAM usage and cached model fingerprints.
18
+ - Server-side logging includes per-request identifiers and request tracebacks in `logs/server.log`.
19
+
20
+ ### Pipeline core (`src/user/pipeline.py`)
21
+
22
+ This module orchestrates conditioning, diffusion, optional refinements and output serialization.
23
+
24
+ - **Model resolution** — `src/FileManaging/Loader` locates checkpoints, VAE, CLIP weights and LoRAs. Stable-Fast backends live in `src/StableFast` and can be toggled in settings.
25
+ - **Conditioning** — Prompts are tokenized through `src/cond/cond.py`. Negative prompts, style presets and textual inversion embeddings are applied here.
26
+ - **Sampling** — `src/sample/sampling.KSampler` coordinates samplers (`ddim`, `dpmpp`, `k-diffusion`, etc.) with CFG++ and Flux schedulers.
27
+ - **Enhancements** — Multi-scale diffusion (`multiscale_presets.py`), AutoDetailer (YOLO detection + inpainting), UltimateSDUpscale and AutoHDR run after the base diffusion loop.
28
+ - **Outputs** — `src/FileManaging/ImageSaver` writes PNGs, JSON metadata and optionally sends frames to the preview queues.
29
+
30
+ ### Device and cache (`src/Device/ModelCache.py`)
31
+
32
+ - Maintains reference-counted handles for UNet, VAE, CLIP and Flux components.
33
+ - Handles VRAM telemetry and eviction policies so the UI can show “keep loaded” toggles without manual restarts.
34
+ - Tracks whether Stable-Fast kernels, SageAttention or SD1.5 attention patches are initialized.
35
+
36
+ ### Asset management (`src/FileManaging/Downloader.py`)
37
+
38
+ - Validates required checkpoints, VAE files, LoRAs, embeddings, YOLO detectors and Flux components at startup.
39
+ - Supports mirrored download hosts and resumable transfers for large files.
40
+ - Exposes helper methods used by the UI to fetch missing assets on demand.
41
+
42
+ ### Preview subsystem (`src/user/app_instance.py`)
43
+
44
+ - Provides `get_latest_previews()` for UI clients, backed by a dedicated thread that consumes preview tensors straight from the pipeline.
45
+ - Supports interrupt handling by setting `app_instance.interrupt = True`, which causes the sampler to exit gracefully.
46
+
47
+ ## Request lifecycle
48
+
49
+ 1. **Submission** — A UI or REST client creates a job payload containing prompts, dimensions, sampler settings, seed and post-processing flags.
50
+ 2. **Queueing & batching** — Jobs are inserted into `GenerationBuffer`. Depending on `LD_BATCH_WAIT_SINGLETONS`, single jobs may wait briefly for compatible companions to maximize GPU throughput.
51
+ 3. **Model preparation** — The pipeline loads or reuses cached models, applies LoRA deltas, textual inversion embeddings and optional quantization adapters (via `src/Quantize`).
52
+ 4. **Diffusion** — The sampler executes the denoising loop. Flux mode uses `src/BlackForest/Flux.py` for decoder steps; Stable-Fast kernels speed up SD1.5/SDXL.
53
+ 5. **Refinement** — Optional stages (HiRes Fix, AutoDetailer, AutoHDR, UltimateSDUpscale) run sequentially per sample.
54
+ 6. **Persistence** — Final images and metadata are written to `output/<workflow>/`. Streamlit previews receive running frames; REST clients receive base64 PNG payloads plus telemetry.
55
+
56
+ ## Filesystem overview
57
+
58
+ - `include/checkpoints` — SD checkpoints (1.5, SDXL, Flux, etc.).
59
+ - `include/loras`, `include/embeddings` — LoRA adapters and textual inversion concepts.
60
+ - `include/clip` — Tokenizer and encoder configs.
61
+ - `include/yolos` — Object detectors for AutoDetailer.
62
+ - `include/ESRGAN` — Upscaler models for UltimateSDUpscale.
63
+ - `output/*` — Organized galleries (Classic, Flux, Img2Img, Upscale, etc.).
64
+ - `webui_settings.json` — Persisted Streamlit configuration.
65
+
66
+ ## Extending LightDiffusion-Next
67
+
68
+ - **New samplers** — Implement in `src/sample/samplers.py` and register with `KSampler`. Add UI and REST switches via `ui/settings.py` and `GenerateRequest`.
69
+ - **Additional post-processing** — Follow the pattern in `UltimateSDUpscale` or `AutoHDR` and register the stage near the end of `pipeline()`.
70
+ - **Custom model managers** — Plug alternative download logic into `FileManaging/Downloader` or mount volumes in Docker deployments.
71
+ - **Observability** — Add metrics/log statements in `GenerationBuffer` or extend `/api/telemetry` to fit orchestrator dashboards.
72
+
73
+ Armed with this bird’s-eye view, you can dive into the [usage guide](usage.md) for operator workflows or the upcoming [API reference](api.md) for automation hooks.
docs/ays-scheduler.md ADDED
@@ -0,0 +1,150 @@
1
+ ## 2. AYS (Align Your Steps) Scheduler
2
+
3
+ ### What It Does
4
+
5
+ Uses optimized timestep distributions that allow **fewer sampling steps** with **same or better quality** compared to uniform schedulers.
6
+
7
+ ### Key Insight
8
+
9
+ Not all timesteps contribute equally to image formation. AYS pre-computes optimal sigma schedules that focus more steps on critical noise levels.
10
+
11
+ ### Research Background
12
+
13
+ Based on "Align Your Steps: Optimizing Sampling Schedules in Diffusion Models" (2024)
14
+ - https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/
15
+ - Developed by NVIDIA researchers
16
+ - Validated across SD1.5, SDXL, and other models
17
+
18
+ ### Performance
19
+
20
+ | Model | Normal Scheduler | AYS Scheduler | Quality |
21
+ |-------|-----------------|---------------|---------|
22
+ | SD1.5 | 20 steps | **10 steps** | Same/Better |
23
+ | SDXL | 20 steps | **10 steps** | Same/Better |
24
+ | Flux | 15 steps | **8 steps** | Same |
25
+
26
+ ### Usage
27
+
28
+ #### Via UI (Streamlit)
29
+
30
+ 1. Open Settings → Sampling
31
+ 2. Select scheduler: "AYS (Align Your Steps)"
32
+ 3. Reduce steps to 10 (SD1.5/SDXL) or 8 (Flux)
33
+ 4. Generate - same quality, 2x faster!
34
+
35
+ #### Programmatically
36
+
37
+ ```python
38
+ from src.sample import ksampler_util
39
+
40
+ # Using AYS scheduler
41
+ sigmas = ksampler_util.calculate_sigmas(
42
+ model_sampling,
43
+ scheduler_name="ays", # or "ays_sd15", "ays_sdxl", "ays_flux"
44
+ steps=10
45
+ )
46
+ ```
47
+
48
+ ### Scheduler Variants
49
+
50
+ - `"ays"` or `"ays_sd15"` - SD1.5 optimized (default)
51
+ - `"ays_sdxl"` - SDXL optimized
52
+ - `"ays_flux"` - Flux optimized (experimental)
53
+
54
+ ### Optimal Step Counts
55
+
56
+ Pre-computed optimal schedules exist for:
57
+
58
+ **SD1.5**: 4, 6, 8, 10, 12, 15, 20, 25 steps
59
+ **SDXL**: 4, 6, 8, 10, 12, 15, 20 steps
60
+ **Flux**: 4, 8, 10, 15, 20 steps
61
+
62
+ Other step counts use interpolation (slightly less optimal but still better than uniform).
63
+
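+ For intuition, that interpolation can be done in log-sigma space; here is a sketch of the idea (not the exact logic in `src/sample/ays_scheduler.py`):
+
+ ```python
+ import numpy as np
+
+ def resample_schedule(base_sigmas, steps):
+     # Resample a pre-computed AYS schedule to a new step count by
+     # interpolating log-sigmas, then re-append the terminal 0.0.
+     base = np.asarray(base_sigmas[:-1], dtype=np.float64)  # drop final 0.0
+     src = np.linspace(0.0, 1.0, len(base))
+     dst = np.linspace(0.0, 1.0, steps)
+     resampled = np.exp(np.interp(dst, src, np.log(base)))
+     return np.append(resampled, 0.0)
+ ```
+
+ Applied to the `AYS_SD15_10` schedule shown under Technical Details, `resample_schedule(AYS_SD15_10, 14)` would yield a 14-step schedule that keeps the AYS density profile.
+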
64
+ ### Recommended Settings
65
+
66
+ #### SD1.5 Quick Generation
67
+ ```yaml
68
+ scheduler: "ays"
69
+ steps: 10 # instead of 20
70
+ sampler: "euler" or "dpmpp_2m_cfgpp"
71
+ cfg: 7.0
72
+ ```
73
+
74
+ #### SDXL High Quality
75
+ ```yaml
76
+ scheduler: "ays_sdxl"
77
+ steps: 12 # instead of 20-25
78
+ sampler: "dpmpp_2m_cfgpp"
79
+ cfg: 6.0
80
+ ```
81
+
82
+ #### Flux Fast Mode
83
+ ```yaml
84
+ scheduler: "ays_flux"
85
+ steps: 8 # instead of 15
86
+ sampler: "euler"
87
+ cfg: 3.5
88
+ ```
89
+
90
+ ### Comparison: Uniform vs AYS
91
+
92
+ **Uniform Distribution (normal scheduler)**:
93
+ ```
94
+ Steps: 0 4 8 12 16 20
95
+ Sigmas evenly spaced → wastes compute on low-impact timesteps
96
+ ```
97
+
98
+ **AYS Distribution**:
99
+ ```
100
+ Steps: 0 2 5 8 12 17 20
101
+ Sigmas concentrated on critical noise levels → better efficiency
102
+ ```
103
+
104
+ ### Technical Details
105
+
106
+ AYS schedules are pre-computed using optimization to minimize reconstruction error:
107
+
108
+ ```python
109
+ # Example SD1.5 10-step schedule (11 sigmas: 10 steps plus the terminal 0.0)
+ AYS_SD15_10 = [
+     14.6146,  # High noise (early steps - image structure)
+     10.4708,
+     7.3688,
+     4.9651,   # Mid noise (detail formation)
+     3.2924,
+     2.1391,
+     1.3633,   # Low noise (fine details)
+     0.8437,
+     0.4898,
+     0.2279,
+     0.0       # Final step
+ ]
123
+ ```
124
+
125
+ Compare to uniform schedule:
126
+ ```python
127
+ # Normal scheduler @ 10 steps
128
+ NORMAL_10 = [14.6146, 11.3, 8.7, 6.7, 5.1, 3.9, 3.0, 2.3, 1.7, 1.2, 0.0]
129
+ # More evenly spaced → less efficient
130
+ ```
131
+
132
+ ### Troubleshooting
133
+
134
+ **Q: Images look different with AYS?**
135
+ A: Yes, they will differ slightly (different paths through noise space). Quality should be same or better. Adjust CFG if needed.
136
+
137
+ **Q: AYS + multiscale?**
138
+ A: Works great together! AYS optimizes step distribution, multiscale optimizes spatial resolution.
139
+
140
+ **Q: Can I use AYS with euler_ancestral?**
141
+ A: Yes! Works with all samplers (euler, euler_ancestral, dpmpp_2m_cfgpp, dpmpp_sde_cfgpp, etc.)
142
+
143
+ **Q: How to verify it's active?**
144
+ A: Check logs for "Using AYS optimal schedule" message.
145
+
146
+ ### References
147
+
148
+ - Original paper: https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/
149
+ - Implementation: `src/sample/ays_scheduler.py`
150
+ - Integration: `src/sample/ksampler_util.py`
docs/cfg-free-sampling.md ADDED
@@ -0,0 +1,269 @@
1
+ # CFG-Free Sampling
2
+
3
+ ## Overview
4
+
5
+ CFG-Free Sampling is a **quality optimization technique** that gradually reduces Classifier-Free Guidance (CFG) to zero during the final stages of image generation. This approach leverages the observation that high CFG strength is most beneficial early in the denoising process, while later steps benefit from reduced guidance for more natural, detailed outputs.
6
+
7
+ By intelligently transitioning from high-guidance to low-guidance sampling, CFG-Free achieves:
8
+
9
+ - **Improved fine detail** and texture quality
10
+ - **More natural color saturation** and tonal balance
11
+ - **Reduced artifacts** from over-guidance (halos, oversaturation, unnatural sharpness)
12
+ - **Better prompt adherence** while maintaining photorealism
13
+
14
+ This is a **training-free** technique that works with any sampler and can be combined with other optimizations.
15
+
16
+ ## How It Works
17
+
18
+ ### The CFG Problem
19
+
20
+ Classifier-Free Guidance strengthens prompt adherence by amplifying the difference between conditional and unconditional predictions:
21
+
22
+ $$
23
+ \text{output} = \text{uncond\_pred} + \text{cfg\_scale} \times (\text{cond\_pred} - \text{uncond\_pred})
24
+ $$
25
+
26
+ **Benefits of high CFG (7-12):**
27
+ - Strong prompt following
28
+ - Clear compositional structure
29
+ - Distinct subjects and backgrounds
30
+
31
+ **Drawbacks of high CFG throughout generation:**
32
+ - Over-sharpened edges ("halo effect")
33
+ - Oversaturated colors
34
+ - Loss of fine detail and texture
35
+ - Unnatural, "CG-like" appearance
36
+ - Potential anatomical distortions
37
+
38
+ ### The CFG-Free Solution
39
+
40
+ Research shows that CFG importance varies by denoising stage:
41
+
42
+ ```
43
+ ┌─────────────────────────────────────────────────────────┐
44
+ │ Early Steps (0-70%) │
45
+ │ High CFG is crucial: │
46
+ │ • Establishes composition │
47
+ │ • Defines subject placement │
48
+ │ • Interprets prompt semantics │
49
+ │ │
50
+ │ CFG = 7.0 (user-configured) │
51
+ └─────────────────────────────────────────────────────────┘
52
+
53
+ ┌─────────────────────────────────────────────────────────┐
54
+ │ Late Steps (70-100%) │
55
+ │ High CFG becomes detrimental: │
56
+ │ • Composition already locked in │
57
+ │ • Fine details being refined │
58
+ │ • Oversaturation and artifacts emerge │
59
+ │ │
60
+ │ CFG = 7.0 → 0.0 (linear reduction) │
61
+ └─────────────────────────────────────────────────────────┘
62
+ ```
63
+
64
+ CFG-Free gradually reduces guidance from your configured value (e.g., 7.0) to 0.0 over the final portion of generation. This preserves strong prompt adherence while allowing the model to naturally refine details without over-guidance.
65
+
66
+ ## Configuration
67
+
68
+ ### Parameters
69
+
70
+ | Parameter | Type | Default | Range | Description |
71
+ |-----------|------|---------|-------|-------------|
72
+ | `cfg_free_enabled` | bool | `False` | - | Enable CFG-Free sampling |
73
+ | `cfg_free_start_percent` | float | `70.0` | 0-100 | Percentage of steps at which to start reducing CFG |
74
+
75
+ ### How to Choose `cfg_free_start_percent`
76
+
77
+ The optimal starting point depends on your aesthetic goals:
78
+
79
+ | Start % | Behavior | Best For |
80
+ |---------|----------|----------|
81
+ | **60-65%** | Aggressive reduction, maximum detail preservation | Photorealistic portraits, product photography, architectural renders |
82
+ | **70-75%** | Balanced approach (recommended) | General purpose, landscapes, character art, concept art |
83
+ | **80-85%** | Conservative reduction, maintains stronger guidance | Abstract art, heavily stylized content, complex compositions |
84
+ | **90%+** | Minimal effect, mostly for testing | Debugging, comparing with full-CFG baseline |
85
+
86
+ **Rule of thumb:** Start with 70% for most use cases. If images appear oversaturated or have unnatural sharpness, lower it to 65%. If prompt adherence weakens, raise it to 75-80%.
87
+
88
+ ## Usage
89
+
90
+ ### Streamlit UI
91
+
92
+ Enable in the **🎨 CFG-Free Sampling** expander:
93
+
94
+ 1. Check **Enable CFG-Free Sampling**
95
+ 2. Adjust the **Start Percentage** slider (0-100%, default: 70%)
96
+ 3. The info panel shows exactly when CFG reduction begins
97
+ 4. Generate images — you'll see console logging confirming activation
98
+
99
+ **Visual feedback:**
100
+ ```
101
+ ✓ CFG-Free sampling ACTIVE: CFG will gradually reduce to 0 starting at 70% of steps
102
+ ```
103
+
104
+ ### REST API
105
+
106
+ Include in your generation request:
107
+
108
+ ```bash
109
+ curl -X POST http://localhost:7861/api/generate \
110
+ -H "Content-Type: application/json" \
111
+ -d '{
112
+ "prompt": "a portrait of a woman with flowing hair, soft lighting",
113
+ "negative_prompt": "blurry, low quality",
114
+ "width": 768,
115
+ "height": 1024,
116
+ "steps": 25,
117
+ "cfg_scale": 7.5,
118
+ "cfg_free_enabled": true,
119
+ "cfg_free_start_percent": 70.0
120
+ }'
121
+ ```
122
+
123
+ ### Python API
124
+
125
+ ```python
126
+ from src.user.pipeline import pipeline
127
+
128
+ pipeline(
+     prompt="a serene mountain landscape at sunset",
+     negative_prompt="blurry, distorted",
+     w=1024,
+     h=768,
+     steps=30,
+     sampler="dpmpp_sde_cfgpp",
+     scheduler="ays",
+     cfg_free_enabled=True,
+     cfg_free_start_percent=70.0,
+     number=1
+ )
140
+ ```
141
+
142
+ ## Quality Impact Analysis
143
+
144
+ ### Visual Improvements
145
+
146
+ CFG-Free sampling produces subtle but meaningful quality improvements:
147
+
148
+ **Before (Standard CFG=7.5):**
149
+ - Sharper edges, sometimes with halos
150
+ - More saturated colors (can appear "painted")
151
+ - Higher contrast, more dramatic lighting
152
+ - Occasionally oversimplified textures
153
+
154
+ **After (CFG-Free from 70%):**
155
+ - Softer, more natural edge transitions
156
+ - Improved color accuracy and tonal range
157
+ - Better fine detail in hair, fabric, skin textures
158
+ - More photorealistic lighting and shadow falloff
159
+ - Reduced artifacts around high-contrast boundaries
160
+
161
+ **Key insight:** Prompt adherence is determined in the first 60-70% of steps. Reducing CFG afterward doesn't weaken composition; it enhances natural detail refinement.
162
+
163
+ ## Troubleshooting
164
+
165
+ ### "Images look washed out or less vibrant"
166
+
167
+ **Cause:** CFG-Free starting too early (e.g., 50-60%) can over-reduce guidance.
168
+
169
+ **Solutions:**
170
+ - Increase `cfg_free_start_percent` to 70-75%
171
+ - Slightly increase base `cfg_scale` to 8.0-8.5
172
+ - Use a different sampler (try `dpmpp_sde_cfgpp` or `dpmpp_2m_cfgpp`)
173
+
174
+ ### "No visible difference from standard CFG"
175
+
176
+ **Cause:** Differences are subtle and may be masked by:
177
+ - Very simple prompts (single subject, plain background)
178
+ - Low resolution (<512px in any dimension)
179
+ - Aggressive other optimizations obscuring quality gains
180
+
181
+ **Solutions:**
182
+ - Test with complex prompts (portraits, detailed scenes)
183
+ - Use higher resolutions (768px+ recommended)
184
+ - Generate comparison images side-by-side with CFG-Free on/off
185
+ - Try lower `cfg_free_start_percent` (60-65%) for more noticeable effect
186
+
187
+ ### "Prompt adherence weakened"
188
+
189
+ **Cause:** CFG-Free starting too early for your particular prompt complexity.
190
+
191
+ **Solutions:**
192
+ - Increase `cfg_free_start_percent` to 75-80%
193
+ - Use stronger base `cfg_scale` (8.0-9.0)
194
+ - Increase step count to 30-35 for better convergence
195
+
196
+ ## Technical Details
197
+
198
+ ### Implementation
199
+
200
+ CFG-Free is implemented in the `CFGGuider` class (`src/sample/CFG.py`):
201
+
202
+ ```python
203
+ def _update_cfg_for_sigma(self, sigma):
+     """Update CFG value based on current sigma and CFG-free parameters."""
+     if not self.cfg_free_enabled:
+         return
+
+     # Find current step position in schedule
+     current_step = find_closest_sigma_index(sigma, self.sigmas)
+     total_steps = len(self.sigmas) - 1
+     progress_percent = (current_step / total_steps) * 100.0
+
+     if progress_percent >= self.cfg_free_start_percent:
+         # Linear interpolation from original CFG to 0
+         cfg_free_progress = (
+             (progress_percent - self.cfg_free_start_percent)
+             / (100.0 - self.cfg_free_start_percent)
+         )
+         self.cfg = self.original_cfg * (1.0 - cfg_free_progress)
+         self.cfg = max(0.0, self.cfg)  # Clamp at zero; progress in [0, 1] already bounds the top
221
+ ```
222
+
223
+ **Schedule visualization:**
224
+
225
+ ```
226
+ CFG Scaling Over Time (start_percent=70%, original_cfg=7.5)
227
+
228
+ Step: 0 5 10 15 20 25 30 35 40 45 50
229
+ ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
230
+ CFG: 7.5 7.5 7.5 7.5 7.5 7.5 7.5 5.6 3.8 1.9 0.0
231
+ │■■■■■■■■■■■■■■■■■■■■■■■■│▓▓▓▓▓▓▓▓▓▓│░░░░│ │ │
232
+ └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
233
+ ←──── Full CFG ────────→←─── Gradual Reduction ───→
234
+ ```
235
+
236
+ ### Mathematical Formulation
237
+
238
+ Standard CFG at every step:
239
+
240
+ $$
241
+ \hat{\boldsymbol{\epsilon}}_t = \boldsymbol{\epsilon}_{\text{uncond}} + \text{cfg\_scale} \times (\boldsymbol{\epsilon}_{\text{cond}} - \boldsymbol{\epsilon}_{\text{uncond}})
242
+ $$
243
+
244
+ CFG-Free with schedule:
245
+
246
+ $$
247
+ \hat{\boldsymbol{\epsilon}}_t = \boldsymbol{\epsilon}_{\text{uncond}} + \text{cfg}(t) \times (\boldsymbol{\epsilon}_{\text{cond}} - \boldsymbol{\epsilon}_{\text{uncond}})
248
+ $$
249
+
250
+ Where:
251
+
252
+ $$
253
+ \text{cfg}(t) = \begin{cases}
254
+ \text{cfg\_scale} & \text{if } t < t_{\text{start}} \\
255
+ \text{cfg\_scale} \times \left(1 - \frac{t - t_{\text{start}}}{t_{\text{total}} - t_{\text{start}}}\right) & \text{if } t \geq t_{\text{start}}
256
+ \end{cases}
257
+ $$
258
+
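+ The piecewise definition translates directly into a few lines of Python. A minimal sketch of the schedule in isolation (the function name is illustrative; the real logic lives in `CFGGuider._update_cfg_for_sigma` shown above):
+
+ ```python
+ def cfg_at_progress(progress_percent: float,
+                     cfg_scale: float = 7.5,
+                     start_percent: float = 70.0) -> float:
+     """Effective CFG at a given point in sampling (progress in 0-100%)."""
+     if progress_percent < start_percent:
+         return cfg_scale
+     # Linear taper from cfg_scale down to 0.0 over the remaining steps
+     taper = (progress_percent - start_percent) / (100.0 - start_percent)
+     return max(0.0, cfg_scale * (1.0 - taper))
+
+ # Reproduces the visualization above: full CFG until 70%, then a ramp to 0
+ for pct in (0, 35, 70, 80, 90, 100):
+     print(pct, round(cfg_at_progress(pct), 2))
+ ```
+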
259
+ ## Related Optimizations
260
+
261
+ - **[CFG++ Samplers](optimizations.md#cfg-samplers)**: Advanced CFG implementation with momentum and multi-scale — CFG-Free complements these
262
+ - **[Multi-Scale Diffusion](optimizations.md#multi-scale)**: Resolution-based optimization — works independently of CFG-Free
263
+ - **[DeepCache](wavespeed.md#deepcache)**: Feature caching for speedup — no quality interaction with CFG-Free
264
+
265
+ ## References & Further Reading
266
+
267
+ - Original research: CFG-Free sampling builds on insights from [Classifier-Free Guidance](https://arxiv.org/abs/2207.12598) (Ho & Salimans, 2022)
268
+ - Implementation inspired by community experiments with dynamic CFG schedules
269
+ - Mathematical framework adapted from diffusion model literature
docs/contributing.md ADDED
@@ -0,0 +1,94 @@
1
+ # Contributing
2
+
3
+ Thanks for helping push LightDiffusion-Next forward! This project blends a Streamlit UI, a Gradio deployment surface, a FastAPI queue and a sizeable inference stack. The guidelines below should get you productive quickly.
4
+
5
+ ## Getting your environment ready
6
+
7
+ ### Prerequisites
8
+
9
+ - Python 3.10 (the bundled wheels and Stable-Fast extension are built against 3.10)
10
+ - NVIDIA GPU with CUDA 12.1+ drivers (for GPU development)
11
+ - Git (with LFS if you plan to version large model weights)
12
+ - `uv` or `pip` for dependency management
13
+ - Optional: Docker + NVIDIA Container Toolkit for containerized testing
14
+
15
+ ### Clone & install
16
+
17
+ ```fish
18
+ git clone https://github.com/Aatricks/LightDiffusion-Next.git
19
+ cd LightDiffusion-Next
20
+
21
+ # Recommended: isolate dependencies
22
+ python -m venv .venv
23
+ source .venv/bin/activate.fish
24
+
25
+ # Install runtime dependencies
26
+ uv pip install -r requirements.txt
27
+
28
+ # (Optional) Extras for docs and linting
29
+ uv pip install mkdocs mkdocs-material mkdocstrings-python ruff black
30
+ ```
31
+
32
+ Populate `include/` with the checkpoints you need (SD1.5, Flux, LoRAs, embeddings). The UI will prompt you for missing assets if you skip this step.
33
+
34
+ ## Running the apps locally
35
+
36
+ - **Streamlit UI**: `streamlit run streamlit_app.py`
37
+ - **Gradio UI**: `python app.py`
38
+ - **FastAPI backend**: `uvicorn server:app --host 0.0.0.0 --port 7861`
39
+
40
+ All services read the same configuration and model directories. When working on the pipeline, it’s handy to keep FastAPI running for quick REST smoke tests while you iterate on the UI in a separate terminal.
41
+
42
+ ## Workflow expectations
43
+
44
+ 1. Create a branch per piece of work: `git checkout -b feature/short-summary`.
45
+ 2. Keep pull requests focused—avoid bundling unrelated refactors with feature work.
46
+ 3. Reference issues in your commit messages and PR description when applicable.
47
+ 4. Update documentation (`docs/`, `README.md`) whenever behavior, defaults or environments change.
48
+
49
+ ## Coding standards
50
+
51
+ - Follow PEP 8 for Python. If you have `ruff` or `black` installed, run them before committing (`ruff check src ui` and `black src ui`).
52
+ - Prefer type hints for new modules; FastAPI schemas and pipeline helpers already use Pydantic models you can extend.
53
+ - Favor dependency injection over global state—pass configuration into functions where feasible so the FastAPI worker and Streamlit UI stay in sync.
54
+ - When touching CUDA or kernel build logic, document the change in `docs/quirks.md` or `docs/installation.md` so operators know about new requirements.
55
+
56
+ ## Verification checklist
57
+
58
+ Before opening a pull request:
59
+
60
+ - [ ] `streamlit run streamlit_app.py` starts without stack traces.
61
+ - [ ] `uvicorn server:app --host 0.0.0.0 --port 7861` accepts at least one `/api/generate` call (you can use the example payload in [API docs](api.md)).
62
+ - [ ] `python app.py` (Gradio) loads when relevant to your change.
63
+ - [ ] `mkdocs build` succeeds (documentation stays green).
64
+ - [ ] GPU-specific changes are tested on at least one real GPU and noted in the PR description.
65
+ - [ ] No large binaries or secrets are committed; keep models inside `include/`, which is gitignored so they stay local.
66
+
67
+ If you add scripts or automation, include instructions in `docs/examples.md` or a new page and wire it into `mkdocs.yml`.
68
+
69
+ ## Submitting your PR
70
+
71
+ - Fill out a concise description covering **what changed**, **why**, and any ops impact (new env vars, caches, etc.).
72
+ - Attach screenshots or sample renders when altering the UI or pipeline defaults.
73
+ - Expect friendly but thorough reviews—batching, caching and GPU tweaks affect many users, so be ready to iterate.
74
+ - Squash-merge is fine, but avoid force-pushing after reviews unless you coordinate with the maintainer.
75
+
76
+ ## Bug reports & feature requests
77
+
78
+ When reporting an issue, please include:
79
+
80
+ - Operating system, driver versions (`nvidia-smi` output), GPU model
81
+ - How you launched LightDiffusion-Next (Streamlit, Docker, FastAPI)
82
+ - Relevant logs (`logs/server.log`, Streamlit terminal output, `/api/telemetry` response)
83
+ - Steps to reproduce and whether the problem is reproducible on a fresh checkout
84
+
85
+ Feature ideas are welcome—outline the use case, expected UX and any new dependencies (models, GPU requirements). Discussions and prototypes in separate branches make reviews easier.
86
+
87
+ ## Documentation contributions
88
+
89
+ - Run `mkdocs serve` while editing to preview changes at http://127.0.0.1:8000 (the mkdocs default, unless `dev_addr` is overridden in `mkdocs.yml`).
90
+ - Add new pages under `docs/` and update `mkdocs.yml` navigation.
91
+ - Screenshots should be optimized PNGs or WebPs stored under `docs/images/`.
92
+ - Keep `README.md` focused on quick start—you can link to richer docs pages for details.
93
+
94
+ Thanks again for contributing! 🚀
docs/examples.md ADDED
@@ -0,0 +1,143 @@
1
+ # Recipes & Workflows
2
+
3
+ This page collects practical “recipes” for common LightDiffusion-Next scenarios. Each section lists the UI path, optional CLI equivalents and tips for squeezing the best quality or performance out of the pipeline.
4
+
5
+ ## 1. Classic text-to-image (SD1.5)
6
+
7
+ Steps in the Streamlit UI:
8
+
9
+ 1. Enter a prompt such as `a cozy reading nook lit by neon signs, cinematic lighting, ultra detailed`.
10
+ 2. Leave negative prompt empty to use the curated default (includes `EasyNegative` and `badhandv4`).
11
+ 3. Set width and height to `768 × 512` and request `4` images with a batch size of `2`.
12
+ 4. Enable **Keep models in VRAM** for faster iteration while exploring.
13
+ 5. (Optional) Toggle **Enhance prompt** if you have Ollama running.
14
+ 6. Click **Generate** — watch the TAESD previews update in real time.
15
+
16
+ CLI equivalent:
17
+
18
+ ```bash
19
+ python -m src.user.pipeline "a cozy reading nook lit by neon signs" 768 512 4 2 --stable-fast --reuse-seed
20
+ ```
21
+
22
+ Tips:
23
+
24
+ - For softer lighting turn on **AutoHDR** (enabled by default) and lower CFG to 6.5 using the advanced settings drawer.
25
+ - Combine with **LoRA** adapters by placing `.safetensors` files in `include/loras/` and selecting them in the UI dropdown.
26
+
27
+ ## 2. Flux workflow
28
+
29
+ Flux requires the quantized GGUF UNet, CLIP and T5 weights plus the Flux schnell VAE (`include/vae/ae.safetensors`). The first run downloads them automatically.
30
+
31
+ 1. Toggle **Flux mode**.
32
+ 2. Switch CFG to `1.0` (Flux expects low CFG) and set steps to around 20.
33
+ 3. Provide a natural language prompt such as `a charcoal sketch of a train arriving at midnight, expressive strokes`.
34
+ 4. Generate 2 images with batch size 1.
35
+
36
+ REST API example:
37
+
38
+ ```bash
39
+ curl -X POST http://localhost:7861/api/generate \
40
+ -H "Content-Type: application/json" \
41
+ -d '{
42
+ "prompt": "a charcoal sketch of a train arriving at midnight, expressive strokes",
43
+ "width": 832,
44
+ "height": 1216,
45
+ "num_images": 2,
46
+ "flux_enabled": true,
47
+ "keep_models_loaded": true
48
+ }' | jq -r '.images[0]' | base64 -d > flux.png
49
+ ```
50
+
51
+ Tips:
52
+
53
+ - Flux ignores negative prompts and uses natural language weighting. Seed reuse works the same way as SD1.5.
54
+ - Monitor GPU memory in the **Model Cache Management** accordion — Flux models are larger.
55
+
56
+ ## 3. HiRes Fix + ADetailer portrait
57
+
58
+ 1. Choose a prompt such as `portrait of a cyberpunk detective, glowing tattoos, rain-soaked alley`.
59
+ 2. Set `width = 640`, `height = 896`, **num images = 1**.
60
+ 3. Enable **HiRes Fix**, **ADetailer** and **Stable-Fast**.
61
+ 4. In the advanced section set **HiRes denoise** to ~0.45 by editing `config.toml` (or accept the default and adjust later).
62
+ 5. Generate — the pipeline saves the base render, body detail pass and head detail pass separately.
63
+
64
+ Where to find outputs:
65
+
66
+ - Base image: `output/HiresFix/`.
67
+ - Body/head detail passes: `output/Adetailer/`.
68
+
69
+ Tips:
70
+
71
+ - Provide a short negative prompt that removes “extra limbs” to guide the detector.
72
+ - Use the **History** tab to compare detailer versus base results quickly.
73
+
74
+ ## 4. Img2Img upscaling with Ultimate SD Upscale
75
+
76
+ 1. Enable **Img2Img mode** and upload your reference image.
77
+ 2. Set denoise strength via the slider in the Img2Img accordion (`0.3` is a good starting point).
78
+ 3. Toggle **Stable-Fast** for faster tile processing and keep CFG around 6.
79
+ 4. Generate. UltimateSDUpscale will split the image into tiles, run targeted refinement and apply RealESRGAN (`include/ESRGAN/RealESRGAN_x4plus.pth`).
80
+
81
+ Tips:
82
+
83
+ - For stylized upscales change the prompt between passes — the pipeline will regenerate details without overwriting the original.
84
+ - Outputs land in `output/Img2Img/` with metadata including seam-fixing parameters.
85
+
86
+ ## 5. Automated batch via REST API
87
+
88
+ Use the FastAPI backend when you need to process multiple prompts from scripts or a Discord bot.
89
+
90
+ ```python
91
+ import base64
92
+ import json
93
+ import requests
94
+
95
+ payload = {
96
+ "prompt": "sunrise over a foggy fjord, volumetric light, ethereal",
97
+ "negative_prompt": "low quality, blurry",
98
+ "width": 832,
99
+ "height": 512,
100
+ "num_images": 3,
101
+ "batch_size": 3,
102
+ "stable_fast": True,
103
+ "reuse_seed": False,
104
+ "enable_preview": False
105
+ }
106
+
107
+ resp = requests.post("http://localhost:7861/api/generate", json=payload)
108
+ resp.raise_for_status()
109
+ images = resp.json().get("images", [])
110
+ for idx, b64_img in enumerate(images):
111
+ with open(f"fjord_{idx+1}.png", "wb") as f:
112
+ f.write(base64.b64decode(b64_img))
113
+ ```
114
+
115
+ The queue automatically coalesces compatible requests to maximize GPU utilization. Check `/api/telemetry` for batching statistics and memory usage.
116
+
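+ A minimal sketch of polling that endpoint from the same script; the exact response fields depend on the server version, so the code simply prints whatever is returned:
+
+ ```python
+ import requests
+
+ resp = requests.get("http://localhost:7861/api/telemetry", timeout=10)
+ resp.raise_for_status()
+
+ # Field names vary by version; dump everything the server reports
+ for key, value in resp.json().items():
+     print(f"{key}: {value}")
+ ```
+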
117
+ ## 6. Discord bot bridge
118
+
119
+ Combine LightDiffusion-Next with the [Boubou](https://github.com/Aatrick/Boubou) Discord bot:
120
+
121
+ 1. Follow the bot’s README to set your Discord token and install `py-cord` inside the LightDiffusion environment.
122
+ 2. Point the bot’s configuration at the FastAPI endpoint (`http://localhost:7861`).
123
+ 3. Give the bot `Send Messages` and `Attach Files` permissions.
124
+ 4. Use commands such as `/ld prompt:"a watercolor koi pond"` from your server and watch images stream back into the channel.
125
+
126
+ ## 7. Prompt enhancer playground
127
+
128
+ 1. Install [Ollama](https://ollama.com/) and run `ollama serve` in another terminal.
129
+ 2. Pull the suggested model:
130
+
131
+ ```bash
132
+ ollama pull qwen3:0.6b
133
+ ```
134
+
135
+ 3. Export the model name before launching the UI:
136
+
137
+ ```bash
138
+ export PROMPT_ENHANCER_MODEL=qwen3:0.6b
139
+ ```
140
+
141
+ 4. Enable **Enhance prompt** in Streamlit and inspect the rewritten prompt under the preview section. The original text is still stored as `original_prompt` inside PNG metadata.
142
+
143
+ Continue exploring by reading the [performance & tuning](quirks.md) guide or the [REST documentation](api.md) for full endpoint details.
docs/faq.md ADDED
@@ -0,0 +1,68 @@
1
+ # FAQ
2
+
3
+ Q: Where do I put my checkpoints?
4
+
5
+ A: Put them in `include/checkpoints` (create the folder if missing). The UI and `src/FileManaging/Loader` will detect and list them.
6
+
7
+ Q: Why is GPU memory insufficient?
8
+
9
+ A: Try reducing `width`/`height`, turning off `keep models loaded`, or enable quantized Flux/GGUF models. See [Performance & Troubleshooting](quirks.md).
10
+
11
+ Q: Can I run headless on a server?
12
+
13
+ A: Yes — use the FastAPI backend with `docker-compose` or run `server.py` directly. Disable Streamlit if you don’t need the web UI.
14
+
15
+ Q: How do I contribute models or LoRAs?
16
+
17
+ A: Place LoRA files in `include/loras` and embeddings in `include/embeddings`. See [Contributing](contributing.md) for guidelines.
18
+
19
+ /// details | Which diffusion models are supported out of the box?
20
+ LightDiffusion-Next ships with Stable Diffusion 1.5-friendly defaults and includes helpers for SDXL-inspired checkpoints, Flux (via the `include/Flux` assets) and quantized Stable-Fast backends. Drop your `.safetensors` or `.ckpt` files into `include/checkpoints`, LoRAs into `include/loras`, embeddings into `include/embeddings`, and Flux weights into `include/Flux`. The loader auto-detects formats and will prompt for missing companions (VAE, CLIP) at startup.
21
+ ///
22
+
23
+ /// details | What GPU and driver versions do I need?
24
+ NVIDIA GPUs with CUDA 12.1+ drivers are recommended. Availability of Stable-Fast, SageAttention and SpargeAttn depends on your installed kernels, drivers and GPU compute capability — the runtime detects and enables compatible backends automatically. For Docker, install the NVIDIA Container Toolkit and verify `nvidia-smi` works inside the container.
25
+ ///
26
+
27
+ /// details | Can I run LightDiffusion-Next without a GPU?
28
+ Yes, but performance will be limited. Install CPU wheels of PyTorch or rely on the bundled Intel oneAPI runtime (Linux only). Disable Stable-Fast/SageAttention in settings, reduce resolution (≤384×384), lower steps (<20) and turn off AutoDetailer/HiResFix to avoid minute-long renders.
29
+ ///
30
+
31
+ /// details | Where do generated images and metadata live?
32
+ Outputs are grouped by workflow under `output/`. For example, standard Txt2Img lands in `output/classic`, HiresFix into `output/HiresFix`, Flux into `output/Flux`, Img2Img upscales into `output/Img2Img`, etc. Each PNG embeds prompt metadata; accompanying JSON manifests are saved when enabled in settings.
33
+ ///
34
+
35
+ /// details | How do I switch between Streamlit, Gradio and the API?
36
+ Use the launch scripts:
37
+
38
+ - `streamlit run streamlit_app.py` (default UI)
39
+ - `python app.py` (Gradio app for Spaces/remote hosting)
40
+ - `uvicorn server:app --host 0.0.0.0 --port 7861` (FastAPI)
41
+
42
+ All three share the same pipeline and config. Streamlit/Gradio speak directly to the pipeline, while the API feeds the batching queue in `server.py`.
43
+ ///
44
+
45
+ /// details | How do I enable Stable-Fast or SageAttention?
46
+
47
+ In Streamlit, toggle **Stable-Fast** under *Performance*. The app will compile kernels the first time and reuse them afterwards (cache in `~/.cache/torch_extensions`). SageAttention is enabled automatically on supported GPUs; you can force-disable it by setting `LD_DISABLE_SAGE_ATTENTION=1` before launching. Docker images already ship with the patched kernels compiled.
48
+ ///
49
+
50
+ /// details | What if the app says a model is missing?
51
+
52
+ The downloader checks `include/` on startup and whenever a feature needs a new asset (YOLO, Flux, TAESD). Provide URLs or Hugging Face tokens when prompted, or pre-populate the folders manually. For offline environments, copy the files into the correct directories and ensure filenames match the expected suffixes (e.g., `anything-v4.5-pruned.safetensors`).
53
+ ///
54
+
55
+ /// details | Can I enhance prompts automatically with Ollama?
56
+
57
+ Yes. Install Ollama locally, download a language model (`ollama run mistral`), then enable **Prompt Enhancer** in the UI or set `enhance_prompt=true` in the REST payload. Set `OLLAMA_BASE_URL` if Ollama is not on `http://localhost:11434`.
58
+ ///
59
+
60
+ /// details | How do I reset persistent settings or history?
61
+
62
+ Delete `webui_settings.json` in the project root to reset saved toggles and defaults. Remove individual history directories under `ui/history/` to clear the UI gallery without touching generated images.
63
+ ///
64
+
65
+ /// details | Need more help?
66
+
67
+ Check the [Troubleshooting guide](quirks.md) or [open an issue](https://github.com/Aatricks/LightDiffusion-Next/issues) with logs, hardware specs and steps to reproduce.
68
+ ///
docs/implemented-optimizations-report.md ADDED
@@ -0,0 +1,484 @@
1
+ # Implemented Optimizations Report
2
+
3
+ This document presents a source-based engineering report on the optimization stack used across generation, model loading, and serving in LightDiffusion-Next.
4
+
5
+ Unlike the overview pages:
6
+
7
+ - The source tree is treated as the primary reference point.
8
+ - Each optimization is described in terms of purpose, implementation, integration, and trade-offs.
9
+ - Supporting infrastructure and codebase groundwork are included when they materially contribute to the performance profile of the project.
10
+
11
+ ## Report Scope
12
+
13
+ ### Usage Profile Definitions
14
+
15
+ - `default`: selected in the standard execution path
16
+ - `integrated`: part of the current generation or serving flow
17
+ - `optional`: integrated, but enabled through request settings, configuration, or model capabilities
18
+ - `conditional`: available when hardware, dependencies, or runtime capabilities allow it
19
+ - `implementation-specific`: implemented and used, but its effective behavior is shaped by a narrower internal path than the request surface alone suggests
20
+ - `infrastructure-level`: supports the fast path indirectly through loading, transfer, caching, or serving behavior
21
+ - `codebase groundwork`: implemented in the codebase as part of the optimization stack, but not yet surfaced as a broad standard pipeline option
22
+
23
+ ### What This Report Covers
24
+
25
+ This report covers both model-level and system-level optimizations:
26
+
27
+ - inference and sampling speedups
28
+ - precision and memory reductions
29
+ - request batching and pipeline throughput improvements
30
+ - preview and output-path latency reductions
31
+
32
+ It does not catalog ordinary features unless they clearly reduce compute, memory, or end-to-end latency.
33
+
34
+ ## Quick Inventory
35
+
36
+ | Optimization | Usage Profile | Main Goal | Primary Evidence |
37
+ |---|---|---|---|
38
+ | CUDA runtime tuning (TF32, cuDNN benchmark, SDPA enablement) | integrated, conditional | faster kernels and better backend selection | `src/Device/Device.py` |
39
+ | Attention backend cascade (SpargeAttn/SageAttention/xformers/SDPA) | integrated, conditional | faster attention kernels with fallback | `src/Attention/Attention.py`, `src/Attention/AttentionMethods.py` |
40
+ | Flux2 SDPA backend priority | integrated, conditional | prefer cuDNN/Flash SDPA for Flux2 attention | `src/NeuralNetwork/flux2/layers.py`, `src/Device/Device.py` |
41
+ | Cross-attention K/V projection cache | integrated | skip repeated key/value projection work for static context | `src/Attention/Attention.py` |
42
+ | Prompt embedding cache | integrated | avoid re-encoding repeated prompts | `src/Utilities/prompt_cache.py`, `src/clip/Clip.py` |
43
+ | Conditioning batch packing and memory-aware concatenation | integrated | reduce forward passes and pack compatible condition chunks | `src/cond/cond.py` |
44
+ | CFG=1 unconditional-skip fast path | integrated | skip unnecessary unconditional branch at CFG 1.0 | `src/sample/CFG.py`, `src/sample/BaseSampler.py` |
45
+ | AYS scheduler | default | reach similar quality in fewer steps | `src/sample/ays_scheduler.py`, `src/sample/ksampler_util.py` |
46
+ | CFG++ samplers | integrated | improve denoising behavior with momentum-style correction | `src/sample/BaseSampler.py` |
47
+ | CFG-Free sampling | integrated, optional | taper CFG late in sampling for better detail/naturalness | `src/sample/CFG.py` |
48
+ | Dynamic CFG rescaling | integrated, optional | reduce overshoot and saturation from strong CFG | `src/sample/CFG.py` |
49
+ | Adaptive noise scheduling | integrated, optional | adjust schedule based on observed complexity | `src/sample/CFG.py` |
50
+ | `batched_cfg` request surface | implementation-specific | request-facing control around the deeper conditioning batching path | `src/sample/sampling.py`, `src/cond/cond.py` |
51
+ | Multi-scale latent switching | integrated, optional | do some denoising at reduced spatial resolution | `src/sample/BaseSampler.py` |
52
+ | HiDiffusion MSW-MSA patching | integrated, optional | patch UNet attention for high-resolution multiscale workflows | `src/Core/Pipeline.py`, `src/hidiffusion/msw_msa_attention.py` |
53
+ | Stable-Fast | integrated, conditional | trace/compile UNet forward path | `src/StableFast/StableFast.py`, `src/Core/Pipeline.py` |
54
+ | `torch.compile` | integrated, optional | compiler-based model speedup without Stable-Fast | `src/Device/Device.py`, `src/Core/AbstractModel.py` |
55
+ | VAE compile, tiled path, and transfer tuning | integrated | speed up decode/encode and avoid OOM | `src/AutoEncoders/VariationalAE.py` |
56
+ | BF16/FP16 automatic dtype selection | integrated, conditional | reduce memory and improve throughput on supported hardware | `src/Device/Device.py` |
57
+ | FP8 weight quantization | integrated, conditional | reduce weight memory and enable Flux2-friendly inference paths | `src/Core/AbstractModel.py`, `src/Model/ModelPatcher.py` |
58
+ | NVFP4 weight quantization | integrated, optional | stronger memory reduction than FP8 | `src/Core/AbstractModel.py`, `src/Model/ModelPatcher.py`, `src/Utilities/Quantization.py` |
59
+ | Flux2 load-time weight-only quantization | integrated, conditional | keep large Flux2/Klein components workable on smaller VRAM budgets | `src/Core/Models/Flux2KleinModel.py` |
60
+ | ToMe | integrated, optional | reduce attention cost by token merging on UNet models | `src/Model/ModelPatcher.py`, `src/Core/Pipeline.py` |
61
+ | DeepCache | integrated, optional, implementation-specific | reuse prior denoiser output between update steps | `src/WaveSpeed/deepcache_nodes.py`, `src/Core/Pipeline.py` |
62
+ | First Block Cache for Flux | codebase groundwork | cache transformer work for Flux-like models | `src/WaveSpeed/first_block_cache.py` |
63
+ | Low-VRAM partial loading and offload policy | integrated | load only what fits and offload the rest | `src/cond/cond_util.py`, `src/Device/Device.py`, `src/Model/ModelPatcher.py` |
64
+ | Async transfer helpers and pinned checkpoint tensors | integrated, infrastructure-level | reduce host/device transfer overhead | `src/Device/Device.py`, `src/Utilities/util.py` |
65
+ | Request coalescing and queue batching | integrated | increase throughput across compatible API requests | `server.py` |
66
+ | Large-group chunking and image-save guardrails | integrated | keep large coalesced runs from blowing up save/decode paths | `server.py`, `src/FileManaging/ImageSaver.py` |
67
+ | Next-model prefetch | integrated | hide future checkpoint load latency | `server.py`, `src/Device/ModelCache.py`, `src/Utilities/util.py` |
68
+ | Keep-models-loaded cache | integrated | reuse loaded checkpoints and reduce warm starts | `src/Device/ModelCache.py`, `server.py` |
69
+ | In-memory PNG byte buffer | integrated | avoid disk round-trip for API responses | `src/FileManaging/ImageSaver.py`, `server.py` |
70
+ | TAESD preview pacing and preview fidelity control | integrated, conditional | reduce preview overhead while keeping live feedback usable | `src/sample/BaseSampler.py`, `src/AutoEncoders/taesd.py`, `server.py` |
71
+
72
+ ## Executive Summary
73
+
74
+ The optimization strategy in LightDiffusion-Next is layered and cumulative rather than dependent on a single acceleration mechanism.
75
+
76
+ 1. The core generation path combines runtime kernel selection, conditioning batching, lower-precision execution, and schedule optimization.
77
+ 2. Several optimizations are part of the standard execution path, most notably AYS scheduling, prompt caching, attention backend selection, low-VRAM loading policy, and server-side request grouping.
78
+ 3. A second layer of optional mechanisms provides workload-specific extensions, including Stable-Fast, `torch.compile`, ToMe, multiscale sampling, quantization, and guidance refinements such as CFG-Free and dynamic rescaling.
79
+ 4. The serving layer contributes materially to end-to-end throughput and latency through request coalescing, chunking, model prefetching, keep-loaded caching, and in-memory response handling.
80
+ 5. The codebase also contains foundational work for additional caching paths, particularly around Flux-oriented first-block caching, alongside the currently integrated DeepCache path.
81
+
82
+ ## Runtime And Attention Optimizations
83
+
84
+ ### CUDA runtime tuning
85
+
86
+ - Status: `integrated, conditional`
87
+ - Purpose: use faster math modes and let the backend choose more aggressive convolution and attention kernels.
88
+ - Implementation in LightDiffusion-Next: `src/Device/Device.py` enables TF32 (`torch.backends.cuda.matmul.allow_tf32`, `torch.backends.cudnn.allow_tf32`), enables cuDNN benchmarking, and turns on PyTorch math/flash/memory-efficient SDPA when available.
89
+ - Project integration: these are process-wide defaults. They do not require per-request toggles, so supported CUDA deployments get them automatically.
90
+ - Effect: reduces matmul/convolution cost and opens better SDPA backends with no extra application-layer work.
91
+ - Benefits: automatic, broad coverage, low complexity.
92
+ - Trade-offs: hardware-conditional; benefits depend on GPU generation and PyTorch build.
93
+ - Evidence: `src/Device/Device.py`.
94
+
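+ A minimal sketch of the kind of process-wide toggles involved; these are standard PyTorch flags, though the exact set and ordering in `src/Device/Device.py` may differ:
+
+ ```python
+ import torch
+
+ if torch.cuda.is_available():
+     # Allow TF32 tensor-core math for fp32 matmuls and convolutions
+     torch.backends.cuda.matmul.allow_tf32 = True
+     torch.backends.cudnn.allow_tf32 = True
+     # Let cuDNN benchmark and cache the fastest conv algorithms
+     torch.backends.cudnn.benchmark = True
+     # Enable the scaled-dot-product-attention backends
+     torch.backends.cuda.enable_flash_sdp(True)
+     torch.backends.cuda.enable_mem_efficient_sdp(True)
+     torch.backends.cuda.enable_math_sdp(True)
+ ```
+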
95
+ ### Attention backend cascade: SpargeAttn, SageAttention, xformers, PyTorch SDPA
96
+
97
+ - Status: `integrated, conditional`
98
+ - Purpose: use the fastest available attention kernel and fall back safely when unsupported.
99
+ - Implementation in LightDiffusion-Next: UNet/VAE attention chooses `SpargeAttn > SageAttention > xformers > PyTorch` in `src/Attention/Attention.py`; the concrete kernels and fallback behavior live in `src/Attention/AttentionMethods.py`.
100
+ - Project integration: the selection happens once when the attention module is imported/constructed. Sage/Sparge paths reshape inputs to HND layouts and pad unsupported head sizes to supported dimensions where possible; larger unsupported head sizes fall back.
101
+ - Effect: faster attention on supported CUDA systems without changing calling code.
102
+ - Benefits: automatic fallback chain, works across UNet cross-attention and VAE attention blocks, handles padding for awkward head sizes.
103
+ - Trade-offs: dependency- and GPU-dependent; not all head sizes stay on the fast path; behavior differs between generic UNet/VAE attention and Flux2 attention.
104
+ - Evidence: `src/Attention/Attention.py`, `src/Attention/AttentionMethods.py`.
105
+
106
+ ### Flux2 SDPA backend priority
107
+
108
+ - Status: `integrated, conditional`
109
+ - Purpose: prefer the best PyTorch SDPA backend for Flux2 transformer attention.
110
+ - Implementation in LightDiffusion-Next: `src/Device/Device.py` builds an SDPA priority context preferring cuDNN attention, then Flash, then efficient, then math; `src/NeuralNetwork/flux2/layers.py` uses `Device.get_sdpa_context()` around `scaled_dot_product_attention`.
111
+ - Project integration: Flux2 uses a separate attention implementation from the generic UNet attention path. It first tries prioritized SDPA, then xformers, then plain SDPA.
112
+ - Effect: prioritized fast attention for Flux2 with robust fallback behavior.
113
+ - Benefits: keeps Flux2 on the most optimized native backend available; does not require custom kernels.
114
+ - Trade-offs: benefits depend heavily on PyTorch version, backend support, and GPU runtime.
115
+ - Evidence: `src/Device/Device.py`, `src/NeuralNetwork/flux2/layers.py`.
116
+
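+ A minimal sketch of a restricted SDPA context using PyTorch's public API (PyTorch ≥ 2.3). `Device.get_sdpa_context()` may build this differently, and the cuDNN backend enum only exists on newer releases, so it is omitted here:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from torch.nn.attention import SDPBackend, sdpa_kernel
+
+ def fast_sdpa(q, k, v):
+     # Restrict SDPA to the listed backends; PyTorch selects among
+     # them based on which supports the input shapes and dtypes.
+     with sdpa_kernel([SDPBackend.FLASH_ATTENTION,
+                       SDPBackend.EFFICIENT_ATTENTION,
+                       SDPBackend.MATH]):
+         return F.scaled_dot_product_attention(q, k, v)
+
+ q = k = v = torch.randn(1, 8, 128, 64)
+ out = fast_sdpa(q, k, v)
+ ```
+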
117
+ ### Cross-attention static K/V projection cache
118
+
119
+ - Status: `integrated`
120
+ - Purpose: when the context tensor is unchanged across denoising steps, avoid recomputing K/V projections every step.
121
+ - Implementation in LightDiffusion-Next: `CrossAttention` in `src/Attention/Attention.py` keeps a small `_context_cache` keyed by `id(context)` and caches projected `k` and `v`.
122
+ - Project integration: this primarily targets prompt-conditioning cases where context is static while the latent evolves. The cache is tiny and self-pruning.
123
+ - Effect: shaves repeated linear-projection work from cross-attention-heavy denoising loops.
124
+ - Benefits: simple, training-free, no user configuration.
125
+ - Trade-offs: keyed by object identity, so it only helps when the exact context object is reused; small cache size limits reuse breadth.
126
+ - Evidence: `src/Attention/Attention.py`.
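+ A minimal sketch of the idea behind the identity-keyed cache (an illustrative module, not the project's `CrossAttention`):
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class KVCachedCrossAttention(nn.Module):
+     def __init__(self, dim: int, context_dim: int):
+         super().__init__()
+         self.to_k = nn.Linear(context_dim, dim, bias=False)
+         self.to_v = nn.Linear(context_dim, dim, bias=False)
+         self._context_cache = {}  # id(context) -> (k, v)
+
+     def project_kv(self, context: torch.Tensor):
+         key = id(context)
+         if key not in self._context_cache:
+             if len(self._context_cache) >= 4:  # keep the cache tiny
+                 self._context_cache.pop(next(iter(self._context_cache)))
+             self._context_cache[key] = (self.to_k(context), self.to_v(context))
+         return self._context_cache[key]
+ ```
+
+ Because the key is the context object's identity, the cache only pays off when the exact same tensor is passed on every denoising step, which is the common case for static prompt conditioning.
+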
127
+
128
+ ### Prompt embedding cache
129
+
130
+ - Status: `integrated`
131
+ - Purpose: cache text encoder outputs for repeated prompts instead of re-encoding them each time.
132
+ - Implementation in LightDiffusion-Next: `src/Utilities/prompt_cache.py` stores `(cond, pooled)` entries keyed by prompt hash and CLIP identity; `src/clip/Clip.py` checks the cache before tokenization/encoding and writes back after encode.
133
+ - Project integration: prompt caching is globally enabled by default, applies to single prompts and prompt lists, and prunes old entries once the cache exceeds its configured maximum.
134
+ - Effect: reduces prompt-side overhead in repeated-prompt workflows, especially seed sweeps and incremental prompt refinement.
135
+ - Benefits: low complexity, wired into the actual CLIP encode path, no quality trade-off.
136
+ - Trade-offs: cache size is estimate-based and global, not per-model-session aware.
137
+ - Evidence: `src/Utilities/prompt_cache.py`, `src/clip/Clip.py`, cache clear hook in `src/Core/Pipeline.py`.
138
+
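+ A minimal sketch of hash-keyed embedding reuse (an illustrative helper, not the `prompt_cache.py` API):
+
+ ```python
+ import hashlib
+
+ _prompt_cache: dict = {}
+ MAX_ENTRIES = 256
+
+ def encode_cached(prompt: str, clip_id: str, encode_fn):
+     """Return (cond, pooled), encoding only on a cache miss."""
+     key = hashlib.sha256(f"{clip_id}:{prompt}".encode()).hexdigest()
+     if key not in _prompt_cache:
+         if len(_prompt_cache) >= MAX_ENTRIES:   # simple FIFO pruning
+             _prompt_cache.pop(next(iter(_prompt_cache)))
+         _prompt_cache[key] = encode_fn(prompt)  # -> (cond, pooled)
+     return _prompt_cache[key]
+ ```
+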
139
+ ### Conditioning batch packing and CFG=1 fast path
140
+
141
+ - Status: `integrated`
142
+ - Purpose: concatenate compatible conditioning work into fewer forward calls, and skip unconditional work entirely when CFG is effectively disabled.
143
+ - Implementation in LightDiffusion-Next: `src/cond/cond.py::calc_cond_batch()` groups compatible condition chunks by shape and memory budget, concatenates them, and falls back per chunk when transformer options mismatch. `src/sample/CFG.py` sets `uncond_ = None` when `cond_scale == 1.0` and the optimization is not disabled.
144
+ - Project integration: this path is central to the standard sampling flow. The batching logic also validates Flux-style transformer image sizes and falls back when they do not match token grids.
145
+ - Effect: fewer model invocations, better GPU utilization, and a lower-cost path for CFG=1 workloads.
146
+ - Benefits: real throughput win, memory-aware, includes safety fallback for positional/shape mismatches.
147
+ - Trade-offs: batching heuristics are shape- and memory-sensitive; fallback behavior can reduce speed when conditions diverge.
148
+ - Evidence: `src/cond/cond.py`, `src/sample/CFG.py`, `src/sample/BaseSampler.py`, `tests/unit/test_calc_cond_batch_fallback.py`.
149
+
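+ A minimal sketch of the CFG=1 fast path (simplified; the real code in `src/sample/CFG.py` also honors a disable flag and runs through the batched conditioning path):
+
+ ```python
+ def guided_prediction(model, x, sigma, cond, uncond, cond_scale):
+     if cond_scale == 1.0:
+         # At scale 1.0 the unconditional branch cancels out:
+         # uncond + 1.0 * (cond - uncond) == cond, so skip it entirely.
+         return model(x, sigma, cond)
+     eps_cond = model(x, sigma, cond)
+     eps_uncond = model(x, sigma, uncond)
+     return eps_uncond + cond_scale * (eps_cond - eps_uncond)
+ ```
+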
150
+ ## Sampling And Guidance Optimizations
151
+
152
+ ### AYS scheduler
153
+
154
+ - Status: `default`
155
+ - Purpose: use precomputed sigma schedules that spend steps where they matter most, so fewer steps can reach comparable quality.
156
+ - Implementation in LightDiffusion-Next: schedules are encoded in `src/sample/ays_scheduler.py`; `src/sample/ksampler_util.py` routes `ays`, `ays_sd15`, and `ays_sdxl` to the scheduler and auto-detects model type when possible.
157
+ - Project integration: both `server.py` and `src/user/pipeline.py` default the scheduler to `ays`. Exact schedules are used when present; otherwise the code resamples or interpolates schedules.
158
+ - Effect: fewer denoising steps for similar output quality, especially on SD1.5 and SDXL.
159
+ - Benefits: training-free, defaulted into the request path, compatible with the sampler stack.
160
+ - Trade-offs: produces different trajectories than classic schedulers; unsupported step counts use interpolation rather than paper-derived schedules.
161
+ - Evidence: `src/sample/ays_scheduler.py`, `src/sample/ksampler_util.py`, defaults in `server.py` and `src/user/pipeline.py`, benchmark usage in `tests/benchmark_performance.py`.
162
+
163
+ ### CFG++ samplers
164
+
165
+ - Status: `integrated`
166
+ - Purpose: apply CFG++-style momentum behavior in sampler variants to improve denoising stability and quality.
167
+ - Implementation in LightDiffusion-Next: sampler registry maps `_cfgpp` sampler names to the same sampler classes, and `get_sampler()` enables `use_momentum` whenever the sampler name contains `_cfgpp`.
168
+ - Project integration: the sampler loop stores prior denoised state and applies momentum-style correction through `BaseSampler.apply_cfg()`. The server default sampler is `dpmpp_sde_cfgpp`.
169
+ - Effect: better denoising behavior than plain sampler variants without a separate post-process stage.
170
+ - Benefits: integrated directly into the sampler registry; default sampler already uses it.
171
+ - Trade-offs: only applies on `_cfgpp` variants; behavior is coupled to sampler implementation details rather than being a universal guidance layer.
172
+ - Evidence: `src/sample/BaseSampler.py`, default sampler in `server.py`.
173
+
174
+ ### CFG-Free sampling
175
+
176
+ - Status: `integrated, optional`
177
+ - Purpose: reduce CFG late in the denoising process so the model can finish with less over-guidance.
178
+ - Implementation in LightDiffusion-Next: `CFGGuider` stores `cfg_free_enabled` and `cfg_free_start_percent`, tracks current sigma position, and progressively reduces `self.cfg` once the configured progress threshold is crossed.
179
+ - Project integration: the flag is part of the request/context surface and is forwarded by SD1.5, SDXL, Flux2, HiResFix, and Img2Img code paths.
180
+ - Effect: potentially better detail recovery and more natural late-stage refinement.
181
+ - Benefits: integrated and actually wired through multiple pipelines; easy to combine with the rest of the sampler stack.
182
+ - Trade-offs: quality optimization rather than pure speedup; exact effect is prompt- and sampler-dependent.
183
+ - Evidence: `src/sample/CFG.py`, `src/Core/Models/SD15Model.py`, `src/Core/Models/SDXLModel.py`, `src/Core/Models/Flux2KleinModel.py`, `src/Processors/HiresFix.py`, `src/Processors/Img2Img.py`.
184
+
185
+ ### Dynamic CFG rescaling
186
+
187
+ - Status: `integrated, optional`
188
+ - Purpose: reduce effective CFG when the guidance delta becomes too strong.
189
+ - Implementation in LightDiffusion-Next: `CFGGuider._apply_dynamic_cfg_rescaling()` computes either a variance-based or range-based adjustment and clamps the result.
190
+ - Project integration: it runs inside `cfg_function()` before CFG mixing is finalized, so it affects the real denoising path rather than acting as a post-hoc metric.
191
+ - Effect: reduces oversaturation and over-guided outputs for high-CFG workloads.
192
+ - Benefits: low incremental overhead and direct integration into CFG computation.
193
+ - Trade-offs: not a pure speed optimization; the chosen formulas are heuristic and can flatten outputs if pushed too hard.
194
+ - Evidence: `src/sample/CFG.py`.
195
+
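+ A minimal sketch of a variance-style rescale. The actual formulas in `_apply_dynamic_cfg_rescaling()` differ; this version follows the common std-matching heuristic from the rescaled-CFG literature:
+
+ ```python
+ import torch
+
+ def rescale_cfg(eps_cond, eps_uncond, cond_scale, rescale=0.7):
+     guided = eps_uncond + cond_scale * (eps_cond - eps_uncond)
+     # Match the guided prediction's per-sample std back to the
+     # conditional prediction's std, then blend by `rescale`.
+     dims = list(range(1, guided.dim()))
+     std_cond = eps_cond.std(dim=dims, keepdim=True)
+     std_guided = guided.std(dim=dims, keepdim=True)
+     renormed = guided * (std_cond / (std_guided + 1e-8))
+     return rescale * renormed + (1.0 - rescale) * guided
+ ```
+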
196
+ ### Adaptive noise scheduling
197
+
198
+ - Status: `integrated, optional`
199
+ - Purpose: use observed prediction complexity to perturb the sigma schedule during sampling.
200
+ - Implementation in LightDiffusion-Next: `CFGGuider` records complexity history during prediction and scales `sigmas` inside `inner_sample()` if adaptive mode is enabled.
201
+ - Project integration: complexity can be estimated with a spatial-difference metric or variance-like behavior, depending on the selected method.
202
+ - Effect: attempts to spend effort where the current prediction appears more complex.
203
+ - Benefits: implemented end-to-end in the guider.
204
+ - Trade-offs: heuristic, can alter reproducibility, and its benefit is much less established in this repo than AYS or request coalescing.
205
+ - Evidence: `src/sample/CFG.py`.
206
+
207
+ ### `batched_cfg` request surface
208
+
209
+ - Status: `implementation-specific`
210
+ - Purpose: expose control over conditional/unconditional batching.
211
+ - Implementation in LightDiffusion-Next: the field exists in the request and context models and is passed into sampling, where it is stored in `model_options["batched_cfg"]`.
212
+ - Project integration: the main batching behavior is centered in `calc_cond_batch()`, while `batched_cfg` is carried through `model_options` as part of the request-side control surface around that path.
213
+ - Effect: provides a request-facing handle for a batching path whose heavy lifting is performed centrally in conditioning packing.
214
+ - Benefits: fits cleanly into the existing request and sampling pipeline.
215
+ - Trade-offs: its effect is indirect because the main concatenation behavior is implemented deeper in the conditioning layer.
216
+ - Evidence: `src/sample/sampling.py`, `src/Core/Context.py`, `src/cond/cond.py`.
217
+
218
+ ## Multiscale And Architecture-Specific Optimizations
219
+
220
+ ### Multi-scale latent switching
221
+
222
+ - Status: `integrated, optional`
223
+ - Purpose: run some denoising steps at a downscaled latent resolution and return to full resolution for selected steps.
224
+ - Implementation in LightDiffusion-Next: `MultiscaleManager` in `src/sample/BaseSampler.py` computes a per-step full-resolution schedule and uses bilinear downscale/upscale around sampler model calls.
225
+ - Project integration: the samplers consult `ms.use_fullres(i)` each step. Flux and Flux2 are explicitly excluded because the code treats multiscale as incompatible with DiT-style architectures.
226
+ - Effect: lower compute on some denoising steps for compatible samplers and architectures.
227
+ - Benefits: actually participates in the sampler loop; configurable by factor and schedule.
228
+ - Trade-offs: it necessarily changes the denoising path and can trade detail for speed; not available for Flux/Flux2.
229
+ - Evidence: `src/sample/BaseSampler.py`, `src/sample/sampling.py`, `src/Core/Models/Flux2KleinModel.py`.
230
+
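+ A minimal sketch of the downscale/upscale wrapper around a single sampler model call (illustrative; `MultiscaleManager` computes a per-step full-resolution schedule rather than taking a flag):
+
+ ```python
+ import torch.nn.functional as F
+
+ def multiscale_model_call(model_fn, x, sigma, use_fullres, factor=0.5):
+     if use_fullres:
+         return model_fn(x, sigma)
+     h, w = x.shape[-2:]
+     # Denoise this step at a reduced latent resolution
+     small = F.interpolate(x, scale_factor=factor, mode="bilinear",
+                           align_corners=False)
+     out = model_fn(small, sigma)
+     # Return to full resolution for the sampler update
+     return F.interpolate(out, size=(h, w), mode="bilinear",
+                          align_corners=False)
+ ```
+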
231
+ ### HiDiffusion MSW-MSA patching
232
+
233
+ - Status: `integrated, optional`
234
+ - Purpose: patch UNet attention for high-resolution workflows using HiDiffusion-style MSW-MSA attention changes.
235
+ - Implementation in LightDiffusion-Next: the pipeline clones the inner model and applies `ApplyMSWMSAAttentionSimple` when multiscale is enabled on UNet architectures.
236
+ - Project integration: the patch is explicitly blocked for Flux/Flux2 and disabled in some sub-pipelines like refiner or certain detail passes where the project wants to avoid artifact risk.
237
+ - Effect: makes the multiscale/high-resolution path more efficient or more stable on SD1.5/SDXL-style UNets.
238
+ - Benefits: architecture-aware and guarded against obvious misuse.
239
+ - Trade-offs: not universal; adds another patching layer and can be brittle if architecture assumptions drift.
240
+ - Evidence: `src/Core/Pipeline.py`, `src/hidiffusion/msw_msa_attention.py`, `src/Core/AbstractModel.py`, `src/Core/Models/SD15Model.py`, `src/Core/Models/SDXLModel.py`.
241
+
242
+ ## Model Compilation, Precision, And Memory Optimizations
243
+
244
+ ### Stable-Fast
245
+
246
+ - Status: `integrated, conditional`
247
+ - Purpose: trace and wrap UNet execution to reduce Python overhead and optionally use CUDA graph behavior.
248
+ - Implementation in LightDiffusion-Next: `src/StableFast/StableFast.py` builds a lazy trace module around the model function and stores compiled modules in a cache keyed by converted kwargs; `Pipeline._apply_optimizations()` applies it when `stable_fast` is enabled.
249
+ - Project integration: only model types that advertise `supports_stable_fast=True` can use it. Flux2 explicitly opts out at the capability layer.
250
+ - Effect: faster repeated UNet execution when the optional `sfast` dependency is present and shapes stay compatible enough for compilation reuse.
251
+ - Benefits: capability-gated, optional dependency handled defensively, integrated into the core optimization application phase.
252
+ - Trade-offs: dependency-sensitive, compilation overhead can dominate short runs, CUDA graph behavior is less flexible.
253
+ - Evidence: `src/StableFast/StableFast.py`, `src/Core/Pipeline.py`, `src/Core/Models/SD15Model.py`, `src/Core/Models/SDXLModel.py`, `src/Core/Models/Flux2KleinModel.py`.
254
+
255
+ ### `torch.compile`
256
+
257
+ - Status: `integrated, optional`
258
+ - Purpose: rely on PyTorch compiler paths instead of Stable-Fast.
259
+ - Implementation in LightDiffusion-Next: `src/Device/Device.py::compile_model()` defaults to `max-autotune-no-cudagraphs`; `src/Core/AbstractModel.py::apply_torch_compile()` applies it to the top-level module or diffusion submodule when possible.
260
+ - Project integration: the optimization is mutually exclusive with Stable-Fast in the main pipeline.
261
+ - Effect: compiler-based speedups with a safer default mode than more fragile CUDA-graph-heavy settings.
262
+ - Benefits: built on standard PyTorch, tested for safe default mode.
263
+ - Trade-offs: compiler behavior is environment-dependent; still vulnerable to dynamic-shape and dynamic-state limitations.
264
+ - Evidence: `src/Device/Device.py`, `src/Core/AbstractModel.py`, `src/Core/Pipeline.py`, `tests/unit/test_fp8_compile.py`.
265
+
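+ A minimal sketch of the safer compile default; this uses the standard `torch.compile` API, and the helper mirrors but is not the project's `compile_model()`:
+
+ ```python
+ import torch
+
+ def compile_model(module: torch.nn.Module) -> torch.nn.Module:
+     # "max-autotune-no-cudagraphs" keeps autotuned kernels but avoids
+     # CUDA graphs, which are fragile under dynamic shapes and state.
+     return torch.compile(module, mode="max-autotune-no-cudagraphs")
+ ```
+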
266
+ ### VAE compile, tiled path, and transfer tuning
267
+
268
+ - Status: `integrated`
269
+ - Purpose: speed up VAE encode/decode, reduce overhead, and avoid OOM by choosing tiled or batched paths.
270
+ - Implementation in LightDiffusion-Next: `VariationalAE.VAE` compiles the decoder on first use, runs decode/encode under `torch.inference_mode()`, uses channels-last where useful, chooses tiled fallback when memory is tight, and uses non-blocking transfers.
271
+ - Project integration: this is automatic. Callers do not opt in.
272
+ - Effect: faster VAE stages, less repeated Python/autograd overhead, and better robustness under constrained memory.
273
+ - Benefits: always enabled and directly applied in the decode and encode hot path.
274
+ - Trade-offs: decoder compile still depends on `torch.compile` availability; tiling adds complexity and can affect throughput at small sizes.
275
+ - Evidence: `src/AutoEncoders/VariationalAE.py`.
276
+
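+ A minimal sketch of the OOM-triggered tiled fallback pattern (illustrative; the real `VariationalAE.VAE` also handles tile overlap blending and channels-last layout):
+
+ ```python
+ import torch
+
+ def decode_with_fallback(decode_fn, tiled_decode_fn, latents):
+     try:
+         with torch.inference_mode():
+             return decode_fn(latents)
+     except torch.cuda.OutOfMemoryError:
+         # Full-frame decode did not fit; free VRAM and retry in tiles
+         torch.cuda.empty_cache()
+         with torch.inference_mode():
+             return tiled_decode_fn(latents)
+ ```
+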
277
+ ### BF16/FP16 automatic dtype selection
278
+
279
+ - Status: `integrated, conditional`
280
+ - Purpose: pick a lower-precision working dtype that matches the hardware and model constraints.
281
+ - Implementation in LightDiffusion-Next: `src/Device/Device.py` contains the dtype selection logic for UNet, text encoder, and VAE devices/dtypes, including bf16 support checks and fallback rules.
282
+ - Project integration: loaders and patchers consult these helpers when deciding how to instantiate and place components.
283
+ - Effect: reduced memory footprint and better arithmetic throughput on modern hardware.
284
+ - Benefits: broad, centralized policy.
285
+ - Trade-offs: heuristic; wrong hardware assumptions can reduce numerical stability or disable a faster path.
286
+ - Evidence: `src/Device/Device.py`, `src/Model/ModelPatcher.py`, `src/FileManaging/Loader.py`.
287
+
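+ The policy reduces to something like this sketch (the real rules also cover text encoder and VAE placement):
+
+ ```python
+ import torch
+
+ def pick_unet_dtype(device: torch.device) -> torch.dtype:
+     """Prefer BF16 where supported, then FP16 on CUDA, else FP32."""
+     if device.type == "cuda":
+         if torch.cuda.is_bf16_supported():
+             return torch.bfloat16
+         return torch.float16
+     return torch.float32
+ ```
+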
288
+ ### FP8 weight quantization
289
+
290
+ - Status: `integrated, conditional`
291
+ - Purpose: store weights in FP8 while casting them back to the input dtype during execution.
292
+ - Implementation in LightDiffusion-Next: `AbstractModel.apply_fp8()` hardware-gates support using `Device.is_fp8_supported()`, rewrites eligible weights to FP8, and enables runtime cast behavior on `CastWeightBiasOp` modules. The lower-level `ModelPatcher.weight_only_quantize()` also supports FP8-style quantization. See the sketch after this list.
293
+ - Project integration: it is available through generation settings and also used in Flux2 load paths when appropriate.
294
+ - Effect: lower model weight memory with an execution path that avoids dtype-mismatch crashes.
295
+ - Benefits: tested explicitly, integrates with cast-aware modules, useful for large models.
296
+ - Trade-offs: hardware-gated; quality/performance trade-offs depend on model and layer mix.
297
+ - Evidence: `src/Core/AbstractModel.py`, `src/Device/Device.py`, `src/Model/ModelPatcher.py`, `tests/unit/test_fp8_compile.py`.
298
+
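+ A simplified stand-in for the cast-on-use idea (assumes a PyTorch build with `torch.float8_e4m3fn`; the real `CastWeightBiasOp` machinery is more general):
+
+ ```python
+ import torch
+
+ class FP8Linear(torch.nn.Module):
+     """Store weights in FP8; cast back to the input dtype at execution time."""
+
+     def __init__(self, linear: torch.nn.Linear):
+         super().__init__()
+         self.weight = torch.nn.Parameter(
+             linear.weight.to(torch.float8_e4m3fn), requires_grad=False)
+         self.bias = linear.bias
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         # runtime cast avoids dtype-mismatch crashes in downstream kernels
+         return torch.nn.functional.linear(x, self.weight.to(x.dtype), self.bias)
+ ```
+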
299
+ ### NVFP4 weight quantization
300
+
301
+ - Status: `integrated, optional`
302
+ - Purpose: use a more aggressive 4-bit weight-only format to reduce memory further than FP8.
303
+ - Implementation in LightDiffusion-Next: both `AbstractModel.apply_nvfp4()` and `ModelPatcher.weight_only_quantize("nvfp4")` quantize supported weights, store scale buffers, and enable runtime casting/dequantization. See the sketch after this list.
304
+ - Project integration: the quantization path is used most clearly in Flux2/Klein loading, but the abstract model path also exists for supported models.
305
+ - Effect: significant memory reduction at the cost of more aggressive approximation.
306
+ - Benefits: strongest memory reduction path in the repo.
307
+ - Trade-offs: more invasive than FP8, more likely to affect quality, and only applies to some weight shapes.
308
+ - Evidence: `src/Core/AbstractModel.py`, `src/Model/ModelPatcher.py`, `src/Utilities/Quantization.py`, `tests/test_nvfp4.py`, `tests/test_nvfp4_integration.py`.
309
+
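+ A deliberately simplified sketch of the format's mechanics: per-block scales plus a 16-level signed E2M1 grid. Codes are left unpacked (one per int8) for readability; the real path packs two 4-bit codes per byte and differs in detail.
+
+ ```python
+ import torch
+
+ FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes
+
+ def quantize_nvfp4(w: torch.Tensor, block: int = 16):
+     grid = FP4_GRID.to(w.device)
+     flat = w.reshape(-1, block)  # assumes numel divisible by the block size
+     scale = (flat.abs().amax(dim=1, keepdim=True) / 6.0).clamp(min=1e-8)
+     # snap each scaled magnitude to the nearest grid point
+     idx = ((flat / scale).abs().unsqueeze(-1) - grid).abs().argmin(dim=-1)
+     return idx.to(torch.int8), flat.sign().to(torch.int8), scale
+
+ def dequantize_nvfp4(idx, sign, scale, shape):
+     grid = FP4_GRID.to(idx.device)
+     return (grid[idx.long()] * sign * scale).reshape(shape)
+ ```
+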
310
+ ### Flux2 load-time weight-only quantization
311
+
312
+ - Status: `integrated, conditional`
313
+ - Purpose: automatically quantize large Flux2 diffusion and Klein text encoder weights during loading when the configuration or hardware path calls for it.
314
+ - Implementation in LightDiffusion-Next: `Flux2KleinModel.load()` selects a quantization format and applies weight-only quantization to the diffusion model; `_load_klein_text_encoder()` applies the same idea to the text encoder before offloading it back to CPU.
315
+ - Project integration: Flux2 is the clearest example in the codebase where quantization is implemented as a first-class loading strategy rather than as a generic capability alone.
316
+ - Effect: keeps a large Flux2/Klein stack usable on lower-VRAM systems than an uncompressed load would allow.
317
+ - Benefits: integrated, architecture-specific, and directly aligned with large-model VRAM constraints.
318
+ - Trade-offs: tightly coupled to Flux2/Klein assumptions; not equivalent to a universally available quantized-mode toggle.
319
+ - Evidence: `src/Core/Models/Flux2KleinModel.py`.
320
+
321
+ ### ToMe
322
+
323
+ - Status: `integrated, optional`
324
+ - Purpose: merge similar tokens to reduce attention workload in UNet-based models.
325
+ - Implementation in LightDiffusion-Next: `ModelPatcher.apply_tome()` applies and removes `tomesd` patches; `Pipeline._apply_optimizations()` applies it only when the model capabilities allow it. See the sketch after this list.
326
+ - Project integration: SD1.5 and SDXL advertise `supports_tome=True`; Flux2 advertises `False`.
327
+ - Effect: lower attention cost on supported UNet models, particularly at higher token counts.
328
+ - Benefits: explicitly capability-gated, integrated into the core optimization phase.
329
+ - Trade-offs: optional dependency, UNet-only in current practice, and quality can soften if pushed too aggressively.
330
+ - Evidence: `src/Model/ModelPatcher.py`, `src/Core/Pipeline.py`, capability declarations in `src/Core/Models/*`, `tests/unit/test_tome_fix.py`.
331
+
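+ Usage reduces to the optional-dependency pattern below (the ratio is an example value, not the project default):
+
+ ```python
+ try:
+     import tomesd
+ except ImportError:
+     tomesd = None  # optional dependency: degrade to a no-op
+
+ def apply_tome(diffusion_model, ratio: float = 0.4):
+     if tomesd is not None:
+         tomesd.apply_patch(diffusion_model, ratio=ratio)
+     return diffusion_model
+
+ def remove_tome(diffusion_model):
+     if tomesd is not None:
+         tomesd.remove_patch(diffusion_model)
+ ```
+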
332
+ ### DeepCache
333
+
334
+ - Status: `integrated, optional, implementation-specific`
335
+ - Purpose: reuse work across denoising steps rather than running a full forward pass every time.
336
+ - Implementation in LightDiffusion-Next: `ApplyDeepCacheOnModel.patch()` clones the model and wraps its UNet function. On cache-update steps it runs the model normally and stores the output; on reuse steps it returns the cached output directly. See the sketch after this list.
337
+ - Project integration: the main pipeline applies it from `_apply_optimizations()` when `deepcache_enabled` is true and the model advertises support.
338
+ - Effect: fewer full model computations on reuse steps, trading some fidelity for speed.
339
+ - Benefits: live integrated path, simple integration model, and capability gating.
340
+ - Trade-offs: the implementation works at whole-output reuse granularity rather than a finer-grained internal block reuse strategy, so its speed/fidelity profile is comparatively coarse.
341
+ - Evidence: `src/WaveSpeed/deepcache_nodes.py`, `src/Core/Pipeline.py`, `src/Core/AbstractModel.py`, `src/Core/Models/SD15Model.py`, `src/Core/Models/SDXLModel.py`, `tests/test_core_functionalities.py`.
342
+
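+ The reuse mechanism, reduced to a sketch (step bookkeeping in the real patch is more involved):
+
+ ```python
+ class CachedDenoiser:
+     """Run the full model on cache-update steps; return the stored output otherwise."""
+
+     def __init__(self, model_fn, cache_interval: int = 3):
+         self.model_fn = model_fn
+         self.cache_interval = cache_interval
+         self.step = 0
+         self.cached = None
+
+     def __call__(self, x, sigma, **kwargs):
+         if self.cached is None or self.step % self.cache_interval == 0:
+             self.cached = self.model_fn(x, sigma, **kwargs)  # full forward pass
+         self.step += 1
+         return self.cached
+ ```
+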
343
+ ### First Block Cache for Flux
344
+
345
+ - Status: `codebase groundwork`
346
+ - Purpose: cache downstream transformer work when the first-block residual indicates the state has not changed much.
347
+ - Implementation in LightDiffusion-Next: `src/WaveSpeed/first_block_cache.py` contains cache contexts and patch builders for both UNet-like and Flux-like forward paths. See the sketch after this list.
348
+ - Project integration: the module provides the machinery for a Flux-oriented first-block caching path. In the current project flow, the directly surfaced caching path is DeepCache, while this module remains groundwork for a more specialized integration.
349
+ - Effect: establishes the components needed for a transformer-oriented cache path in the codebase.
350
+ - Benefits: nontrivial implementation foundation already exists.
351
+ - Trade-offs: it is not yet surfaced as a broad standard option in the same way as the main integrated optimizations.
352
+ - Evidence: `src/WaveSpeed/first_block_cache.py`.
353
+
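+ The residual test at the heart of the idea looks roughly like this (the threshold and structure are assumptions about the groundwork module, not its exact API):
+
+ ```python
+ import torch
+
+ class FirstBlockCache:
+     def __init__(self, threshold: float = 0.1):
+         self.threshold = threshold
+         self.prev_first = None
+
+     def should_reuse(self, first_block_out: torch.Tensor) -> bool:
+         """Reuse downstream work when the first-block output barely moved."""
+         reuse = False
+         if self.prev_first is not None:
+             num = (first_block_out - self.prev_first).abs().mean()
+             den = self.prev_first.abs().mean().clamp(min=1e-8)
+             reuse = (num / den).item() < self.threshold  # relative L1 distance
+         self.prev_first = first_block_out
+         return reuse
+ ```
+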
354
+ ## Memory Management And Serving Optimizations
355
+
356
+ ### Low-VRAM partial loading and offload policy
357
+
358
+ - Status: `integrated`
359
+ - Purpose: keep only the amount of model state in VRAM that current free memory allows, offloading the rest.
360
+ - Implementation in LightDiffusion-Next: `cond_util.prepare_sampling()` calls `Device.load_models_gpu(..., force_full_load=False)`; `Device.load_models_gpu()` computes low-VRAM budgets and delegates partial loading to `ModelPatcher.patch_model_lowvram()` and `partially_load()`. See the sketch after this list.
361
+ - Project integration: this is a core loading behavior, not a side option. Text encoder and VAE also have explicit offload-device helpers.
362
+ - Effect: keeps generation viable on limited VRAM systems and reduces full reload pressure.
363
+ - Benefits: central to memory behavior in constrained environments, architecture-aware, and tied into checkpoint, text encoder, and VAE device policy.
364
+ - Trade-offs: more complex state management; partial loading can increase latency and complicate debugging.
365
+ - Evidence: `src/cond/cond_util.py`, `src/Device/Device.py`, `src/Model/ModelPatcher.py`.
366
+
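+ The budgeting idea, sketched with simplified assumptions (the real policy also reserves headroom for inference activations and honors per-model minimums):
+
+ ```python
+ import torch
+
+ def vram_budget(reserve_mb: int = 1024) -> int:
+     free, _total = torch.cuda.mem_get_info()
+     return max(0, free - reserve_mb * 1024 * 1024)
+
+ def partially_load(modules, budget_bytes: int, device: str = "cuda"):
+     """Move modules to VRAM until the budget is spent; the rest stay offloaded."""
+     used = 0
+     for m in modules:  # assumed ordered hottest-first
+         size = sum(p.numel() * p.element_size() for p in m.parameters())
+         if used + size > budget_bytes:
+             break
+         m.to(device)
+         used += size
+ ```
+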
367
+ ### Async transfer helpers and pinned checkpoint tensors
368
+
369
+ - Status: `integrated, infrastructure-level`
370
+ - Purpose: reduce CPU<->GPU transfer cost with asynchronous copies, streams, and pinned host memory.
371
+ - Implementation in LightDiffusion-Next: `Device.cast_to()` can issue transfers on offload streams; checkpoint tensors are pinned on CUDA loads in `util.load_torch_file()`; VAE encode/decode uses non-blocking transfers. See the sketch after this list.
372
+ - Project integration: these mechanisms appear most clearly in checkpoint loading, model movement, and VAE data flow. Some parts act as general transfer infrastructure rather than as a single user-facing optimization toggle.
373
+ - Effect: faster host/device movement and less transfer-induced stalling in hot paths that actually use the helpers.
374
+ - Benefits: useful on CUDA systems, especially during model load and VAE stages.
375
+ - Trade-offs: integration is uneven; some helper functions look broader than their current call footprint.
376
+ - Evidence: `src/Device/Device.py`, `src/Utilities/util.py`, `src/AutoEncoders/VariationalAE.py`.
377
+
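+ The transfer pattern these helpers build on (illustrative; the names are not the project's API):
+
+ ```python
+ import torch
+
+ def async_to_gpu(t: torch.Tensor, device: str = "cuda") -> torch.Tensor:
+     staged = t.pin_memory()  # page-locked host memory enables true async DMA
+     stream = torch.cuda.Stream()
+     with torch.cuda.stream(stream):
+         out = staged.to(device, non_blocking=True)
+     torch.cuda.current_stream().wait_stream(stream)  # order against the default stream
+     return out
+ ```
+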
378
+ ### Request coalescing and queue batching
379
+
380
+ - Status: `integrated`
381
+ - Purpose: batch compatible API requests together so the backend does fewer larger pipeline invocations.
382
+ - Implementation in LightDiffusion-Next: `server.py::GenerationBuffer` groups pending requests by a signature that includes model, size, scheduler, sampler, steps, multiscale settings, and other batch-level properties. See the sketch after this list.
383
+ - Project integration: the worker chooses the oldest eligible group, optionally waits for more arrivals, flattens per-request samples into one pipeline call, and later remaps saved results back to request futures.
384
+ - Effect: better throughput and GPU utilization for concurrent API use.
385
+ - Benefits: real server-level optimization, clearly implemented, includes observability-oriented logs.
386
+ - Trade-offs: requires careful grouping keys; incompatible request options fragment batching opportunities.
387
+ - Evidence: `server.py`.
388
+
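+ A minimal sketch of signature-based grouping (the field names are assumptions):
+
+ ```python
+ from collections import defaultdict
+
+ def signature(req: dict) -> tuple:
+     return (req["model"], req["width"], req["height"],
+             req["scheduler"], req["sampler"], req["steps"])
+
+ def coalesce(pending: list[dict]) -> dict[tuple, list[dict]]:
+     groups = defaultdict(list)
+     for req in pending:
+         groups[signature(req)].append(req)
+     return groups  # the worker picks the oldest group and runs one batched call
+ ```
+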
389
+ ### Singleton policy, large-group chunking, and image-save guardrails
390
+
391
+ - Status: `integrated`
392
+ - Purpose: prevent batching from hurting latency for lone requests, and prevent oversized coalesced batches from exploding decode/save paths.
393
+ - Implementation in LightDiffusion-Next: `LD_BATCH_WAIT_SINGLETONS` controls whether singletons wait; `LD_MAX_IMAGES_PER_GROUP` and `ImageSaver.MAX_IMAGES_PER_SAVE` drive chunking; large groups are split into smaller sequential pipeline runs. See the sketch after this list.
394
+ - Project integration: the server keeps the coalescing optimization from turning into pathological giant save/decode operations, and tests cover the chunking behavior.
395
+ - Effect: better tail latency for single requests and more stable handling of large batched workloads.
396
+ - Benefits: directly addresses operational failure modes in large batched workloads.
397
+ - Trade-offs: chunking reduces some batching benefits; many environment variables affect behavior.
398
+ - Evidence: `server.py`, `src/FileManaging/ImageSaver.py`, `tests/unit/test_generation_buffer_chunking.py`, `docs/quirks.md`.
399
+
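+ Chunking itself is simple; a sketch using the documented 256-image default as the cap:
+
+ ```python
+ def split_group(requests: list[dict], max_images_per_group: int = 256) -> list[list[dict]]:
+     """Split one coalesced group into sequential runs that respect the image cap."""
+     chunks, current, count = [], [], 0
+     for req in requests:
+         n = req.get("num_images", 1)
+         if current and count + n > max_images_per_group:
+             chunks.append(current)
+             current, count = [], 0
+         current.append(req)
+         count += n
+     if current:
+         chunks.append(current)
+     return chunks
+ ```
+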
400
+ ### Next-model prefetch
401
+
402
+ - Status: `integrated`
403
+ - Purpose: while one batch is running, read the next checkpoint into CPU RAM if the queued next batch needs a different model.
404
+ - Implementation in LightDiffusion-Next: `GenerationBuffer._look_ahead_and_prefetch()` resolves the next checkpoint, loads it via `util.load_torch_file()` on a background task, and stores it in `ModelCache` as a prefetched state dict. See the sketch after this list.
405
+ - Project integration: the next load can reuse the prefetched state dict through `util.load_torch_file()` before the cache entry is cleared.
406
+ - Effect: overlaps some future checkpoint load cost with current generation work.
407
+ - Benefits: server-side latency hiding with minimal interface impact.
408
+ - Trade-offs: only helps when queued work is predictable; increases CPU RAM usage.
409
+ - Evidence: `server.py`, `src/Device/ModelCache.py`, `src/Utilities/util.py`.
410
+
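+ The overlap trick in miniature (illustrative; the real task wiring lives in `server.py`):
+
+ ```python
+ import asyncio
+
+ async def prefetch_next(next_ckpt_path: str, cache: dict, load_fn):
+     """Read the next checkpoint into CPU RAM while the current batch runs."""
+     if next_ckpt_path not in cache:
+         # run the blocking file read off the event loop
+         cache[next_ckpt_path] = await asyncio.to_thread(load_fn, next_ckpt_path)
+ ```
+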
411
+ ### Keep-models-loaded cache
412
+
413
+ - Status: `integrated`
414
+ - Purpose: keep recently used checkpoints and sampling models resident instead of cleaning them up after every request.
415
+ - Implementation in LightDiffusion-Next: `ModelCache` stores checkpoints, TAESD models, sampling models, and the keep-loaded policy; `server.py` temporarily applies the request's `keep_models_loaded` directive for a group.
416
+ - Project integration: when enabled, main models are retained and only auxiliary control models are cleaned up aggressively.
417
+ - Effect: lower warm-start cost between related generations and less repetitive reload churn.
418
+ - Benefits: simple end-user behavior for a meaningful latency/memory trade-off.
419
+ - Trade-offs: consumes more VRAM/RAM; can make memory pressure less predictable on multi-user servers.
420
+ - Evidence: `src/Device/ModelCache.py`, `server.py`.
421
+
422
+ ### In-memory PNG byte buffer
423
+
424
+ - Status: `integrated`
425
+ - Purpose: return API images from memory instead of reading them back from disk after save.
426
+ - Implementation in LightDiffusion-Next: `ImageSaver` can store encoded PNG bytes in `_image_bytes_buffer`; `server.py` first calls `pop_image_bytes()` when fulfilling request futures. See the sketch after this list.
427
+ - Project integration: batched pipeline runs can still save images normally while the API path avoids a disk round-trip for the response payload.
428
+ - Effect: lower response latency and less unnecessary disk I/O for served images.
429
+ - Benefits: directly reduces response-path disk I/O in API-serving scenarios.
430
+ - Trade-offs: consumes temporary RAM; only helps when the buffer path is actually populated.
431
+ - Evidence: `src/FileManaging/ImageSaver.py`, `server.py`.
432
+
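+ The buffer amounts to a keyed byte store; a sketch (names are illustrative, not `ImageSaver`'s exact attributes):
+
+ ```python
+ import io
+ from PIL import Image
+
+ _image_bytes: dict[str, bytes] = {}
+
+ def save_with_buffer(img: Image.Image, request_id: str) -> None:
+     raw = io.BytesIO()
+     img.save(raw, format="PNG")
+     _image_bytes[request_id] = raw.getvalue()  # response path reads RAM, not disk
+
+ def pop_image_bytes(request_id: str) -> bytes | None:
+     return _image_bytes.pop(request_id, None)
+ ```
+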
433
+ ### TAESD preview pacing and preview fidelity control
434
+
435
+ - Status: `integrated, conditional`
436
+ - Purpose: keep live previews useful without letting preview generation dominate sampling time.
437
+ - Implementation in LightDiffusion-Next: `SamplerCallback` caches preview settings, only triggers previews at a coarse interval, and runs preview work on a background thread; the server also applies per-request preview fidelity presets (`low`, `balanced`, `high`). See the sketch after this list.
438
+ - Project integration: previews are generated only when previewing is enabled, and the preview cadence is adaptive to total step count.
439
+ - Effect: live feedback with bounded preview overhead.
440
+ - Benefits: explicit pacing, non-blocking thread model, request-level fidelity override.
441
+ - Trade-offs: still extra work during sampling; fidelity presets are intentionally coarse.
442
+ - Evidence: `src/sample/BaseSampler.py`, `src/AutoEncoders/taesd.py`, `server.py`, preview tests under `tests/e2e` and `tests/integration/api`.
443
+
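+ The pacing rule reduces to a step-interval check with the decode pushed off-thread (a sketch; the project's cadence logic and thread handling are richer):
+
+ ```python
+ import threading
+
+ def preview_interval(total_steps: int, target_previews: int = 6) -> int:
+     return max(1, total_steps // target_previews)  # cadence adapts to run length
+
+ def on_step(step: int, latent, total_steps: int, decode_fn, publish_fn) -> None:
+     if step % preview_interval(total_steps) == 0:
+         # decode on a background thread so sampling is never blocked
+         threading.Thread(
+             target=lambda: publish_fn(decode_fn(latent)), daemon=True
+         ).start()
+ ```
+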
444
+ ## Integration Notes
445
+
446
+ These notes highlight how several optimizations are currently integrated and used inside the project.
447
+
448
+ ### 1. Flux-oriented first block caching
449
+
450
+ - The codebase contains a dedicated `src/WaveSpeed/first_block_cache.py` module with cache contexts and patch builders for Flux-oriented paths.
451
+ - In the current optimization stack, the directly surfaced caching path is DeepCache, while First Block Cache remains implementation groundwork for a more specialized integration.
452
+ - This establishes the core components for a transformer-oriented cache path even though it is not yet surfaced as a primary standard option.
453
+
454
+ ### 2. DeepCache reuse granularity
455
+
456
+ - DeepCache is integrated through `src/WaveSpeed/deepcache_nodes.py` and is applied from the main pipeline when enabled.
457
+ - In this project, it works by reusing prior denoiser outputs on designated reuse steps.
458
+ - This yields a clear speed-fidelity profile based on output reuse rather than on finer-grained internal block caching.
459
+
460
+ ### 3. Conditioning batching control
461
+
462
+ - Conditioning batching is centered in `src/cond/cond.py::calc_cond_batch()`, where compatible condition chunks are packed and concatenated.
463
+ - The `batched_cfg` request field participates as request-side control metadata around this behavior.
464
+ - In operation, the batching outcome is therefore shaped mainly by the central conditioning logic rather than by a standalone external switch.
465
+
466
+ ### 4. GPU attention backend selection
467
+
468
+ - Attention backend selection is hardware- and build-aware, with the runtime choosing among SpargeAttn, SageAttention, xformers, and PyTorch SDPA based on capability checks.
469
+ - The exact backend used in practice therefore depends on the active GPU generation, dependencies, and runtime configuration.
470
+ - Backend acceleration is therefore largely automatic from the user perspective while remaining environment-specific in implementation.
471
+
472
+ ### 5. Prompt cache behavior
473
+
474
+ - Prompt caching is implemented as a global dict-backed cache keyed by prompt hash and CLIP identity.
475
+ - The cache prunes old entries once it exceeds its configured size threshold.
476
+ - In operation, it primarily benefits repeated-prompt workflows such as seed sweeps and prompt iteration.
477
+
478
+ ## Conclusion
479
+
480
+ LightDiffusion-Next uses a layered optimization strategy spanning runtime kernels, scheduling, guidance logic, precision and memory control, model patching, and server-side throughput management.
481
+
482
+ - The core operational stack is built around AYS scheduling, attention backend selection, conditioning batching, low-VRAM loading policy, prompt caching, VAE tuning, and request coalescing.
483
+ - Optional paths such as Stable-Fast, `torch.compile`, ToMe, DeepCache, multiscale sampling, and quantization extend that stack for specific hardware targets, model families, and workload profiles.
484
+ - The serving layer is a first-class component of the performance model, with batching, chunking, prefetching, keep-loaded caches, and in-memory responses contributing directly to end-to-end latency and throughput.
docs/index.md ADDED
@@ -0,0 +1,44 @@
1
+ # LightDiffusion-Next
2
+
3
+ LightDiffusion-Next is a refactored and performance-first Stable Diffusion stack that bundles a modern Streamlit UI, an optional Gradio web app, a batched FastAPI backend and highly tuned inference primitives such as Stable-Fast, SageAttention and WaveSpeed caching.
4
+
5
+ ## Why pick LightDiffusion-Next
6
+
7
+ LightDiffusion-Next is built to handle day-to-day generation workloads on consumer GPUs while still scaling up to multi-user servers.
8
+
9
+ - **Fast by default.** Stable-Fast compilation, SageAttention, SpargeAttn and WaveSpeed caching are wired in so you can hit top-tier it/s without manual patching.
10
+ - **Multiple front doors.** Choose between the Streamlit control room, a Gradio web UI (great for Spaces) or the programmable FastAPI queue for integrations.
11
+ - **Feature complete.** Txt2Img, Img2Img, Flux pipelines, AutoHDR, TAESD previews, prompt enhancement through Ollama, multi-scale diffusion with presets, LoRA mixing and automatic detailing are all available out of the box.
12
+ - **Operations friendly.** Docker images, GPU-aware batched serving, model caching controls and observability endpoints make it easy to deploy and monitor.
13
+
14
+ ## What ships in the box
15
+
16
+ - 🚀 **Streamlined UI** with live previews, history, presets, interrupt/resume controls and automatic metadata tagging.
17
+ - 🧠 **Prompt toolkit** including reusable negative embeddings, multi-concept weighting, prompt enhancement and prompt history.
18
+ - 🧩 **Modular pipeline** that routes SD1.5, SDXL-inspired workflows and quantized Flux models through a single code path with per-sample overrides for HiresFix, ADetailer or Img2Img.
19
+ - 🛠️ **Production API** powered by FastAPI with smart request coalescing, telemetry endpoints and base64 image responses ready for bots or creative tooling.
20
+ - 📦 **Deployment artifacts** such as Dockerfiles, docker-compose, run scripts for Windows, configurable GPU architecture flags and optional Ollama/Stable-Fast builds.
21
+
22
+ ## Quick pathways
23
+
24
+ - [Installation](installation.md) — pick Docker, Windows batch or manual Python setup.
25
+ - [First run & UI tour](usage.md) — learn the Streamlit layout, generation controls and history tools.
26
+ - [Workflow playbook](examples.md) — step through Txt2Img, Flux, Img2Img and API recipes.
27
+ - [Performance optimizations](optimizations.md) — understand SageAttention, Stable-Fast, WaveSpeed caching and the new AYS scheduler for 2-5x speedup.
28
+ - [Align Your Steps](ays-scheduler.md) — learn about AYS scheduler and prompt caching for additional speedup.
29
+ - [Prompt Caching](prompt-caching.md) — deep dive into prompt attention caching mechanics and tuning.
30
+ - [Performance tuning](quirks.md) — squeeze out extra throughput or reduce VRAM usage.
31
+ - [Architecture](architecture.md) — understand how the UI, pipeline and server cooperate.
32
+ - [REST & automation](api.md) — integrate Discord bots, automations or other clients.
33
+
34
+ ## Supported environments at a glance
35
+
36
+ - NVIDIA GPUs with CUDA 12.x drivers. SageAttention and SpargeAttn availability is detected at runtime and depends on installed kernels, drivers and GPU compute capability; some kernels may be unavailable on the newest CUDA runtimes. RTX 50xx and newer cards may use SageAttention + Stable-Fast where supported.
37
+ - Windows 10/11, Ubuntu 22.04+ and containerized deployments via Docker with NVIDIA Container Toolkit.
38
+ - Optional CPU-only mode for experimentation (no Stable-Fast/SageAttention speed-ups).
39
+
40
+ ## Where to head next
41
+
42
+ - Start with [Installation](installation.md) to get your environment ready.
43
+ - Drop into the [Streamlit UI guide](usage.md) for a tour of generation features and presets.
44
+ - Explore [Architecture](architecture.md) when you are ready to customize or embed LightDiffusion-Next in larger systems.
docs/installation.md ADDED
@@ -0,0 +1,161 @@
1
+ # Installation & Setup
2
+
3
+ LightDiffusion-Next can run locally on Windows or Linux, inside Docker, or on cloud GPUs. This page walks you through the supported installation paths and the assets you must download before your first generation.
4
+
5
+ ## Hardware & software requirements
6
+
7
+ The project is tuned for NVIDIA GPUs and CUDA 12.x drivers, but it also supports AMD GPUs with ROCm and Apple Silicon with Metal Performance Shaders (MPS). See [ROCm and Metal/MPS Support](rocm-metal-support.md) for platform-specific installation instructions.
8
+
9
+ - **Operating system:** Windows 10/11, Ubuntu 22.04+, macOS 12.3+ (for Apple Silicon), or any distro supported by NVIDIA Container Toolkit.
10
+ - **Python:** 3.10.x. The run scripts create a virtual environment automatically.
11
+ - **GPU:**
12
+ - **NVIDIA:** Card with at least compute capability 8.0 (Ampere) for SageAttention/SpargeAttn. RTX 50 series (compute 12.0) runs with SageAttention + Stable-Fast.
13
+ - **AMD:** RDNA 2+ or CDNA architectures with ROCm 5.0+. See [ROCm Support](rocm-metal-support.md#rocm-support-amd-gpus).
14
+ - **Apple Silicon:** M1/M2/M3 series with macOS 12.3+. See [Metal/MPS Support](rocm-metal-support.md#metalmps-support-apple-silicon).
15
+ - **VRAM:** 6 GB minimum (12 GB recommended) for SD1.5 workflows. Flux quantized pipelines require 16 GB+ for comfortable batching.
16
+ - **Disk space:** ~15 GB for dependencies plus your checkpoints, LoRAs and flux assets.
17
+
18
+ ## Choose an installation path
19
+
20
+ - [Windows quick start](#windows-quick-start-runbat)
21
+ - [Linux or WSL2 manual setup](#linuxwsl2-manual-setup)
22
+ - [Containerized deployment](#docker-and-containers)
23
+ - [Headless server API](#running-only-the-fastapi-server)
24
+
25
+ ### Windows quick start (`run.bat`)
26
+
27
+ The root repository ships with a convenience script that handles environment creation, dependency installation via `uv`, GPU detection and launching the Streamlit UI.
28
+
29
+ 1. Install the latest [Python 3.10](https://www.python.org/downloads/release/python-3100/) build and ensure `python` is on your `PATH`.
30
+ 2. Install the [NVIDIA CUDA 12 runtime driver](https://developer.nvidia.com/cuda-downloads) that matches your GPU.
31
+ 3. Clone the repository and place your checkpoints in `include/checkpoints` (see [Model assets](#model-assets)).
32
+ 4. Run `run.bat` (double-click it, or invoke it from a terminal). The script will:
33
+
34
+ - Create `.venv` (if it does not exist) and upgrade `pip`.
35
+ - Install `uv` for fast dependency resolution.
36
+ - Detect an NVIDIA GPU via `nvidia-smi` and install the matching PyTorch wheels.
37
+ - Install all requirements and start Streamlit at `http://localhost:8501`.
38
+
39
+ 5. When you are done, close the terminal to stop the UI. The virtual environment is reusable across runs.
40
+
41
+ > **Tip:** To launch the Gradio UI instead, activate `.venv` and run `python app.py`.
42
+
43
+ ### Linux/WSL2 manual setup
44
+
45
+ 1. Install system dependencies:
46
+
47
+ ```bash
48
+ sudo apt update && sudo apt install python3.10 python3.10-venv python3-pip build-essential git
49
+ ```
50
+
51
+ > If you plan to use **AutoHDR** (ICC-based color transforms), ensure Little CMS (lcms2) is installed so Pillow can build profile transforms. On Debian/Ubuntu:
52
+ ```bash
53
+ sudo apt-get install -y liblcms2-2 liblcms2-dev
54
+ pip install --upgrade --force-reinstall pillow
55
+ ```
56
+
57
+
58
+ 2. (Optional) Install the [NVIDIA CUDA 12 toolkit](https://developer.nvidia.com/cuda-toolkit-archive) so SageAttention/SpargeAttn can compile native extensions.
59
+ 3. Create and activate a virtual environment:
60
+
61
+ ```bash
62
+ python3 -m venv .venv
63
+ source .venv/bin/activate
64
+ pip install --upgrade pip uv
65
+ ```
66
+
67
+ 4. Install PyTorch and core dependencies:
68
+
69
+ ```bash
70
+ uv pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision "triton>=2.1.0"
71
+ uv pip install -r requirements.txt
72
+ ```
73
+
74
+ 5. Launch the Streamlit UI:
75
+
76
+ ```bash
77
+ streamlit run streamlit_app.py --server.address=0.0.0.0 --server.port=8501
78
+ ```
79
+
80
+ Use `python app.py` if you prefer the Gradio interface.
81
+
82
+ 6. Deactivate the environment with `deactivate` when finished.
83
+
84
+ ### Docker and containers
85
+
86
+ Use Docker when you want an immutable runtime with SageAttention, SpargeAttn and Stable-Fast prebuilt.
87
+
88
+ 1. Install [Docker Desktop](https://www.docker.com/products/docker-desktop/) or Docker Engine with the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
89
+ 2. Clone the repository and review `docker-compose.yml`. Adjust:
90
+
91
+ - `TORCH_CUDA_ARCH_LIST` if you only target a specific GPU architecture.
92
+ - `INSTALL_STABLE_FAST` and `INSTALL_OLLAMA` build arguments if you want Stable-Fast or the Ollama prompt enhancer baked into the image.
93
+ - Volume mounts for `output/` and the `include/*` directories where you store checkpoints, LoRAs, embeddings and YOLO detectors.
94
+
95
+ 3. Build and start the stack:
96
+
97
+ ```bash
98
+ docker-compose up --build
99
+ ```
100
+
101
+ Streamlit is exposed on `http://localhost:8501` by default; Gradio is mapped to port `7860` and can be enabled by setting `UI_FRAMEWORK=gradio`.
102
+
103
+ 4. To rebuild with a different GPU architecture or optional component:
104
+
105
+ ```bash
106
+ docker-compose build --build-arg TORCH_CUDA_ARCH_LIST="9.0" --build-arg INSTALL_STABLE_FAST=1
107
+ ```
108
+
109
+ ### Running only the FastAPI server
110
+
111
+ If you want to integrate LightDiffusion-Next into automation pipelines or Discord bots, run the backend without launching a UI.
112
+
113
+ 1. Follow any of the setup methods above.
114
+ 2. Run:
115
+
116
+ ```bash
117
+ uvicorn server:app --host 0.0.0.0 --port 7861
118
+ ```
119
+
120
+ 3. Use the [REST API reference](api.md) to submit generation jobs via `POST /api/generate` and inspect queue health via `GET /api/telemetry`.
121
+
122
+ ## Model assets
123
+
124
+ LightDiffusion-Next does not bundle model weights. Place your assets into the `include/` tree before you start generating.
125
+
126
+ - `include/checkpoints/` — SD1.5 style `.safetensors` checkpoints (e.g. Meina V10, DreamShaper). The default pipeline expects a file named `Meina V10 - baked VAE.safetensors` unless you override it.
127
+ - `include/vae/ae.safetensors` — Flux VAE (download from [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)). Required for Flux mode.
128
+ - `include/loras/` — LoRA adapters loaded from the UI or CLI.
129
+ - `include/embeddings/` — Negative prompt embeddings such as `EasyNegative`, `badhandv4`.
130
+ - `include/yolos/` — YOLO detectors used by ADetailer (`person_yolov8m-seg.pt`, `face_yolov9c.pt`).
131
+ - `include/ESRGAN/` — RealESRGAN models leveraged by UltimateSDUpscale in Img2Img workflows.
132
+ - `include/sd1_tokenizer/` — Tokenizer files for SD1.x. The repository already includes the defaults.
133
+
134
+ Store generated outputs under `output/` (separated into Classic, Flux, Img2Img, HiresFix and ADetailer sub-folders). The folders are created automatically during the first run.
135
+
136
+ ## Optional accelerations
137
+
138
+ - **Stable-Fast** — 70% faster SD1.5 inference through UNet compilation. Set `INSTALL_STABLE_FAST=1` in Docker or pass `--stable-fast` in the CLI/UI to compile on demand. Compilation adds a one-time warm-up cost.
139
+ - **SageAttention** — INT8 attention kernels with 15% speedup and lower VRAM use. Built automatically in Docker images; on bare metal, clone [SageAttention](https://github.com/thu-ml/SageAttention) and run `pip install -e . --no-build-isolation` inside your environment.
140
+ - **SpargeAttn** — Sparse attention kernels with 40–60% speedup (compute 8.0–9.0 GPUs only). Build from [SpargeAttn](https://github.com/thu-ml/SpargeAttn) using `TORCH_CUDA_ARCH_LIST="8.9"` or similar.
141
+ - **Ollama prompt enhancer** — Install [Ollama](https://ollama.com/) and pull `qwen3:0.6b`. Set `PROMPT_ENHANCER_MODEL=qwen3:0.6b` before launching LightDiffusion-Next to enable the automatic prompt rewrite toggle.
142
+
143
+ ## Verify your installation
144
+
145
+ 1. Start the UI or FastAPI server.
146
+ 2. Watch the startup logs — the initialization progress bar runs the dependency download routine (`CheckAndDownload`) and loads the default checkpoint.
147
+ 3. Generate a 512×512 image with the default prompt. The status bar shows timing and the output appears in `output/Classic`.
148
+ 4. Confirm the telemetry endpoint is reachable:
149
+
150
+ ```bash
151
+ curl http://localhost:7861/health
152
+ curl http://localhost:7861/api/telemetry
153
+ ```
154
+
155
+ ## Updating or rebuilding
156
+
157
+ - Pull the latest Git changes and rerun `uv pip install -r requirements.txt` in the virtual environment.
158
+ - For Docker users, rebuild with `docker-compose build --no-cache` to pick up updates.
159
+ - If you upgraded your GPU driver or CUDA toolkit, delete `~/.cache/torch_extensions` to force SageAttention/SpargeAttn to recompile.
160
+
161
+ You are now ready to explore the [UI guide](usage.md) and start generating.
docs/optimizations.md ADDED
@@ -0,0 +1,262 @@
1
+ # Performance Optimizations
2
+
3
+ LightDiffusion-Next achieves its industry-leading inference speed through a layered stack of training-free optimizations that can be selectively enabled based on your hardware and quality requirements. This page provides an overview of each acceleration technique and links to detailed guides.
4
+
5
+ For a detailed source-based report on what is implemented today, including server-side throughput optimizations and practical implementation notes, see the [Implemented Optimizations Report](implemented-optimizations-report.md).
6
+
7
+ ## Optimization Stack Overview
8
+
9
+ The pipeline orchestrates six primary acceleration paths:
10
+
11
+ | Technique | Type | Speedup | Quality Impact | Requirements |
12
+ |-----------|------|---------|----------------|---------------|
13
+ | [AYS Scheduler](#ays-scheduler) | Sampling schedule | ~2x | None/Better | All models |
14
+ | [Prompt Caching](#prompt-caching) | Embedding cache | 5-15% | None | All models |
15
+ | [SageAttention](#sageattention--spargeattn) | Attention kernel | Moderate | None | All CUDA GPUs |
16
+ | [SpargeAttn](#sageattention--spargeattn) | Sparse attention | Significant | Minimal | Compute 8.0-9.0 |
17
+ | [Stable-Fast](#stable-fast) | Graph compilation | Significant* | None | >8GB VRAM, batch jobs |
18
+ | [WaveSpeed](#wavespeed-caching) | Feature caching | High | Tunable | All models |
19
+
20
+ *Speedup depends heavily on batch size and generation count
21
+
22
+ These optimizations **work together** — enabling multiple techniques simultaneously can provide substantial cumulative speedup with tunable quality trade-offs.
23
+
24
+ ## Quick Comparison
25
+
26
+ ### AYS Scheduler
27
+
28
+ **What it does:** Uses research-backed optimal timestep distributions that allow equivalent quality in approximately half the steps. Instead of uniform sigma spacing, AYS concentrates samples on noise levels that contribute most to image formation.
29
+
30
+ **When to use:**
31
+ - Always recommended for SD1.5, SDXL, and Flux models
32
+ - Txt2Img generation
33
+ - Production workflows where speed matters
34
+ - Any scenario where you'd normally use 20+ steps
35
+
36
+ **Trade-offs:** Images will differ slightly from standard schedulers (different sampling path), but quality is equivalent or better. Not ideal when exact reproduction of old results is required.
37
+
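+ The mechanism can be sketched in a few lines: a small table of research-derived sigmas is resampled to the requested step count with log-linear interpolation, preserving its non-uniform emphasis (the table values below are placeholders, not the published AYS sigmas):
+
+ ```python
+ import numpy as np
+
+ def ays_sigmas(table: list[float], steps: int) -> np.ndarray:
+     """Resample an optimal sigma table to `steps` entries."""
+     xs = np.linspace(0, len(table) - 1, steps)
+     log_table = np.log(np.asarray(table))
+     # interpolate in log space so the schedule keeps the table's emphasis
+     return np.exp(np.interp(xs, np.arange(len(table)), log_table))
+ ```
+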
38
+ [→ Full AYS Scheduler guide](ays-scheduler.md)
39
+
40
+ ---
41
+
42
+ ### Prompt Caching
43
+
44
+ **What it does:** Caches CLIP text embeddings for prompts that have been encoded before. When generating multiple images with the same or similar prompts, embeddings are retrieved from cache instead of being recomputed.
45
+
46
+ **When to use:**
47
+ - Batch generation with same prompt
48
+ - Testing different seeds or settings
49
+ - Iterative prompt refinement
50
+ - Any workflow with repeated prompts
51
+
52
+ **Trade-offs:** None — minimal memory overhead (~50-200MB), negligible CPU cost, automatically enabled by default.
53
+
54
+ [→ Full Prompt Caching guide](prompt-caching.md)
55
+
56
+ ---
57
+
58
+ ### SageAttention & SpargeAttn {#sageattention--spargeattn}
59
+
60
+ **What it does:** Replaces PyTorch's default scaled dot-product attention with highly optimized CUDA kernels. SageAttention uses INT8 quantization for key/value tensors while maintaining FP16 query precision. SpargeAttn extends this with dynamic sparsity pruning, skipping redundant attention computations.
61
+
62
+ **When to use:**
63
+ - Always enable SageAttention if available (no quality loss, pure speed gain)
64
+ - SpargeAttn for maximum speed on supported hardware (RTX 30xx/40xx, A100, H100)
65
+ - Both work seamlessly with all samplers, LoRAs and post-processing stages
66
+
67
+ **Trade-offs:** None for SageAttention. SpargeAttn may introduce subtle texture variations at very high sparsity thresholds (default is conservative).
68
+
69
+ [→ Full SageAttention/SpargeAttn guide](sageattention.md)
70
+
71
+ ---
72
+
73
+ ### CFG++ Samplers {#cfg-samplers}
74
+
75
+ CFG++ Samplers are advanced sampling algorithms that incorporate Classifier-Free Guidance directly into the sampling process, providing better quality and stability compared to standard CFG.
76
+
77
+ ---
78
+
79
+ ### Multi-Scale Diffusion {#multi-scale}
80
+
81
+ Multi-Scale Diffusion optimizes performance by processing images at multiple resolutions during generation, reducing computation for high-resolution areas.
82
+
83
+ **When to use:**
84
+ - High-resolution generation (>1024px)
85
+ - When memory is limited
86
+ - For faster previews
87
+
88
+ **Trade-offs:** May reduce detail in fine areas.
89
+
90
+ **Note:** In most cases, Multi-Scale Diffusion in quality mode produces better results than standard diffusion while still giving a small speedup; this is a side effect of its upsampling process.
91
+
92
+ ---
93
+
94
+ ### Stable-Fast
95
+
96
+ **What it does:** JIT-compiles the UNet diffusion model into optimized TorchScript with optional CUDA graphs. The first forward pass traces execution, caches kernel launches and fuses operators for reduced overhead.
97
+
98
+ **When to use:**
99
+ - **Systems with >8GB VRAM** (preferably 12GB+)
100
+ - Batch jobs or workflows generating 50+ images with identical settings
101
+ - Long-running operations where 30-60s compilation amortizes over time
102
+ - Fixed resolutions and batch sizes
103
+
104
+ **When NOT to use:**
105
+ - Normal 20-step single image generation (compilation overhead > speedup gains)
106
+ - Systems with <8GB VRAM
107
+ - Flux workflows (different architecture)
108
+ - Quick prototyping or frequent model/resolution changes
109
+
110
+ **Trade-offs:** Compilation time on first run (30-60s), VRAM overhead (~500MB), reduced flexibility for dynamic shapes.
111
+
112
+ [→ Full Stable-Fast guide](stablefast.md)
113
+
114
+ ---
115
+
116
+ ### WaveSpeed Caching
117
+
118
+ **What it does:** Exploits temporal redundancy in diffusion processes by reusing work across denoising steps. In the current project stack this primarily means DeepCache on supported UNet models, with additional Flux-oriented cache groundwork present in the codebase.
119
+
120
+ 1. **DeepCache** — Reuses prior denoiser outputs on selected steps in UNet models (SD1.5, SDXL)
121
+ 2. **First Block Cache (FBCache)** — Flux-oriented cache machinery available for specialized integration work
122
+
123
+ **When to use:**
124
+ - Any workflow where you can tolerate slight smoothing in exchange for 2-3x speedup
125
+ - Combine with conservative cache intervals (2-3) for minimal quality loss
126
+ - Works alongside SageAttention and Stable-Fast
127
+
128
+ **Trade-offs:** Reduced fine detail if interval is too high, slight VRAM increase for cached tensors.
129
+
130
+ [→ Full WaveSpeed guide](wavespeed.md)
131
+
132
+ ---
133
+
134
+ ## Priority & Fallback System
135
+
136
+ LightDiffusion-Next automatically selects the best available attention backend at runtime:
137
+
138
+ ```
139
+ SpargeAttn > SageAttention > xformers > PyTorch SDPA
140
+ ```
141
+
142
+ If a kernel fails (e.g., unsupported head dimension), the system gracefully falls back to the next option. You can force PyTorch SDPA by setting `LD_DISABLE_SAGE_ATTENTION=1` for debugging.
143
+
144
+ Stable-Fast and WaveSpeed are opt-in toggles controlled via the UI or REST API.
145
+
146
+ ## Recommended Configurations
147
+
148
+ ### Maximum Speed - Batch Jobs (SD1.5, >8GB VRAM, 50+ images)
149
+ ```yaml
150
+ stable_fast: true # Only for batch operations
151
+ sageattention: auto # or spargeattn if available
152
+ deepcache:
153
+ enabled: true
154
+ interval: 3
155
+ depth: 2
156
+ ```
157
+ **Expected:** Maximum speedup for batch operations, some quality loss
158
+ **Note:** Disable stable_fast for single 20-step generations
159
+
160
+ ### Balanced - Quick Generation (SD1.5, any VRAM)
161
+ ```yaml
162
+ scheduler: ays # NEW: Use AYS for 2x speedup
163
+ steps: 10 # Reduced from 20 (same quality with AYS)
164
+ stable_fast: false # Disabled for normal generations
165
+ sageattention: auto
166
+ prompt_cache_enabled: true # Enabled by default
167
+ deepcache:
168
+ enabled: true
169
+ interval: 2
170
+ depth: 1
171
+ ```
172
+ **Expected:** ~2-3x speedup with minimal quality loss
173
+ **Note:** AYS scheduler provides the main speedup; enable stable_fast only for batch jobs (50+ images)
174
+
175
+ ### Quality-First (Flux)
176
+ ```yaml
177
+ scheduler: ays_flux # NEW: Optimized for Flux models
178
+ steps: 10 # Reduced from 15 (same quality with AYS)
179
+ stable_fast: false # not supported
180
+ sageattention: auto
181
+ prompt_cache_enabled: true
182
+ deepcache:
183
+ enabled: true
184
+ interval: 2
185
+ ```
186
+ **Expected:** ~2x speedup with minimal quality impact
187
+
188
+ ### Production API - High Volume (>8GB VRAM)
189
+ ```yaml
190
+ stable_fast: true # Only for sustained high-volume APIs
191
+ sageattention: auto
192
+ deepcache:
193
+ enabled: false # avoid variability across batch sizes
194
+ keep_models_loaded: true
195
+ ```
196
+ **Expected:** Consistent latency for repeated identical requests
197
+ **Note:** For low-volume or single-shot APIs, use `stable_fast: false`
198
+
199
+ ## Hardware-Specific Tips
200
+
201
+ ### RTX 30xx / 40xx (Ampere/Ada)
202
+ - Enable SpargeAttn for best results
203
+ - Stable-Fast only for batch jobs (disable for quick 20-step generations)
204
+ - Stable-Fast + SpargeAttn + DeepCache stacks well for long operations
205
+ - Watch VRAM — Stable-Fast graphs consume ~500MB
206
+
207
+ ### RTX 50xx (Blackwell)
208
+ - SageAttention only (SpargeAttn support pending)
209
+ - Stable-Fast works but recompiles for new CUDA arch
210
+ - DeepCache is your best additional speedup
211
+
212
+ ### A100 / H100 (Datacenter)
213
+ - SpargeAttn + Stable-Fast + aggressive WaveSpeed
214
+ - Prefer larger batch sizes to amortize kernel overhead
215
+ - Use CUDA graphs (`enable_cuda_graph=True` in Stable-Fast config)
216
+
217
+ ### Low VRAM (<8GB)
218
+ - **Always disable Stable-Fast** (requires >8GB VRAM)
219
+ - Use SageAttention (minimal overhead)
220
+ - Enable DeepCache with conservative intervals
221
+ - Set `vae_on_cpu=True` for HiRes workflows
222
+
223
+ ## Debugging & Profiling
224
+
225
+ Check which optimizations are active:
226
+
227
+ ```bash
228
+ # View startup logs
229
+ cat logs/server.log | grep -i "using\|enabled"
230
+
231
+ # Sample output:
232
+ # Using SpargeAttn (Sparse + SageAttention) cross attention
233
+ # Using SpargeAttn (Sparse + SageAttention) in VAE
234
+ # Stable-Fast compilation enabled
235
+ # DeepCache active: interval=3, depth=2
236
+ ```
237
+
238
+ Monitor telemetry:
239
+
240
+ ```bash
241
+ curl http://localhost:7861/api/telemetry | jq '.vram_usage_mb, .average_latency_ms'
242
+ ```
243
+
244
+ Disable individual optimizations to isolate issues:
245
+
246
+ ```bash
247
+ export LD_DISABLE_SAGE_ATTENTION=1 # Forces PyTorch SDPA
248
+ export LD_DISABLE_STABLE_FAST=1 # Skips compilation
249
+ export LD_DISABLE_WAVESPEED=1 # Disables all caching
250
+ ```
251
+
252
+ ## Further Reading
253
+ - [AYS Scheduler Deep Dive](ays-scheduler.md) — Theory, implementation, quality tuning
254
+ - [Prompt Caching Deep Dive](prompt-caching.md) — Implementation details, cache management, performance impact
255
+ - [SageAttention & SpargeAttn Deep Dive](sageattention.md) — Installation, technical details, head dimension handling
256
+ - [Stable-Fast Compilation Guide](stablefast.md) — Configuration, CUDA graphs, troubleshooting
257
+ - [WaveSpeed Caching Strategies](wavespeed.md) — DeepCache vs FBCache, tuning parameters, compatibility matrix
258
+ - [Performance Tuning](quirks.md) — VRAM management, slow first runs, recompilation fixes
259
+
260
+ ---
261
+
262
+ Armed with this overview, dive into the technique-specific guides or experiment directly in the UI to find your optimal speed/quality balance.
docs/prompt-caching.md ADDED
@@ -0,0 +1,64 @@
1
+ # Prompt Attention Caching
2
+
3
+ ### What It Does
4
+
5
+ Caches CLIP text embeddings for prompts you've already encoded. When you reuse a prompt (or parts of it), the embedding is retrieved from cache instead of being recomputed.
6
+
7
+ ### When It Helps Most
8
+
9
+ - Batch generation with same prompt
10
+ - Testing different seeds
11
+ - Incremental prompt refinement
12
+ - Generation sessions with repeated themes
13
+
14
+ ### Configuration
15
+
16
+ **Enable/Disable** (default: enabled):
17
+ ```python
18
+ from src.Utilities import prompt_cache
19
+
20
+ # Enable (default)
21
+ prompt_cache.enable_prompt_cache(True)
22
+
23
+ # Disable
24
+ prompt_cache.enable_prompt_cache(False)
25
+
26
+ # Check status
27
+ stats = prompt_cache.get_cache_stats()
28
+ print(f"Hit rate: {stats['hit_rate']:.1%}")
29
+ ```
30
+
31
+ **Cache Settings** (sketched below):
32
+ - Maximum entries: 256 prompts before pruning
33
+ - Cache structure: global dict keyed by prompt hash and CLIP identity
34
+ - Memory usage: workload-dependent, estimated from cached embedding tensors
35
+ - Cache cleared on: restart, disable, or manual clear
36
+ - Automatic pruning: removes the oldest 25% of entries when the cache exceeds its limit
37
+
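+ A minimal sketch of the keying and pruning behavior described above (illustrative, not the exact `prompt_cache` internals):
+
+ ```python
+ import hashlib
+ from collections import OrderedDict
+
+ _MAX_ENTRIES = 256
+ _cache: OrderedDict = OrderedDict()
+
+ def get_or_encode(prompt: str, clip, encode_fn):
+     key = (hashlib.sha256(prompt.encode()).hexdigest(), id(clip))
+     if key in _cache:
+         return _cache[key]  # cache hit: skip the CLIP encode entirely
+     emb = encode_fn(prompt)
+     _cache[key] = emb
+     if len(_cache) > _MAX_ENTRIES:
+         for _ in range(_MAX_ENTRIES // 4):  # prune the oldest 25% of entries
+             _cache.popitem(last=False)
+     return emb
+ ```
+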
38
+ ### Viewing Cache Stats
39
+
40
+ ```python
41
+ from src.Utilities import prompt_cache
42
+
43
+ # Print statistics
44
+ prompt_cache.print_cache_stats()
45
+
46
+ # Output:
47
+ # ============================================================
48
+ # Prompt Cache Statistics
49
+ # ============================================================
50
+ # Status: Enabled
51
+ # Entries: 42
52
+ # Size: ~85.3 MB
53
+ # Requests: 150 (hits: 108, misses: 42)
54
+ # Hit Rate: 72.0%
55
+ # ============================================================
56
+ ```
57
+
58
+ ### Best Practices
59
+
60
+ 1. **Leave it enabled** - negligible overhead, significant gains
61
+ 2. **Monitor hit rate** - should be >50% in typical workflows
62
+ 3. **Clear cache** when switching models or major prompt changes
63
+ 4. **Batch similar prompts** to maximize cache hits
64
+ 5. **Expect global behavior** because the cache is shared across repeated prompt encodes rather than being scoped to a single generation session
docs/quirks.md ADDED
@@ -0,0 +1,60 @@
1
+ # Quirks & Troubleshooting
2
+
3
+ This playbook highlights the most common operational quirks you may encounter while running LightDiffusion-Next and the quickest ways to resolve them.
4
+
5
+ ## GPU memory headaches
6
+
7
+ | Symptom | Likely cause | Quick fixes |
8
+ | --- | --- | --- |
9
+ | `CUDA out of memory` during base diffusion | Resolution or batch too high | Drop to 512×512 or smaller, decrease batch to 1, disable HiresFix or AutoDetailer, prefer Euler/Karras samplers in **CFG++** mode |
10
+ | OOM triggered mid-way through HiRes | VRAM spikes when loading VAE/second UNet | Enable **Keep models loaded** (to avoid reloading) or run HiRes on CPU by toggling *VAE on CPU* in settings |
11
+ | Flux runs crash immediately | Missing Flux decoder or running on <16 GB VRAM | Place Flux weights in `include/Flux`, disable Flux or use SD1.5 profile on smaller cards |
12
+
13
+ Additional tips:
14
+
15
+ - Enable **VRAM budget** in Streamlit to see live usage (requires `LD_SHOW_VRAM=1`).
16
+ - In Docker, pass `--gpus all` and ensure `NVIDIA_VISIBLE_DEVICES` is not empty.
17
+ - Clear `~/.cache/torch_extensions` if Stable-Fast kernels were compiled against an older driver and now fail to load.
18
+
19
+ ## Slow first runs or repeated recompilation
20
+
21
+ - Stable-Fast and SageAttention compile custom kernels on first use. This can take several minutes. Once complete, the compiled artifacts live under `~/.cache/torch_extensions` (host) or `/root/.cache/torch_extensions` (Docker). Mount this directory as a volume for faster cold starts.
22
+ - If Streamlit re-compiles every launch, ensure the container or user has write access to the cache directory and that the system clock is correct.
23
+ - Set `LD_DISABLE_SAGE_ATTENTION=1` to isolate issues related specifically to SageAttention.
24
+
25
+ ## Downloader complaints about missing assets
26
+
27
+ - The startup checks look for standard filenames (e.g., `yolov8n.pt`, `taesdxl_decoder.safetensors`). Verify these live under the correct subdirectories in `include/`.
28
+ - For offline setups, drop the files manually and create empty `.ok` sentinels (e.g., `include/checkpoints/.downloads-ok`) to skip prompts.
29
+ - Hugging Face rate limits manifest as HTTP 429. Provide a token via the prompt, set `HF_TOKEN` in the environment or download manually.
30
+
31
+ ## Streamlit UI quirks
32
+
33
+ - **Preview stuck on “Waiting for GPU”** – Check FastAPI logs; the batching worker may be paused. Restart the Streamlit session or run `python server.py` to inspect queue telemetry.
34
+ - **Settings reset on restart** – Ensure the process can write to `webui_settings.json`. Remove the file to revert to defaults if it becomes corrupted.
35
+ - **History thumbnails missing** – Delete the entry under `ui/history/<timestamp>`; the next render will recreate previews.
36
+
37
+ ## Gradio or API automation issues
38
+
39
+ - `/api/generate` returns 500 with “No images produced”: inspect server logs for `Pipeline import error` or missing models. Ensure `pipeline.py` is importable and the working directory is the repository root.
40
+ - Jobs appear stuck: call `/api/telemetry` to inspect `pending_by_signature`. Mixed resolutions or toggles prevent batching; if running single job automation, set `LD_BATCH_WAIT_SINGLETONS=0` to avoid coalescing delays.
41
+ - SaveImage aborts with "Attempting to save N images in a single call" (exceeds `MAX_IMAGES_PER_SAVE`): this usually indicates tiled intermediate outputs or a very large batched tensor. The server will chunk large coalesced groups into smaller runs of at most `LD_MAX_IMAGES_PER_GROUP` images (default: 256) to mitigate this. If you must allow larger single-call saves, set `LD_MAX_IMAGES_PER_SAVE` to a higher value in the server environment (e.g., `export LD_MAX_IMAGES_PER_SAVE=256`) but be mindful of disk usage. Alternatively, reduce `num_images` per job or lower `LD_MAX_BATCH_SIZE` to keep groups smaller.
42
+ - Health checks: `/health` returns `{ "status": "ok" }`. If it fails, the FastAPI app likely crashed—restart and inspect `logs/server.log`.
43
+
44
+ ## Docker-specific notes
45
+
46
+ - Always build with the provided `Dockerfile` to get SageAttention patches precompiled.
47
+ - Forward model assets by mounting `./include` into the container (`-v $(pwd)/include:/app/include`).
48
+ - On Windows + WSL2, ensure the WSL distro has the NVIDIA driver bridge (`wsl --status`).
49
+
50
+ ## Logging & diagnostics
51
+
52
+ - Server logs live under `logs/server.log` with per-request IDs. Tail them during load testing: `tail -f logs/server.log`.
53
+ - Enable debug logging by exporting `LD_SERVER_LOGLEVEL=DEBUG` before launching Streamlit/Gradio/uvicorn.
54
+ - To inspect queue depth without hitting the API, watch the `GenerationBuffer` logs; each batch prints signature summaries.
55
+
56
+ ## When all else fails
57
+
58
+ - Clear the `include/last_seed.txt` file if seed reuse behaves unexpectedly.
59
+ - Regenerate Stable-Fast kernels by deleting the cache directory and re-running with `stable_fast` enabled.
60
+ - Collect the following before opening an issue: GPU model, driver version, operating system, a copy of `logs/server.log`, hardware info from `/api/telemetry`, and reproduction steps.
docs/rocm-metal-support.md ADDED
@@ -0,0 +1,360 @@
1
+ # ROCm and Metal/MPS Support
2
+
3
+ LightDiffusion-Next includes comprehensive support for AMD GPUs with ROCm and Apple Silicon Macs with Metal Performance Shaders (MPS). This guide covers the platform-specific considerations and optimizations available for non-NVIDIA hardware.
4
+
5
+ ## ROCm Support (AMD GPUs)
6
+
7
+ ### Overview
8
+
9
+ ROCm (Radeon Open Compute) is AMD's open-source platform for GPU computing. LightDiffusion-Next automatically detects and utilizes ROCm-compatible AMD GPUs through PyTorch's HIP backend.
10
+
11
+ ### Supported Hardware
12
+
13
+ - **RDNA Architecture:**
14
+
15
+ - RDNA 2 (RX 6000 series) - FP16 support
16
+ - RDNA 3 (RX 7000 series) - FP16 and BF16 support
17
+
18
+ - **CDNA Architecture:**
19
+
20
+ - CDNA (MI100)
21
+ - CDNA 2 (MI200 series) - FP16 and BF16 support
22
+ - CDNA 3 (MI300 series) - FP16 and BF16 support
23
+
24
+ ### Installation
25
+
26
+ 1. **Install ROCm drivers and runtime:**
27
+
28
+ Follow the official [ROCm installation guide](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html) for your Linux distribution.
29
+
30
+ ```bash
31
+ # Example for Ubuntu 22.04
32
+ wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_latest_all.deb
33
+ sudo apt-get install ./amdgpu-install_latest_all.deb
34
+ sudo amdgpu-install --usecase=rocm
35
+ ```
36
+
37
+ 2. **Verify ROCm installation:**
38
+
39
+ ```bash
40
+ rocm-smi
41
+ /opt/rocm/bin/rocminfo
42
+ ```
43
+
44
+ 3. **Install PyTorch with ROCm support:**
45
+
46
+ ```bash
47
+ pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm6.0
48
+ ```
+
+ Or set everything up from scratch inside a virtual environment with `uv`:
+
+ ```bash
49
+ # Create virtual environment
50
+ python3 -m venv .venv
51
+ source .venv/bin/activate
52
+ pip install --upgrade pip uv
53
+
54
+ # Install PyTorch with ROCm 6.0 support (adjust version as needed)
55
+ uv pip install --index-url https://download.pytorch.org/whl/rocm6.0 torch torchvision
56
+
57
+ # Install project dependencies
58
+ uv pip install -r requirements.txt
59
+ ```
60
+
61
+ 4. **Launch LightDiffusion-Next:**
62
+
63
+ ```bash
64
+ streamlit run streamlit_app.py --server.address=0.0.0.0 --server.port=8501
65
+ ```
66
+
67
+ ### ROCm-Specific Features
68
+
69
+ #### Automatic Detection
70
+
71
+ LightDiffusion-Next automatically detects ROCm GPUs at startup and reports them in the logs:
72
+
73
+ ```
74
+ Device: cuda:0 AMD Radeon RX 7900 XTX (ROCm) :
75
+ ```
76
+
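+ A quick way to confirm PyTorch is reaching the GPU through HIP (an illustrative check, separate from the project's startup code):
+
+ ```python
+ import torch
+
+ # Under ROCm builds, torch.version.hip is set and the CUDA API maps onto HIP.
+ if torch.cuda.is_available() and torch.version.hip is not None:
+     print(f"ROCm device: {torch.cuda.get_device_name(0)}")
+ ```
+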
77
+ #### Memory Management
78
+
79
+ - **Cache Management:** ROCm uses a more conservative cache clearing strategy compared to CUDA. Cache is only cleared when explicitly forced to prevent memory fragmentation issues.
80
+ - **Memory Statistics:** Full memory statistics are available through the standard PyTorch CUDA API (which works transparently with ROCm).
81
+
82
+ #### Precision Support
83
+
84
+ - **FP16:** Fully supported on all RDNA and CDNA architectures
85
+ - **BF16:** Supported on RDNA 3+ and CDNA 2+ GPUs (automatically detected)
86
+ - **FP32:** Always available as fallback
87
+
88
+ #### Attention Mechanisms
89
+
90
+ | Feature | ROCm Support | Notes |
91
+ |---------|--------------|-------|
92
+ | PyTorch Scaled Dot-Product Attention (SDPA) | ✅ Yes | Default and recommended |
93
+ | PyTorch Flash Attention | ✅ Yes | Available on RDNA 3 and CDNA 2+ |
94
+ | xformers | ✅ Yes | Works with ROCm builds of xformers |
95
+ | SageAttention | ❌ No | CUDA-only kernels |
96
+ | SpargeAttn | ❌ No | CUDA-only kernels |
97
+
98
+ **Recommendation:** Use PyTorch's built-in attention (SDPA) on ROCm for best compatibility. Install xformers ROCm build for additional optimizations.
99
+
100
+ ### Performance Tips
101
+
102
+ 1. **Use BF16 on supported GPUs:**
103
+
104
+ - RDNA 3 (RX 7000 series) and CDNA 2+ support BF16 natively
105
+ - BF16 provides better numerical stability than FP16
106
+
107
+ 2. **Enable PyTorch attention:**
108
+
109
+ - Automatically enabled for PyTorch 2.0+
110
+ - Provides good performance without CUDA-specific optimizations
111
+
112
+ 3. **Install ROCm-compatible xformers:**
113
+
114
+ ```bash
115
+ # Build xformers from source for ROCm
116
+ git clone https://github.com/facebookresearch/xformers.git
117
+ cd xformers
118
+ git submodule update --init --recursive
119
+ pip install -e . --no-build-isolation
120
+ ```
121
+
122
+ 4. **Monitor GPU utilization:**
123
+
124
+ ```bash
125
+ watch -n 1 rocm-smi
126
+ ```
127
+
128
+ ### Known Limitations
129
+
130
+ - **SageAttention and SpargeAttn:** These optimizations use CUDA-specific kernels and are not available on ROCm. The system automatically falls back to PyTorch SDPA.
131
+ - **Stable-Fast:** May have limited support depending on ROCm version. Test compilation before relying on it.
132
+ - **Driver Maturity:** Ensure you're using the latest ROCm version for best stability and performance.
133
+
134
+ ---
135
+
136
+ ## Metal/MPS Support (Apple Silicon)
137
+
138
+ ### Overview
139
+
140
+ Metal Performance Shaders (MPS) provides GPU acceleration on Apple Silicon Macs (M1, M2, M3 series). LightDiffusion-Next automatically detects and utilizes MPS when running on macOS.
141
+
142
+ ### Supported Hardware
143
+
144
+ - **Apple Silicon:**
145
+
146
+ - M1, M1 Pro, M1 Max, M1 Ultra
147
+ - M2, M2 Pro, M2 Max, M2 Ultra
148
+ - M3, M3 Pro, M3 Max
149
+ - All future M-series chips
150
+
151
+ ### Installation
152
+
153
+ 1. **Ensure macOS is up to date:**
154
+
155
+ - macOS 12.3 (Monterey) or later required
156
+ - macOS 13+ (Ventura) recommended for best performance
157
+
158
+ 2. **Install Python 3.10:**
159
+
160
+ ```bash
161
+ # Using Homebrew
162
+ brew install python@3.10
163
+ ```
164
+
165
+ 3. **Create virtual environment and install dependencies:**
166
+
167
+ ```bash
168
+ python3.10 -m venv .venv
169
+ source .venv/bin/activate
170
+ pip install --upgrade pip
171
+
172
+ # Install PyTorch with MPS support
173
+ pip install torch torchvision torchaudio
174
+
175
+ # Install project dependencies
176
+ pip install -r requirements.txt
177
+ ```
178
+
179
+ 4. **Launch LightDiffusion-Next:**
180
+
181
+ ```bash
182
+ streamlit run streamlit_app.py --server.address=0.0.0.0 --server.port=8501
183
+ ```
184
+
185
+ ### MPS-Specific Features
186
+
187
+ #### Automatic Detection
188
+
189
+ MPS is automatically detected and enabled on compatible hardware:
190
+
191
+ ```
192
+ Device: mps
193
+ VAE dtype: torch.float16
194
+ Set vram state to: SHARED
195
+ ```
196
+
197
+ #### Memory Management
198
+
199
+ - **Unified Memory:** Apple Silicon uses unified memory shared between CPU and GPU
200
+ - **VRAM State:** Automatically set to `SHARED` mode
201
+ - **Cache Management:** Uses `torch.mps.empty_cache()` for memory cleanup
202
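+
+ A small sketch of manual cleanup with the same API (plain PyTorch, useful when scripting against the pipeline):
+
+ ```python
+ import torch
+
+ if torch.backends.mps.is_available():
+     x = torch.randn(1024, 1024, device="mps")
+     del x
+     torch.mps.empty_cache()  # return cached allocations to the unified pool
+ ```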
+
203
+ #### Precision Support
204
+
205
+ - **FP16:** Fully supported and recommended (default)
206
+ - **FP32:** Supported but slower
207
+ - **BF16:** Not supported on MPS backend
208
+
209
+ #### Attention Mechanisms
210
+
211
+ | Feature | MPS Support | Notes |
212
+ |---------|-------------|-------|
213
+ | PyTorch Scaled Dot-Product Attention (SDPA) | ✅ Yes | Default and recommended |
214
+ | PyTorch Flash Attention | ❌ No | Not available on MPS |
215
+ | xformers | ❌ No | MPS backend not supported |
216
+ | SageAttention | ❌ No | CUDA/MPS incompatible |
217
+ | SpargeAttn | ❌ No | CUDA-only kernels |
218
+
219
+ **Recommendation:** Use PyTorch's built-in attention (SDPA) on MPS. It's well-optimized for Apple Silicon.
220
+
221
+ ### Performance Tips
222
+
223
+ - **Use FP16 precision:**
224
+
225
+   - MPS works best with FP16
226
+   - Automatically enabled by LightDiffusion-Next
227
+
228
+ - **Optimize batch sizes:**
229
+
230
+   - Start with smaller batch sizes and increase gradually
231
+   - Monitor memory usage through Activity Monitor
232
+
233
+ - **Keep macOS updated:**
234
+
235
+   - Apple regularly improves MPS performance in system updates
236
+
237
+ - **Close unnecessary applications:**
238
+
239
+   - Unified memory is shared with system processes
240
+   - Free up RAM for better GPU performance
241
+
242
+ - **Monitor GPU usage:**
243
+
244
+ ```bash
245
+ # Use Activity Monitor -> GPU tab
246
+ # Or use powermetrics (requires sudo):
247
+ sudo powermetrics --samplers gpu_power -i 1000
248
+ ```
249
+
250
+ ### Known Limitations
251
+
252
+ - **Non-blocking transfers:** Not supported; MPS operations are blocking
253
+ - **Advanced optimizations:** SageAttention, SpargeAttn, and xformers are not available
254
+ - **BF16:** Not supported on MPS backend
255
+ - **Memory pressure:** System may swap under high memory load due to unified architecture
256
+
257
+ ### Unified Memory Considerations
258
+
259
+ Apple Silicon's unified memory architecture means:
260
+
261
+ - GPU and CPU share the same physical memory pool
262
+ - Less memory copying between devices
263
+ - System processes compete for the same memory
264
+ - Available VRAM depends on total system RAM and current usage
265
+
266
+ **Recommended RAM:**
267
+
268
+ - 16 GB: SD1.5 models at moderate resolutions
269
+ - 32 GB: Comfortable for most workflows including Flux (with quantization)
270
+ - 64 GB+: Professional workflows with large batch sizes
271
+
272
+ ---
273
+
274
+ ## Comparison Table
275
+
276
+ | Feature | NVIDIA (CUDA) | AMD (ROCm) | Apple (MPS) |
277
+ |---------|---------------|------------|-------------|
278
+ | FP16 | ✅ Full | ✅ Full | ✅ Full |
279
+ | BF16 | ✅ Full | ✅ RDNA3+/CDNA2+ | ❌ No |
280
+ | PyTorch SDPA | ✅ Yes | ✅ Yes | ✅ Yes |
281
+ | Flash Attention | ✅ Yes | ✅ RDNA3+/CDNA2+ | ❌ No |
282
+ | xformers | ✅ Yes | ✅ Build from source | ❌ No |
283
+ | SageAttention | ✅ Yes | ❌ No | ❌ No |
284
+ | SpargeAttn | ✅ Yes (CC 8.0-9.0) | ❌ No | ❌ No |
285
+ | Stable-Fast | ✅ Yes | ⚠️ Limited | ❌ No |
286
+ | Memory Management | ✅ Dedicated VRAM | ✅ Dedicated VRAM | ⚠️ Unified Memory |
287
+
288
+ ---
289
+
290
+ ## Troubleshooting
291
+
292
+ ### ROCm Issues
293
+
294
+ **Problem:** PyTorch doesn't detect ROCm GPU
295
+
296
+ ```bash
297
+ # Check ROCm installation
298
+ rocm-smi
299
+ rocminfo | grep "Name:"
300
+
301
+ # Verify PyTorch sees GPU
302
+ python -c "import torch; print(torch.cuda.is_available()); print(torch.version.hip)"
303
+ ```
304
+
305
+ **Problem:** Out of memory errors
306
+
307
+ - Reduce batch size
308
+ - Enable lower VRAM mode in settings
309
+ - Close other GPU-using applications
310
+ - Check with `rocm-smi` for memory usage
311
+
312
+ **Problem:** Slow performance
313
+
314
+ - Verify you're using the correct ROCm-optimized PyTorch build
315
+ - Check GPU utilization with `rocm-smi`
316
+ - Ensure FP16 or BF16 is enabled (check logs)
317
+
318
+ ### MPS Issues
319
+
320
+ **Problem:** MPS not detected
321
+
322
+ ```bash
323
+ # Verify MPS support
324
+ python -c "import torch; print(torch.backends.mps.is_available())"
325
+ ```
326
+ - Ensure macOS 12.3+
327
+ - Update to latest macOS version
328
+ - Reinstall PyTorch
329
+
330
+ **Problem:** Memory warnings or crashes
331
+
332
+ - Reduce batch size
333
+ - Close other applications to free unified memory
334
+ - Check Activity Monitor for memory pressure
335
+
336
+ **Problem:** Slower than expected performance
337
+
338
+ - Verify FP16 is being used (check logs)
339
+ - Close background applications
340
+ - Update to latest macOS version for performance improvements
341
+ - Some models may be CPU-bound on older M1 chips
342
+
343
+ ---
344
+
345
+ ## Getting Help
346
+
347
+ For platform-specific issues:
348
+
349
+ 1. Check the [FAQ](faq.md) for common questions
350
+ 2. Review PyTorch's platform-specific documentation:
351
+ - [ROCm installation](https://pytorch.org/get-started/locally/#linux-rocm)
352
+ - [MPS backend](https://pytorch.org/docs/stable/notes/mps.html)
353
+ 3. Open an issue on GitHub with:
354
+ - Platform details (GPU model, driver version, OS)
355
+ - LightDiffusion-Next startup logs
356
+ - Output of `python -c "import torch; print(torch.__version__); print(torch.version.hip if hasattr(torch.version, 'hip') else 'CUDA'); print(torch.cuda.is_available())"`
357
+
358
+ ---
359
+
360
+ **Note:** This documentation reflects the current state of ROCm and MPS support in PyTorch and LightDiffusion-Next. As these platforms mature, more optimizations and features may become available.
docs/sageattention.md ADDED
@@ -0,0 +1,338 @@
1
+ # SageAttention & SpargeAttn
2
+
3
+ ## Overview
4
+
5
+ SageAttention and SpargeAttn are drop-in replacements for PyTorch's scaled dot-product attention that can provide significant speedup with zero to minimal quality loss. They work by optimizing the compute-heavy attention mechanism used throughout diffusion models (UNet, VAE, Flux Transformers).
6
+
7
+ - **SageAttention**: Uses INT8 quantization for key/value tensors while maintaining FP16 query precision
8
+ - **SpargeAttn**: Adds dynamic sparsity pruning on top of SageAttention, skipping redundant attention computations
9
+
10
+ Both are **training-free**, **hardware-accelerated** CUDA kernels that integrate transparently into LightDiffusion-Next.
11
+
12
+ ## How It Works
13
+
14
+ ### SageAttention
15
+
16
+ Standard attention computes:
17
+
18
+ $$
19
+ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
20
+ $$
21
+
22
+ SageAttention accelerates this by:
23
+
24
+ 1. **Quantizing K and V** to INT8 before the matrix multiplication
25
+ 2. **Keeping Q in FP16** to preserve attention score precision
26
+ 3. **Fusing operations** (softmax, scaling, matmul) in hand-tuned CUDA kernels
27
+ 4. **Dequantizing** output back to FP16 after final matmul
28
+
29
+ This reduces memory bandwidth (K/V use half the space) and leverages Tensor Cores more efficiently.
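+
+ In practice it is a drop-in replacement for PyTorch's SDPA. A minimal sketch, assuming SageAttention is installed and tensors are in heads-first layout (see the tensor-layout note below):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from sageattention import sageattn
+
+ q = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
+ k, v = torch.randn_like(q), torch.randn_like(q)
+
+ ref = F.scaled_dot_product_attention(q, k, v)  # FP16 baseline
+ out = sageattn(q, k, v, tensor_layout="HND")   # INT8-quantized K/V drop-in
+ ```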
30
+
31
+ ### SpargeAttn
32
+
33
+ SpargeAttn extends SageAttention with **sparse attention masking**:
34
+
35
+ 1. Computes a similarity metric between query and key patches
36
+ 2. Prunes attention connections below a learned threshold (default: 60% similarity)
37
+ 3. Applies cumulative distribution filtering to keep only the top 97% of attention scores
38
+ 4. Uses partial vector thresholding to skip redundant computations
39
+
40
+ The result: 40-60% total speedup over baseline PyTorch attention with minimal impact on output quality.
41
+
42
+ ## Installation
43
+
44
+ ### SageAttention (All Platforms)
45
+
46
+ **Prerequisites:**
47
+ - CUDA Toolkit 11.8+ (must match your PyTorch CUDA version)
48
+ - Python 3.8+
49
+ - PyTorch with CUDA support
50
+
51
+ **Install:**
52
+
53
+ ```bash
54
+ # Clone repository
55
+ git clone https://github.com/thu-ml/SageAttention
56
+ cd SageAttention
57
+
58
+ # Install from source (no build isolation to respect existing CUDA setup)
59
+ pip install -e . --no-build-isolation
60
+
61
+ # Verify installation
62
+ python -c "import sageattention; print('SageAttention installed successfully')"
63
+ ```
64
+
65
+ ### SpargeAttn (Linux/WSL2 Only)
66
+
67
+ **Prerequisites:**
68
+ - Same as SageAttention
69
+ - Linux or WSL2 environment (Windows native builds fail due to linker path limits)
70
+ - GPU with compute capability 8.0-9.0 (RTX 30xx, 40xx, A100, H100)
71
+
72
+ **Install:**
73
+
74
+ ```bash
75
+ # Clone repository
76
+ git clone https://github.com/thu-ml/SpargeAttn
77
+ cd SpargeAttn
78
+
79
+ # Set GPU architecture (critical for performance)
80
+ export TORCH_CUDA_ARCH_LIST="9.0" # Or your GPU: 8.0, 8.6, 8.9, 9.0
81
+
82
+ # Install from source
83
+ pip install -e . --no-build-isolation
84
+
85
+ # Verify installation
86
+ python -c "import spas_sage_attn; print('SpargeAttn installed successfully')"
87
+ ```
88
+
89
+ **GPU Architecture Reference:**
90
+
91
+ | GPU Model | Compute Capability | TORCH_CUDA_ARCH_LIST |
92
+ |-----------|-------------------|----------------------|
93
+ | RTX 3060/3070/3080/3090 | 8.6 | `"8.6"` |
94
+ | RTX 4060/4070/4080/4090 | 8.9 | `"8.9"` |
95
+ | A100 | 8.0 | `"8.0"` |
96
+ | H100 | 9.0 | `"9.0"` |
97
+ | RTX 5060/5070/5080/5090 | 12.0 | SageAttention supported, SpargeAttn pending |
98
+
99
+ ### Docker Installation
100
+
101
+ Both kernels are automatically built during the Docker image creation if the architecture is supported:
102
+
103
+ ```bash
104
+ # Build with SpargeAttn (compute 8.0-9.0)
105
+ docker-compose build --build-arg TORCH_CUDA_ARCH_LIST="8.9"
106
+
107
+ # RTX 50xx builds (SageAttention only, no SpargeAttn yet)
108
+ docker-compose build --build-arg TORCH_CUDA_ARCH_LIST="12.0"
109
+ ```
110
+
111
+ ## Usage
112
+
113
+ ### Automatic Detection
114
+
115
+ LightDiffusion-Next automatically detects and enables the best available attention backend at startup:
116
+
117
+ ```text
118
+ # Priority order (highest to lowest):
119
+ SpargeAttn > SageAttention > xformers > PyTorch SDPA
120
+ ```
121
+
122
+ Check which backend is active in the server logs:
123
+
124
+ ```bash
125
+ # SpargeAttn enabled
126
+ cat logs/server.log | grep "attention"
127
+ # Output: Using SpargeAttn (Sparse + SageAttention) cross attention
128
+
129
+ # SageAttention enabled
130
+ # Output: Using SageAttention cross attention
131
+
132
+ # Fallback
133
+ # Output: Using pytorch cross attention
134
+ ```
135
+
136
+ ### Streamlit UI
137
+
138
+ No configuration needed — SageAttention/SpargeAttn are always active if installed.
139
+
140
+ ### REST API
141
+
142
+ Same as UI — the backend selection is transparent:
143
+
144
+ ```bash
145
+ curl -X POST http://localhost:7861/api/generate \
146
+ -H "Content-Type: application/json" \
147
+ -d '{
148
+ "prompt": "a serene mountain lake at dawn",
149
+ "width": 768,
150
+ "height": 512,
151
+ "num_images": 1
152
+ }'
153
+ # Automatically uses SpargeAttn if available
154
+ ```
155
+
156
+ ### Manual Disable
157
+
158
+ Force PyTorch SDPA for debugging:
159
+
160
+ ```bash
161
+ export LD_DISABLE_SAGE_ATTENTION=1
162
+ streamlit run streamlit_app.py
163
+ ```
164
+
165
+ ## Performance
166
+
167
+ Both SageAttention and SpargeAttn provide measurable speedup over PyTorch SDPA baseline:
168
+
169
+ - **SageAttention**: Moderate speedup with zero quality loss (reported ~15-20% in papers)
170
+ - **SpargeAttn**: Significant speedup with minimal quality loss (reported ~40-60% in papers)
171
+
172
+ Actual performance gains vary based on:
173
+ - GPU architecture and VRAM
174
+ - Model type (SD1.5, SDXL, Flux)
175
+ - Resolution and batch size
176
+ - Head dimensions and sequence lengths
177
+
178
+ **Note:** Benchmark your specific setup to measure real-world performance.
+
+ ## Technical Details
179
+
180
+ ### Head Dimension Support
181
+
182
+ Both kernels natively support head dimensions of `[64, 96, 128]`. For other dimensions:
183
+
184
+ - **< 64**: Pad to 64, compute, then slice result
185
+ - **64-128**: Pad to 128, compute, then slice result
186
+ - **> 128**: Fallback to xformers or PyTorch SDPA
187
+
188
+ LightDiffusion-Next handles padding/slicing automatically.
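+
+ An illustrative sketch of the padding rule above (`pad_head_dim` is a hypothetical helper; the real handling lives in the project's attention wrappers):
+
+ ```python
+ import torch.nn.functional as F
+
+ def pad_head_dim(q, k, v):
+     """Hypothetical helper: pad head_dim to 64 or 128; caller slices the output."""
+     d = q.shape[-1]
+     if d > 128:
+         return None          # head_dim > 128: fall back to xformers / PyTorch SDPA
+     target = 64 if d <= 64 else 128
+     if target > d:
+         q, k, v = (F.pad(t, (0, target - d)) for t in (q, k, v))
+     return q, k, v, d        # original d is used to slice out[..., :d]
+ ```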
189
+
190
+ ### Tensor Layout
191
+
192
+ SageAttention expects tensors in `(batch_size, num_heads, seq_len, head_dim)` format. The pipeline reshapes inputs transparently:
193
+
194
+ ```python
195
+ # Internal reshaping (handled automatically)
196
+ q, k, v = map(
197
+ lambda t: t.reshape(b, -1, heads, dim_head).transpose(1, 2),
198
+ (q, k, v),
199
+ )
200
+ out = sageattention.sageattn(q, k, v, tensor_layout="HND")
201
+ ```
202
+
203
+ ### SpargeAttn Thresholds
204
+
205
+ Default pruning parameters (tuned for quality/speed balance):
206
+
207
+ ```python
208
+ out = spas_sage_attn.spas_sage2_attn_meansim_cuda(
209
+ q, k, v,
210
+ simthreshd1=0.6, # Similarity threshold (60%)
211
+ cdfthreshd=0.97, # Keep top 97% of attention scores
212
+ pvthreshd=15, # Partial vector threshold
213
+ is_causal=False
214
+ )
215
+ ```
216
+
217
+ Adjust `simthreshd1` for different trade-offs:
218
+ - `0.5`: More aggressive pruning, higher speedup, slight quality loss
219
+ - `0.7`: Conservative pruning, lower speedup, minimal quality loss
220
+
221
+ ## Compatibility
222
+
223
+ ### Compatible With
224
+
225
+ - ✅ Stable Diffusion 1.5
226
+ - ✅ Stable Diffusion 2.1
227
+ - ✅ SDXL
228
+ - ✅ Flux (both cross-attention and self-attention blocks)
229
+ - ✅ All samplers (Euler, DPM++, etc.)
230
+ - ✅ LoRA adapters
231
+ - ✅ Textual inversion embeddings
232
+ - ✅ HiresFix, ADetailer, Img2Img
233
+ - ✅ Stable-Fast (when stacked)
234
+ - ✅ WaveSpeed caching (when stacked)
235
+
236
+ ### Known Limitations
237
+
238
+ - ❌ RTX 50xx (compute 12.0) does not support SpargeAttn yet (SageAttention works)
239
+ - ❌ CPU-only inference (CUDA required)
240
+ - ❌ AMD GPUs (ROCm port not available)
241
+ - ⚠️ Head dimensions > 128 fall back to slower backends
242
+
243
+ ## Troubleshooting
244
+
245
+ ### Import Error: `No module named 'sageattention'`
246
+
247
+ **Cause:** Not installed or installation failed.
248
+
249
+ **Fix:**
250
+ ```bash
251
+ cd SageAttention
252
+ pip install -e . --no-build-isolation
253
+ ```
254
+
255
+ Verify CUDA toolkit is accessible:
256
+ ```bash
257
+ nvcc --version # Should match PyTorch CUDA version
258
+ ```
259
+
260
+ ### Compilation Error: `nvcc fatal error`
261
+
262
+ **Cause:** CUDA toolkit not found or version mismatch.
263
+
264
+ **Fix:**
265
+ 1. Install CUDA toolkit matching your PyTorch version
266
+ 2. Add CUDA to PATH:
267
+ ```bash
268
+ export PATH=/usr/local/cuda/bin:$PATH
269
+ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
270
+ ```
271
+ 3. Reinstall SageAttention
272
+
273
+ ### SpargeAttn Build Fails on Windows
274
+
275
+ **Cause:** Windows linker has path length limitations.
276
+
277
+ **Fix:** Use WSL2 or native Linux:
278
+ ```bash
279
+ # In WSL2
280
+ cd SpargeAttn
281
+ export TORCH_CUDA_ARCH_LIST="8.9"
282
+ pip install -e . --no-build-isolation
283
+ ```
284
+
285
+ ### Slower Than Expected
286
+
287
+ **Cause:** Wrong GPU architecture compiled or kernel fallback.
288
+
289
+ **Fix:**
290
+ 1. Check logs for "Using pytorch cross attention" (fallback indicator)
291
+ 2. Rebuild with correct `TORCH_CUDA_ARCH_LIST`
292
+ 3. Verify GPU compute capability:
293
+ ```bash
294
+ nvidia-smi --query-gpu=compute_cap --format=csv
295
+ ```
296
+
297
+ ### Quality Degradation with SpargeAttn
298
+
299
+ **Cause:** Pruning thresholds too aggressive.
300
+
301
+ **Fix:** Currently not user-configurable in the UI, but you can modify `src/Attention/AttentionMethods.py`:
302
+ ```python
303
+ # Line ~290
304
+ out = spas_sage_attn.spas_sage2_attn_meansim_cuda(
305
+ q, k, v,
306
+ simthreshd1=0.7, # Increase from 0.6 for better quality
307
+ cdfthreshd=0.98, # Increase from 0.97
308
+ pvthreshd=15,
309
+ is_causal=False
310
+ )
311
+ ```
312
+
313
+ ## Citation
314
+
315
+ If you use SageAttention or SpargeAttn in your work:
316
+
317
+ ```bibtex
318
+ @article{sageattention2024,
319
+ title={SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration},
320
+ author={Zhang, Jintao and Wei, Jia and Zhang, Pengle and others},
321
+ journal={arXiv preprint arXiv:2410.02367},
322
+ year={2024}
323
+ }
324
+
325
+ @article{spargeattn2024,
326
+ title={SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference},
327
+ author={Zhang, Jintao and others},
328
+ journal={arXiv preprint arXiv:2502.18137},
329
+ year={2025}
330
+ }
331
+ ```
332
+
333
+ ## Resources
334
+
335
+ - [SageAttention Repository](https://github.com/thu-ml/SageAttention)
336
+ - [SpargeAttn Repository](https://github.com/thu-ml/SpargeAttn)
337
+ - [SageAttention Paper](https://arxiv.org/abs/2410.02367)
338
+ - [Flash Attention](https://github.com/Dao-AILab/flash-attention) (related work)
docs/stablefast.md ADDED
@@ -0,0 +1,412 @@
1
+ # Stable-Fast Compilation
2
+
3
+ ## Overview
4
+
5
+ Stable-Fast is a JIT compilation framework that optimizes Stable Diffusion UNet models by tracing execution, fusing operators and optionally capturing CUDA graphs. It can provide significant speedup for SD1.5/SDXL batch workflows with zero quality loss.
6
+
7
+ Unlike runtime attention optimizations (SageAttention, SpargeAttn), Stable-Fast performs **ahead-of-time compilation** on the first inference pass. The compiled model is cached and reused for subsequent generations with compatible shapes.
8
+
9
+ ## How It Works
10
+
11
+ Stable-Fast applies three optimization layers:
12
+
13
+ ### 1. TorchScript Tracing
14
+
15
+ The first forward pass through the UNet is recorded into a static computational graph:
16
+
17
+ ```python
18
+ traced_model = torch.jit.trace(unet, example_inputs)
19
+ ```
20
+
21
+ This eliminates Python interpreter overhead and enables downstream graph optimizations.
22
+
23
+ ### 2. Operator Fusion
24
+
25
+ The traced graph undergoes pattern-based fusion:
26
+
27
+ - **Conv + BatchNorm fusion**: Merges normalization into convolution weights
28
+ - **Activation fusion**: Fuses ReLU/GELU/SiLU directly into linear/conv ops
29
+ - **Memory layout optimization**: Converts to channels-last format for faster conv execution
30
+ - **Triton kernels**: Replaces PyTorch ops with hand-tuned Triton implementations (if `enable_triton=True`)
31
+
32
+ Example fusion:
33
+
34
+ ```python
35
+ # Before:
36
+ x = conv(input)
37
+ x = batch_norm(x)
38
+ x = relu(x)
39
+
40
+ # After:
41
+ x = fused_conv_bn_relu(input) # Single kernel launch
42
+ ```
43
+
44
+ ### 3. CUDA Graph Capture (Optional)
45
+
46
+ When `enable_cuda_graph=True`, the entire forward pass is captured as a static CUDA graph:
47
+
48
+ - Kernel launches are recorded once and replayed on subsequent runs
49
+ - Eliminates CPU launch overhead (~10-15% speedup)
50
+ - Requires fixed input shapes and batch sizes
51
+
52
+ **Trade-off:** Higher VRAM usage (~500MB for graph buffers) and less flexibility.
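+
+ The mechanism can be illustrated with PyTorch's public CUDA graph API (a generic sketch, not Stable-Fast's internal capture code):
+
+ ```python
+ import torch
+
+ model = torch.nn.Linear(64, 64).cuda().half()
+ static_in = torch.randn(8, 64, device="cuda", dtype=torch.half)
+
+ # Warm up on a side stream, as the PyTorch docs recommend before capture
+ s = torch.cuda.Stream()
+ s.wait_stream(torch.cuda.current_stream())
+ with torch.cuda.stream(s):
+     model(static_in)
+ torch.cuda.current_stream().wait_stream(s)
+
+ g = torch.cuda.CUDAGraph()
+ with torch.cuda.graph(g):
+     static_out = model(static_in)  # launches are recorded, not run eagerly
+
+ static_in.copy_(torch.randn_like(static_in))  # shapes must stay fixed
+ g.replay()                                    # replays the whole graph at once
+ ```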
53
+
54
+ ## Installation
55
+
56
+ ### Windows/Linux (Manual)
57
+
58
+ Follow the [official guide](https://github.com/chengzeyi/stable-fast?tab=readme-ov-file#installation):
59
+
60
+ ```bash
61
+ # Install from PyPI (recommended)
62
+ pip install stable-fast
63
+
64
+ # Or build from source for latest features
65
+ git clone https://github.com/chengzeyi/stable-fast
66
+ cd stable-fast
67
+ pip install -e .
68
+ ```
69
+
70
+ **Prerequisites:**
71
+ - PyTorch 2.0+ with CUDA support
72
+ - xformers (optional but recommended)
73
+ - Triton (optional for Triton kernel fusion)
74
+
75
+ ### Docker
76
+
77
+ Stable-Fast is included in the Docker image when `INSTALL_STABLE_FAST=1`:
78
+
79
+ ```bash
80
+ docker-compose build --build-arg INSTALL_STABLE_FAST=1
81
+ ```
82
+
83
+ Default is `0` (disabled) to reduce image size and build time.
84
+
85
+ ## Usage
86
+
87
+ ### Streamlit UI
88
+
89
+ Enable in the **Performance** section of the sidebar:
90
+
91
+ 1. Check **Stable Fast**
92
+ 2. Generate images — the first run compiles the model (30-60s delay)
93
+ 3. Subsequent generations reuse the cached compiled model
94
+
95
+ **Visual indicator:** The first generation shows "Compiling model..." in the progress bar.
96
+
97
+ ### REST API
98
+
99
+ Pass `stable_fast: true` in the request payload:
100
+
101
+ ```bash
102
+ curl -X POST http://localhost:7861/api/generate \
103
+ -H "Content-Type: application/json" \
104
+ -d '{
105
+ "prompt": "a peaceful garden with cherry blossoms",
106
+ "width": 768,
107
+ "height": 512,
108
+ "num_images": 1,
109
+ "stable_fast": true
110
+ }'
111
+ ```
112
+
113
+ ### Configuration
114
+
115
+ Stable-Fast behavior is controlled by `CompilationConfig`:
116
+
117
+ ```python
118
+ from sfast.compilers.diffusion_pipeline_compiler import CompilationConfig
119
+
120
+ config = CompilationConfig.Default()
121
+ config.enable_xformers = True # Use xformers attention
122
+ config.enable_cuda_graph = False # CUDA graphs (set True for max speed)
123
+ config.enable_jit_freeze = True # Freeze traced graph
124
+ config.enable_cnn_optimization = True # Conv fusion
125
+ config.enable_triton = False # Triton kernels (experimental)
126
+ config.memory_format = torch.channels_last # Optimize memory layout
127
+ ```
128
+
129
+ LightDiffusion-Next uses sensible defaults (CUDA graphs disabled by default for flexibility). To override:
130
+
131
+ ```python
132
+ # In src/StableFast/StableFast.py
133
+ def gen_stable_fast_config(enable_cuda_graph=False):
134
+ config = CompilationConfig.Default()
135
+ config.enable_cuda_graph = enable_cuda_graph # Pass True for max speed
136
+ # ... rest of config
137
+ ```
138
+
139
+ ## Performance
140
+
141
+ ### Speedup Benchmarks
142
+
143
+ Stable-Fast provides speedup through:
144
+ - **JIT compilation**: Eliminates Python overhead
145
+ - **Operator fusion**: Reduces kernel launches
146
+ - **CUDA graphs** (optional): Further reduces CPU overhead
147
+
148
+ Speedup varies significantly based on:
149
+ - GPU architecture
150
+ - Batch size and generation count
151
+ - Model size (SD1.5 vs SDXL)
152
+ - Whether CUDA graphs are enabled
153
+
154
+ **Note:** Performance benefits are most noticeable for batch operations (50+ images). For single 20-step generations, compilation overhead may exceed speedup gains.
155
+
156
+ ### Compilation Time
157
+
158
+ First-run compilation overhead:
159
+
160
+ - **SD1.5 UNet**: ~30s (traced once per resolution/batch size)
161
+ - **SDXL UNet**: ~60s (larger model)
162
+ - **Subsequent runs**: <1s (cached)
163
+
164
+ Cached compiled models persist in `~/.cache/torch_extensions/`. Clear this directory to force recompilation.
165
+
166
+ ## Stacking with Other Optimizations
167
+
168
+ Stable-Fast is **fully compatible** with SageAttention, SpargeAttn and WaveSpeed:
169
+
170
+ ### Stable-Fast + SageAttention
171
+
172
+ ```yaml
173
+ stable_fast: true
174
+ # SageAttention auto-detected
175
+ ```
176
+
177
+ **Result:** speedups compound multiplicatively: ~1.7x (Stable-Fast) × ~1.15x (SageAttention) ≈ **~2x total speedup**
178
+
179
+ ### Stable-Fast + SpargeAttn
180
+
181
+ ```yaml
182
+ stable_fast: true
183
+ # SpargeAttn auto-detected
184
+ ```
185
+
186
+ **Result:** ~1.7x (Stable-Fast) × ~1.4x (SpargeAttn) ≈ **~2.4x total speedup**
187
+
188
+ ### Stable-Fast + SpargeAttn + DeepCache
189
+
190
+ ```yaml
191
+ stable_fast: true
192
+ deepcache:
193
+ enabled: true
194
+ interval: 3
195
+ depth: 2
196
+ # SpargeAttn auto-detected
197
+ ```
198
+
199
+ **Result:** ~1.7x (Stable-Fast) × ~1.4x (SpargeAttn) × ~2x (DeepCache 2-3x) ≈ **~4-5x total speedup**
200
+
201
+ ## Compatibility
202
+
203
+ ### Compatible With
204
+
205
+ - ✅ Stable Diffusion 1.5
206
+ - ✅ Stable Diffusion 2.1
207
+ - ✅ SDXL
208
+ - ✅ All samplers (Euler, DPM++, etc.)
209
+ - ✅ LoRA adapters
210
+ - ✅ Textual inversion embeddings
211
+ - ✅ HiresFix
212
+ - ✅ ADetailer
213
+ - ✅ Img2Img (with fixed denoise strength)
214
+ - ✅ SageAttention/SpargeAttn
215
+ - ✅ WaveSpeed caching
216
+
217
+ ### Not Compatible With
218
+
219
+ - ❌ Flux models (different architecture, no UNet)
220
+ - ❌ Dynamic resolution changes after compilation
221
+ - ❌ Dynamic batch size changes after compilation (with CUDA graphs)
222
+ - ⚠️ Frequent model switching (recompiles each time)
223
+
224
+ ## Troubleshooting
225
+
226
+ ### Slow First Run / Repeated Recompilation
227
+
228
+ **Symptom:** Every generation triggers compilation, even with identical settings.
229
+
230
+ **Causes:**
231
+ 1. Cache directory not writable
232
+ 2. System clock incorrect (invalidates timestamps)
233
+ 3. Different model loaded (each model is cached separately)
234
+
235
+ **Fixes:**
236
+ ```bash
237
+ # Check cache permissions
238
+ ls -la ~/.cache/torch_extensions
239
+
240
+ # Ensure stable timestamps
241
+ date # Should be correct
242
+
243
+ # Mount cache in Docker to persist across container restarts
244
+ docker run -v ~/.cache/torch_extensions:/root/.cache/torch_extensions ...
245
+ ```
246
+
247
+ ### CUDA Out of Memory During Compilation
248
+
249
+ **Symptom:** OOM error on first run but not subsequent runs.
250
+
251
+ **Cause:** Compilation allocates temporary buffers for tracing.
252
+
253
+ **Fixes:**
254
+ 1. Disable CUDA graphs: `enable_cuda_graph=False` (saves ~500MB)
255
+ 2. Reduce batch size temporarily for first run
256
+ 3. Clear other VRAM consumers (close other apps, disable model caching)
257
+
258
+ ### Compilation Hangs or Crashes
259
+
260
+ **Symptom:** Process freezes during "Compiling model..." step.
261
+
262
+ **Causes:**
263
+ 1. Triton compilation error (if `enable_triton=True`)
264
+ 2. Driver incompatibility
265
+ 3. Insufficient CPU RAM for graph analysis
266
+
267
+ **Fixes:**
268
+ ```bash
269
+ # Disable Triton
270
+ # In src/StableFast/StableFast.py:
271
+ config.enable_triton = False
272
+
273
+ # Update NVIDIA driver
274
+ nvidia-smi # Check version, upgrade if < 525.x
275
+
276
+ # Increase Docker memory limit
277
+ # In docker-compose.yml:
278
+ deploy:
279
+ resources:
280
+ limits:
281
+ memory: 16G # Increase from default
282
+ ```
283
+
284
+ ### Error: `torch.jit.trace` fails
285
+
286
+ **Symptom:** `RuntimeError: Could not trace model`
287
+
288
+ **Cause:** Dynamic control flow in model (if/else statements depending on runtime values).
289
+
290
+ **Fix:** This is rare with standard SD models. If it occurs:
291
+ 1. Check for custom LoRA/embeddings with dynamic logic
292
+ 2. Disable Stable-Fast for that specific generation
293
+ 3. Report issue with model details
294
+
295
+ ### Model Quality Degradation
296
+
297
+ **Symptom:** Compiled model produces different outputs than baseline.
298
+
299
+ **Cause:** Numeric precision differences from operator fusion (very rare).
300
+
301
+ **Fixes:**
302
+ ```python
303
+ # Disable aggressive optimizations
304
+ config.enable_cnn_optimization = False
305
+ config.memory_format = None # Use default layout
306
+ ```
307
+
308
+ If issue persists, disable Stable-Fast and file a bug report.
309
+
310
+ ## Advanced Configuration
311
+
312
+ ### Custom Compilation Config
313
+
314
+ Override defaults in `src/StableFast/StableFast.py`:
315
+
316
+ ```python
317
+ def gen_stable_fast_config(enable_cuda_graph=False):
318
+ config = CompilationConfig.Default()
319
+
320
+ # Maximum speed (higher VRAM usage)
321
+ config.enable_cuda_graph = True
322
+ config.enable_triton = True
323
+ config.prefer_lowp_gemm = True # Use FP16 matrix multiplies
324
+
325
+ # Balanced (recommended)
326
+ config.enable_cuda_graph = False
327
+ config.enable_triton = False
328
+ config.enable_cnn_optimization = True
329
+
330
+ # Debug (no optimizations)
331
+ config.enable_cuda_graph = False
332
+ config.enable_jit_freeze = False
333
+ config.enable_cnn_optimization = False
334
+
335
+ return config
336
+ ```
337
+
338
+ ### Clear Cached Compilations
339
+
340
+ ```bash
341
+ # Linux/Mac
342
+ rm -rf ~/.cache/torch_extensions
343
+
344
+ # Windows
345
+ rmdir /s /q %USERPROFILE%\.cache\torch_extensions
346
+
347
+ # Docker (mount cache as volume)
348
+ docker run -v my_cache:/root/.cache/torch_extensions ...
349
+ docker volume rm my_cache # Clear cache
350
+ ```
351
+
352
+ ### Profile Compilation
353
+
354
+ ```bash
355
+ # Enable debug logging
356
+ export LD_SERVER_LOGLEVEL=DEBUG
357
+
358
+ # Run generation and check logs
359
+ cat logs/server.log | grep "Stable"
360
+ ```
361
+
362
+ ## Best Practices
363
+
364
+ ### Production Deployments
365
+
366
+ 1. **Pre-compile models** during startup with a warm-up request (only for batch/long-running services)
367
+ 2. **Mount cache volume** to persist compilations across container restarts
368
+ 3. **Disable CUDA graphs** if serving multiple batch sizes
369
+ 4. **Enable CUDA graphs** for fixed-resolution APIs with consistent high-volume traffic
370
+ 5. **Disable Stable-Fast entirely** for single-shot API endpoints (compilation overhead exceeds benefit)
371
+
372
+ Example warm-up:
373
+
374
+ ```python
375
+ # In startup script
376
+ def warmup_stable_fast(model, width=768, height=512):
377
+ """Pre-compile model with dummy input."""
378
+ dummy_input = torch.randn(1, 4, height // 8, width // 8, device="cuda")
379
+ dummy_timestep = torch.tensor([999], device="cuda")
380
+
381
+ with torch.no_grad():
382
+ model(dummy_input, dummy_timestep, c={})
383
+
384
+ print("Stable-Fast compilation complete")
385
+ ```
386
+
387
+ ### Development Workflows
388
+
389
+ 1. **Disable Stable-Fast** when experimenting with new models/LoRAs (avoids repeated recompilation)
390
+ 2. **Enable for final testing** to verify production performance
391
+ 3. **Clear cache** after upgrading PyTorch/CUDA drivers
392
+
393
+ ## Citation
394
+
395
+ If you use Stable-Fast in your work:
396
+
397
+ ```bibtex
398
+ @misc{stable-fast,
399
+ author = {Cheng Zeyi},
400
+ title = {stable-fast: Fast Inference for Stable Diffusion},
401
+ year = {2023},
402
+ publisher = {GitHub},
403
+ url = {https://github.com/chengzeyi/stable-fast}
404
+ }
405
+ ```
406
+
407
+ ## Resources
408
+
409
+ - [Stable-Fast Repository](https://github.com/chengzeyi/stable-fast)
410
+ - [Installation Guide](https://github.com/chengzeyi/stable-fast?tab=readme-ov-file#installation)
411
+ - [TorchScript Documentation](https://pytorch.org/docs/stable/jit.html)
412
+ - [CUDA Graphs Guide](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/)
docs/tome.md ADDED
@@ -0,0 +1,272 @@
1
+ # Token Merging (ToMe)
2
+
3
+ ## Overview
4
+
5
+ Token Merging (ToMe) is a **performance optimization** that accelerates diffusion models by intelligently merging similar tokens in the attention mechanism. By identifying and combining redundant computations, ToMe achieves **20-60% speedup** with minimal quality impact.
6
+
7
+ Unlike feature caching (DeepCache, WaveSpeed), ToMe reduces the computational graph itself — fewer tokens means fewer attention operations, less memory bandwidth, and faster generation.
8
+
9
+ This is a **training-free**, **drop-in optimization** that works with all Stable Diffusion models (SD1.5, SDXL) and can be combined with other speedup techniques.
10
+
11
+ ## How It Works
12
+
13
+ ### The Token Redundancy Problem
14
+
15
+ Diffusion models process images as sequences of tokens (patches):
16
+
17
+ ```
18
+ Input Image (512×512) → Tokenize → 4096 tokens (64×64 grid of 8×8 patches)
19
+ ```
20
+
21
+ At each attention layer, **every token attends to every other token**:
22
+
23
+ $$
24
+ \text{Attention Cost} = O(N^2 \cdot D)
25
+ $$
26
+
27
+ Where:
28
+ - $N$ = number of tokens (e.g., 4096 for 512×512)
29
+ - $D$ = embedding dimension (e.g., 768 or 1024)
30
+
31
+ **Key insight:** Many tokens are highly similar (e.g., sky regions, uniform backgrounds, smooth gradients). Computing attention between nearly-identical tokens is redundant.
32
+
33
+ ### The ToMe Solution
34
+
35
+ Token Merging reduces redundancy through **bipartite matching**:
36
+
37
+ ```
38
+ Step 1: Split tokens into two sets
39
+ ┌─────────────────────┬─────────────────────┐
40
+ │ Destination Set (dst)│ Source Set (src) │
41
+ │ [Token 1, 3, 5, ...] │ [Token 2, 4, 6, ...] │
42
+ └─────────────────────┴─────────────────────┘
43
+
44
+ Step 2: Compute similarity (cosine distance)
45
+ dst[0] ↔ src[0]: 0.92 (highly similar!)
46
+ dst[0] ↔ src[1]: 0.34
47
+ dst[0] ↔ src[2]: 0.18
48
+ ...
49
+
50
+ Step 3: Merge most similar pairs
51
+ merged_token[0] = (dst[0] + src[0]) / 2
52
+
53
+ Step 4: Continue with fewer tokens
54
+ 4096 tokens → 2048 tokens (50% merge ratio)
55
+ Attention cost reduced by ~4x
56
+ ```
57
+
58
+ This happens **per attention layer**, with merge ratio dynamically adjusting based on layer depth.
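+
+ The matching above can be sketched in a few lines of PyTorch. This is a toy illustration of the bipartite idea, not tomesd's optimized kernel, and here `ratio` means the fraction of source tokens merged:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def merge_tokens(x: torch.Tensor, ratio: float = 0.5) -> torch.Tensor:
+     """Toy bipartite merge over one sequence of (n, d) tokens."""
+     dst, src = x[0::2].clone(), x[1::2].clone()                   # Step 1: alternate split
+     sim = F.normalize(src, dim=-1) @ F.normalize(dst, dim=-1).T   # Step 2: cosine similarity
+     best_sim, best_dst = sim.max(dim=-1)                          # best dst match per src token
+     merged = best_sim.topk(int(src.shape[0] * ratio)).indices     # most redundant src tokens
+     # Step 3: average merged src tokens into their matched dst tokens
+     # (duplicate matches simply overwrite each other in this toy)
+     dst[best_dst[merged]] = (dst[best_dst[merged]] + src[merged]) / 2
+     keep = torch.ones(src.shape[0], dtype=torch.bool)
+     keep[merged] = False
+     return torch.cat([dst, src[keep]])                            # Step 4: fewer tokens
+
+ print(merge_tokens(torch.randn(4096, 768)).shape)  # torch.Size([3072, 768])
+ ```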
59
+
60
+ ## Configuration
61
+
62
+ ### Parameters
63
+
64
+ | Parameter | Type | Default | Range | Description |
65
+ |-----------|------|---------|-------|-------------|
66
+ | `tome_enabled` | bool | `False` | - | Enable Token Merging |
67
+ | `tome_ratio` | float | `0.5` | 0.0-0.9 | Percentage of tokens to merge (higher = faster, lower quality) |
68
+ | `tome_max_downsample` | int | `1` | 1, 2, 4, 8 | Apply ToMe to layers with downsampling ≤ this value |
69
+
70
+ ### Choosing `tome_max_downsample`
71
+
72
+ Controls which UNet layers apply ToMe:
73
+
74
+ | Value | Layers Affected | Speed vs Quality |
75
+ |-------|----------------|------------------|
76
+ | **1** | Only full-resolution layers (4/15) | Conservative, minimal quality impact |
77
+ | **2** | Half-resolution layers (8/15) | Balanced (recommended) |
78
+ | **4** | Quarter-resolution layers (12/15) | Aggressive |
79
+ | **8** | All layers (15/15) | Maximum speedup, noticeable quality loss |
80
+
81
+ **Recommendation:** Start with `max_downsample=1`. Only increase if you need more speedup and can tolerate quality reduction.
82
+
83
+ ## Usage
84
+
85
+ ### Streamlit UI
86
+
87
+ Enable in the **🔀 Token Merging (ToMe)** expander:
88
+
89
+ 1. Check **Enable Token Merging**
90
+ 2. Select a preset:
91
+ - **Conservative** — 30% merge, max_downsample=2 (minimal impact)
92
+ - **Balanced** — 50% merge, max_downsample=1 (recommended)
93
+ - **Aggressive** — 70% merge, max_downsample=1 (maximum speed)
94
+ - **Custom** — Manual slider control
95
+ 3. Generate images — console confirms activation
96
+
97
+ **Visual feedback:**
98
+ ```
99
+ ✓ Token Merging ACTIVE: 50% merge ratio, max_downsample=1
100
+ ```
101
+
102
+ ### REST API
103
+
104
+ Include in your generation request:
105
+
106
+ ```bash
107
+ curl -X POST http://localhost:7861/api/generate \
108
+ -H "Content-Type: application/json" \
109
+ -d '{
110
+ "prompt": "a cyberpunk cityscape at night, neon lights",
111
+ "width": 1024,
112
+ "height": 512,
113
+ "steps": 25,
114
+ "tome_enabled": true,
115
+ "tome_ratio": 0.5,
116
+ "tome_max_downsample": 1
117
+ }'
118
+ ```
119
+
120
+ ### Python API
121
+
122
+ ```python
123
+ from src.user.pipeline import pipeline
124
+
125
+ pipeline(
126
+ prompt="a detailed fantasy castle on a cliff",
127
+ w=768,
128
+ h=1024,
129
+ steps=30,
130
+ sampler="dpmpp_sde_cfgpp",
131
+ scheduler="ays",
132
+ tome_enabled=True,
133
+ tome_ratio=0.5,
134
+ tome_max_downsample=1,
135
+ number=4 # Generate multiple images faster
136
+ )
137
+ ```
138
+
139
+ ## Troubleshooting
140
+
141
+ ### "No speedup detected"
142
+
143
+ **Possible causes:**
144
+ 1. **tomesd not installed** — Install with `pip install tomesd`
145
+ 2. **Other bottlenecks** — Enable only ToMe for isolated testing
146
+ 3. **Very low resolution** — ToMe benefits are minimal below 512px
147
+
148
+ **Solutions:**
149
+ ```bash
150
+ # Check installation
151
+ python -c "import tomesd; print('ToMe available')"
152
+
153
+ # Test in isolation at 1024×512 (ideal resolution for ToMe)
154
+ python quick_tome_test.py
155
+ ```
156
+
157
+ ### "Images look blurry or soft"
158
+
159
+ **Cause:** `tome_ratio` too high (>0.6) or `max_downsample` too aggressive (>2).
160
+
161
+ **Solutions:**
162
+ - Reduce `tome_ratio` to 0.4-0.5
163
+ - Lower `max_downsample` to 1
164
+ - Increase `steps` to 30-35 for better convergence
165
+ - Disable ToMe for final high-quality renders
166
+
167
+ ### "Minimal speedup despite 70% merge"
168
+
169
+ **Cause:** Other optimizations (DeepCache, Multi-Scale) already bottlenecked elsewhere (VAE decode, sampling overhead).
170
+
171
+ **Solutions:**
172
+ - Profile with isolated tests (disable all other optimizations)
173
+ - Ensure GPU isn't memory-bound (reduce batch size)
174
+ - Check system monitoring for CPU/disk bottlenecks
175
+
176
+ ### "Model fails to load / tomesd errors"
177
+
178
+ **Cause:** Outdated tomesd version or incompatible model architecture.
179
+
180
+ **Solutions:**
181
+ ```bash
182
+ # Update tomesd
183
+ pip install --upgrade tomesd
184
+
185
+ # Check compatibility (ToMe only works with UNet-based models)
186
+ # Flux/Transformer models require different ToMe variant (not yet supported)
187
+ ```
188
+
189
+ ## Technical Details
190
+
191
+ ### Implementation
192
+
193
+ ToMe is applied via the `ModelPatcher` class (`src/Model/ModelPatcher.py`):
194
+
195
+ ```python
196
+ def apply_tome(self, ratio: float = 0.5, max_downsample: int = 1) -> bool:
197
+ """Apply Token Merging to the diffusion model."""
198
+ # Remove any existing patch (handles cached models)
199
+ try:
200
+ tomesd.remove_patch(self)
201
+ except:
202
+ pass
203
+
204
+ # Apply ToMe patch
205
+ tomesd.apply_patch(
206
+ self, # ModelPatcher with .model.diffusion_model structure
207
+ ratio=ratio,
208
+ max_downsample=max_downsample
209
+ )
210
+ self.tome_enabled = True
211
+ return True
212
+ ```
213
+
214
+ **Cache handling:** ToMe patches are removed after each generation and re-applied as needed, ensuring correct behavior with model caching.
215
+
216
+ ### Bipartite Matching Algorithm
217
+
218
+ ToMe uses **proportional attention-based matching**:
219
+
220
+ 1. **Partition tokens:**
221
+ $$
222
+ T_{\text{dst}}, T_{\text{src}} = \text{partition}(T, \text{stride}=(2,2))
223
+ $$
224
+
225
+ 2. **Compute similarity matrix:**
226
+ $$
227
+ S_{ij} = \frac{T_{\text{dst}}[i] \cdot T_{\text{src}}[j]}{||T_{\text{dst}}[i]|| \cdot ||T_{\text{src}}[j]||}
228
+ $$
229
+
230
+ 3. **Find top-k matches:**
231
+ $$
232
+ k = \lfloor \text{ratio} \times |T_{\text{src}}| \rfloor
233
+ $$
234
+
235
+ 4. **Merge tokens:**
236
+ $$
237
+ T'[i] = \frac{T_{\text{dst}}[i] + T_{\text{src}}[\text{match}(i)]}{2}
238
+ $$
239
+
240
+ ## Compatibility
241
+
242
+ | Feature | Compatible? | Notes |
243
+ |---------|-------------|-------|
244
+ | **SD1.5 models** | ✓ | Full support, tested extensively |
245
+ | **SDXL models** | ✓ | Full support, larger speedup |
246
+ | **Flux models** | ✗ | UNet-specific, Transformer variant TBD |
247
+ | **All samplers** | ✓ | ToMe patches attention, agnostic to sampler |
248
+ | **CFG-Free** | ✓ | No interaction, both apply independently |
249
+ | **DeepCache** | ✓ | Excellent combination, speedups multiply |
250
+ | **Multi-Scale** | ✓ | Compatible, benefits stack |
251
+ | **HiRes Fix** | ✓ | Applied to all upscaling passes |
252
+ | **ADetailer** | ✓ | Applied to detail-enhancement passes |
253
+ | **Stable-Fast** | ✓ | Can combine for maximum speedup |
254
+
255
+ ## Limitations
256
+
257
+ 1. **UNet-only:** Transformer architectures (Flux) use different attention patterns — dedicated Transformer-ToMe needed
258
+ 2. **Detail sensitivity:** High-frequency textures (fabric weave, individual hairs) see most quality impact
259
+ 3. **Diminishing returns:** Beyond 60% merge, quality degrades faster than speed improves
260
+ 4. **One-time patch:** Doesn't adapt merge ratio dynamically during generation
261
+
262
+ ## Related Optimizations
263
+
264
+ - **[DeepCache](wavespeed.md#deepcache)**: Feature caching — complements ToMe, speedups multiply (~2.8x combined)
265
+ - **[Multi-Scale Diffusion](optimizations.md#multi-scale)**: Resolution-based optimization — also reduces token count
266
+ - **[Stable-Fast](stablefast.md)**: Compilation-based speedup — can combine for maximum performance
267
+
268
+ ## References & Further Reading
269
+
270
+ - **Original Paper:** [Token Merging for Fast Stable Diffusion](https://arxiv.org/abs/2303.17604) (Bolya & Hoffman, 2023)
271
+ - **tomesd Library:** https://github.com/dbolya/tomesd
272
+ - **ToMe for Vision Transformers:** https://github.com/facebookresearch/ToMe
docs/usage.md ADDED
@@ -0,0 +1,134 @@
1
+ # Usage
2
+
3
+ ## First Run & UI Tour
4
+
5
+ This page walks you through launching LightDiffusion-Next, understanding the Streamlit layout, using the optional Gradio UI and triggering a first generation from the command line.
6
+
7
+ ## Launching the Streamlit UI
8
+
9
+ - **Windows:** run `run.bat` (see [Installation](installation.md)).
10
+ - **Linux/macOS/WSL2:** activate your virtual environment and run `streamlit run streamlit_app.py --server.port=8501`.
11
+ - **Docker:** start the compose stack and open `http://localhost:8501`.
12
+
13
+ You will see an initialization progress indicator while checkpoints and auxiliary models are downloaded. Once complete the app switches to a two-tab layout: **🎨 Generate** and **📜 History**.
14
+
15
+ ## Generate tab
16
+
17
+ The Generate tab is designed as a control surface where the left sidebar contains parameters and the right canvas displays previews and final renders.
18
+
19
+ ### Prompt & base settings
20
+
21
+ - **Prompt / Negative prompt** — text areas at the top of the sidebar. Negative prompts are optional; the pipeline automatically falls back to a curated default containing `EasyNegative`, `badhandv4`, `lr` and `ng_deepnegative` embeddings.
22
+ - **Dimensions** — width/height sliders (64–2048) with automatic aspect handling in the gallery.
23
+ - **Images & batch** — request multiple images per job; large requests may be chunked server-side into groups no larger than `LD_MAX_IMAGES_PER_GROUP` images (default: 256) to avoid memory and disk pressure. Use the `batch_size` setting to control internal sampler batch size and adjust `LD_MAX_IMAGES_PER_GROUP` via environment variables if necessary.
24
+
25
+ ### Feature toggles
26
+
27
+ - **HiRes Fix** — Upscales the latent and runs an extra sampling pass. Generates output in `output/HiresFix`.
28
+ - **ADetailer** — Uses SAM + YOLO and Impact Pack prompt heads to redraw faces/bodies. Additional artifacts are saved to `output/Adetailer`.
29
+ - **Enhance prompt** — Sends your prompt through the Ollama model specified by `PROMPT_ENHANCER_MODEL` (defaults to `qwen3:0.6b`). The rewritten prompt is shown in the sidebar and in image metadata.
30
+ - **Stable-Fast** — Enables UNet compilation (after the first warm-up) for faster iterations.
31
+ - **Flux mode** — Routes the job through the quantized Flux pipeline (requires the `ae.safetensors` VAE and quantized GGUF weights downloaded via `CheckAndDownloadFlux`).
32
+ - **Img2Img mode** — Reveals an image uploader. The selected picture is used as the source latent, optionally combined with UltimateSDUpscale.
33
+ - **Keep models in VRAM** — Toggle model caching between jobs to reduce load time at the cost of VRAM retention.
34
+ - **Real-time preview** — Streams TAESD previews into a responsive gallery while sampling is still running. Disable it when running headless to save resources.
35
+
36
+ ### Sampling & Scheduling
37
+
38
+ The **⚡ Sampling & Scheduling** section provides direct control over the sampling process:
39
+
40
+ - **Scheduler** — Choose from 8 scheduler options including the new **AYS (Align Your Steps)** schedulers which provide ~2x speedup by using optimized sigma distributions. Options include:
41
+ - Normal, Karras, Simple, Beta (traditional schedulers)
42
+ - AYS, AYS SD1.5, AYS SDXL, AYS Flux (optimized schedulers)
43
+ - **Sampler** — Select from 6 available samplers:
44
+ - Standard: Euler, Euler Ancestral
45
+ - CFG++ variants: Euler CFG++, Euler Ancestral CFG++, DPM++ 2M CFG++, DPM++ SDE CFG++
46
+ - **Steps** — Adjust sampling steps (1-150). The UI shows recommendations based on your scheduler choice (e.g., 10 steps for AYS vs 20 for normal).
47
+ - **Prompt Cache** — Toggle prompt caching on/off (enabled by default). View cache statistics showing hits/misses and clear the cache when needed.
48
+
49
+
50
+ ### Multi-scale diffusion presets
51
+
52
+ Under the “Multi-Scale Diffusion Settings” accordion you can:
53
+
54
+ - Choose a preset (`quality`, `performance`, `balanced`, `disabled`).
55
+ - Override the scale factor and the number of steps to run at full resolution.
56
+ - Enable intermittent full-resolution refinement.
57
+
58
+ Multi-scale diffusion provides major frame-time savings at high resolutions and is enabled by default.
59
+
60
+ ### Model cache management
61
+
62
+ - **🔍 Check VRAM Usage** — reports total/used/free VRAM, cached checkpoints and whether the “keep loaded” flag is active.
63
+ - **🗑️ Clear Model Cache** — evicts models from VRAM so the next job reloads everything fresh.
64
+
65
+ ### Status & previews
66
+
67
+ - A status bar at the bottom of the page surfaces timing, generation stage and any warnings.
68
+ - When real-time preview is enabled, the canvas shows the six most recent TAESD frames. They disappear automatically when generation completes.
69
+
70
+ ## Keyboard shortcuts & session state
71
+
72
+ - Most sliders support arrow-key and shift + arrow adjustments.
73
+ - The UI remembers your last-used settings inside `webui_settings.json`. Toggle “Verbose mode” in the settings drawer to see more runtime information.
74
+ - Seeds are stored in `include/last_seed.txt`. Enable “Reuse seed” to repeat a composition.
75
+
76
+ ## History tab
77
+
78
+ - Displays every PNG in the `output/**` tree with metadata overlays (timestamp, dimensions, prompt).
79
+ - Use “🔄 Refresh History” to rescan the folders, “🗑️ Delete Selected Image” for targeted cleanup or “⚠️ Clear All Images” to wipe everything.
80
+ - Selections show exact file paths so you can open them in external editors.
81
+
82
+ ## Using the Gradio UI
83
+
84
+ Run `python app.py` (or set `UI_FRAMEWORK=gradio` in Docker) to launch the Gradio frontend at `http://localhost:7860`.
85
+
86
+ - The controls mirror the Streamlit sidebar but the layout is optimized for Hugging Face Spaces.
87
+ - Live previews stream directly to the main gallery while jobs run.
88
+ - The 📸 Image History tab reads from the same `output/` folders as Streamlit, so both UIs share artifacts and metadata.
89
+
90
+ ## Command-line pipeline
91
+
92
+ You can invoke the pipeline without any UI for scripted jobs.
93
+
94
+ ```bash
95
+ python -m src.user.pipeline "a futuristic city at dusk" 768 512 2 2 --hires-fix --adetailer --stable-fast --reuse-seed
96
+ ```
97
+
98
+ - Positional arguments: `prompt width height number batch`.
99
+ - Flags mirror the UI toggles (`--img2img`, `--flux`, `--prio-speed`, `--multiscale-preset`, etc.).
100
+ - Img2Img uses the prompt as a filesystem path unless you pass `--img2img-image` through the FastAPI server (see [REST & automation](api.md)).
101
+
102
+ ## Streamlit tips
103
+
104
+ - Click “Retry Initialization” if the download step fails — the app reruns `CheckAndDownload()`.
105
+ - Use the sidebar menu → **Rerun** if you change source code while developing custom nodes.
106
+ - When running on laptops, disable “Keep models in VRAM” before closing the UI to release GPU memory for other applications.
107
+
108
+ ## Programmatic pipeline usage (Python)
109
+
110
+ You can import and call the pipeline directly from Python. The function lives at `src.user.pipeline.pipeline` and accepts the same runtime flags as the CLI. The example below shows a minimal, synchronous call that runs the pipeline and handles the returned mapping when running in batched mode.
111
+
112
+ ```python
113
+ from src.user.pipeline import pipeline
114
+
115
+ result = pipeline(
116
+ prompt=["a futuristic city at dusk", "a cyberpunk alley, rainy"],
117
+ w=768,
118
+ h=512,
119
+ number=2,
120
+ batch=2,
121
+ hires_fix=False,
122
+ adetailer=False,
123
+ stable_fast=False,
124
+ reuse_seed=False,
125
+ flux_enabled=False,
126
+ )
127
+
128
+ # When run in batched mode `pipeline` returns a dict with key 'batched_results'
129
+ if isinstance(result, dict) and "batched_results" in result:
130
+ for req_id, entries in result["batched_results"].items():
131
+ print(f"Request {req_id} produced {len(entries)} artifacts")
132
+ else:
133
+ print("Pipeline completed; check output/ for generated images")
134
+ ```
docs/wavespeed.md ADDED
@@ -0,0 +1,473 @@
1
+ # WaveSpeed Caching
2
+
3
+ ## Overview
4
+
5
+ WaveSpeed is the project's caching-oriented optimization layer for reusing work across denoising steps. In the current codebase, the integrated path is DeepCache for UNet-based models, and the repository also contains groundwork for a Flux-oriented First Block Cache path.
6
+
7
+ LightDiffusion-Next contains two WaveSpeed-related implementations:
8
+
9
+ 1. **DeepCache** — Integrated for UNet-based models (SD1.5, SDXL)
10
+ 2. **First Block Cache (FBCache)** — Flux-oriented cache machinery present in the codebase
11
+
12
+ Both are training-free. DeepCache is the user-facing path today; First Block Cache is codebase groundwork for a more specialized transformer caching path.
13
+
14
+ ## How It Works
15
+
16
+ ### Core Insight
17
+
18
+ Diffusion models denoise images iteratively over 20-50 steps. Researchers observed that:
19
+
20
+ - **High-level features** (semantic structure, composition) change slowly across steps
21
+ - **Low-level features** (fine details, textures) require frequent updates
22
+
23
+ WaveSpeed aims to reduce repeated computation across nearby denoising steps by reusing information from earlier steps where practical.
24
+
25
+ ### DeepCache (UNet Models) {#deepcache}
26
+
27
+ DeepCache is the integrated WaveSpeed path for UNet models.
28
+
29
+ **Cache step (every N steps):**
30
+ 1. Run the full denoiser path
31
+ 2. Store the output for later reuse
32
+
33
+ **Reuse step (intermediate steps):**
34
+ 1. Reuse the cached denoiser output
35
+ 2. Skip the full model recomputation for that step
36
+
37
+ **Speedup:** ~50-70% time saved per reuse step → 2-3x total speedup with `interval=3`
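+
+ In pseudocode the schedule looks like this (a hedged sketch; `run_deep_layers` and `run_shallow_layers` are hypothetical stand-ins for the real UNet split):
+
+ ```python
+ def run_deep_layers(x, t):            # hypothetical: expensive deep UNet blocks
+     return x * 0.9
+
+ def run_shallow_layers(x, t, deep):   # hypothetical: cheap shallow blocks, always run
+     return x + 0.1 * deep
+
+ num_steps, cache_interval = 20, 3
+ x, cached = 1.0, None
+ for t in range(num_steps):
+     if cached is None or t % cache_interval == 0:
+         cached = run_deep_layers(x, t)      # cache step: full forward
+     x = run_shallow_layers(x, t, cached)    # reuse step: deep output comes from cache
+ ```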
38
+
39
+ ### First Block Cache (Flux Models)
40
+
41
+ Flux uses Transformer blocks instead of UNet convolutions. The repository includes a First Block Cache implementation for this architecture family:
42
+
43
+ ```
44
+ ┌─────────────────────────────────────────┐
45
+ │ First Transformer Block (always run) │ ← Computes initial features
46
+ ├─────────────────────────────────────────┤
47
+ │ Remaining Blocks (cached if similar) │ ← FBCache caching zone
48
+ └─────────────────────────────────────────┘
49
+ ```
50
+
51
+ **Cache decision logic:**
52
+ 1. Run first Transformer block
53
+ 2. Compare output to previous step's output
54
+ 3. If difference < threshold: reuse cached remaining blocks
55
+ 4. If difference ≥ threshold: run all blocks and update cache
56
+
57
+ In the current project structure, this cache path is implementation groundwork rather than a standard generation toggle like DeepCache.
58
+
59
+ ## DeepCache Configuration
60
+
61
+ ### Parameters
62
+
63
+ | Parameter | Type | Default | Description |
64
+ |-----------|------|---------|-------------|
65
+ | `cache_interval` | int | 3 | Steps between cache updates (higher = faster, lower quality) |
66
+ | `cache_depth` | int | 2 | UNet depth for caching (0-12, higher = more aggressive) |
67
+ | `start_step` | int | 0 | Timestep to start caching (0-1000) |
68
+ | `end_step` | int | 1000 | Timestep to stop caching (0-1000) |
69
+
70
+ ### Streamlit UI
71
+
72
+ Enable in the **⚡ DeepCache Acceleration** expander:
73
+
74
+ 1. Check **Enable DeepCache**
75
+ 2. Adjust sliders:
76
+ - **Cache Interval**: 1-10 (default: 3)
77
+ - **Cache Depth**: 0-12 (default: 2)
78
+ - **Start/End Steps**: 0-1000 (default: 0/1000)
79
+ 3. Generate images — caching applies transparently
80
+
81
+ ### REST API
82
+
83
+ ```bash
84
+ curl -X POST http://localhost:7861/api/generate \
85
+ -H "Content-Type: application/json" \
86
+ -d '{
87
+ "prompt": "a misty forest at twilight",
88
+ "width": 768,
89
+ "height": 512,
90
+ "deepcache_enabled": true,
91
+ "deepcache_interval": 3,
92
+ "deepcache_depth": 2
93
+ }'
94
+ ```
95
+
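+ The same request can be sent from Python; this is a minimal sketch assuming the `requests` package is installed and the server is on the default port (the response schema is server-defined, so inspect it yourself):
+
+ ```python
+ import requests
+
+ # Same payload as the curl example above
+ payload = {
+     "prompt": "a misty forest at twilight",
+     "width": 768,
+     "height": 512,
+     "deepcache_enabled": True,
+     "deepcache_interval": 3,
+     "deepcache_depth": 2,
+ }
+ resp = requests.post("http://localhost:7861/api/generate", json=payload, timeout=600)
+ resp.raise_for_status()
+ print(resp.status_code)
+ ```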
96
+ ### Recommended Presets
97
+
98
+ #### Balanced (Default)
99
+ ```yaml
100
+ cache_interval: 3
101
+ cache_depth: 2
102
+ start_step: 0
103
+ end_step: 1000
104
+ ```
105
+ - **Speedup:** 2-2.3x
106
+ - **Quality loss:** Very slight (1-2%)
107
+ - **Use case:** Everyday generation
108
+
109
+ #### Maximum Speed
110
+ ```yaml
111
+ cache_interval: 5
112
+ cache_depth: 3
113
+ start_step: 0
114
+ end_step: 1000
115
+ ```
116
+ - **Speedup:** 2.5-3x
117
+ - **Quality loss:** Noticeable (5-7%)
118
+ - **Use case:** Rapid prototyping, batch jobs
119
+
120
+ #### Maximum Quality
121
+ ```yaml
122
+ cache_interval: 2
123
+ cache_depth: 1
124
+ start_step: 0
125
+ end_step: 1000
126
+ ```
127
+ - **Speedup:** 1.5-2x
128
+ - **Quality loss:** Minimal (<1%)
129
+ - **Use case:** Final renders, client work
130
+
131
+ #### Partial Caching (Critical Steps Only)
132
+ ```yaml
133
+ cache_interval: 3
134
+ cache_depth: 2
135
+ start_step: 200
136
+ end_step: 800
137
+ ```
138
+ - **Speedup:** 1.8-2.2x
139
+ - **Quality loss:** Minimal
140
+ - **Use case:** Preserve early structure, late details
141
+
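+ Conceptually, the start/end window simply gates whether caching is considered at a given timestep. A minimal sketch of that check (a hypothetical helper; the exact gating and timestep direction live in the real implementation):
+
+ ```python
+ def caching_allowed(timestep: int, start_step: int = 200, end_step: int = 800) -> bool:
+     """Hypothetical helper: cache only inside the configured timestep window."""
+     return start_step <= timestep <= end_step
+
+ # Timesteps outside [start_step, end_step] always run the full model
+ assert caching_allowed(500)
+ assert not caching_allowed(100) and not caching_allowed(900)
+ ```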
142
+ ## First Block Cache (FBCache) Configuration
143
+
144
+ ### Parameters
145
+
146
+ | Parameter | Type | Default | Description |
147
+ |-----------|------|---------|-------------|
148
+ | `residual_diff_threshold` | float | 0.05 | Max feature difference to trigger cache reuse (0.0-1.0) |
149
+
150
+ ### Usage
151
+
152
+ First Block Cache is not currently exposed as a standard per-generation toggle. The implementation is available in the codebase for specialized integration work:
153
+
154
+ ```python
155
+ # In src/user/pipeline.py
156
+ from src.WaveSpeed import fbcache_nodes
157
+
158
+ # Create cache context
159
+ cache_context = fbcache_nodes.create_cache_context()
160
+
161
+ # Apply caching to a Flux-style model
162
+ with fbcache_nodes.cache_context(cache_context):
163
+ patched_model = fbcache_nodes.create_patch_flux_forward_orig(
164
+ flux_model,
165
+ residual_diff_threshold=0.05, # Lower = stricter caching
166
+ )
167
+ # Generate images...
168
+ ```
169
+
170
+ ### Tuning Threshold
171
+
172
+ - **Lower threshold (0.01-0.03)**: Stricter caching, recomputes more often, higher quality
173
+ - **Higher threshold (0.05-0.1)**: Looser caching, reuses more often, higher speedup
174
+ - **Recommended:** 0.05 (balances quality and speed)
175
+
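+ To make the threshold concrete, here is the relative-difference metric from the pseudocode later on this page, evaluated on toy tensors (illustrative values only):
+
+ ```python
+ import torch
+
+ prev = torch.randn(4, 64)
+ curr = prev + 0.02 * torch.randn(4, 64)  # small step-to-step drift
+
+ residual_norm = ((curr - prev).abs().mean() / curr.abs().mean()).item()
+ reuse_cache = residual_norm < 0.05  # default threshold
+ print(f"residual_norm={residual_norm:.4f}, reuse={reuse_cache}")  # typically reuses here
+ ```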
176
+ ## Performance
177
+
178
+ ### Speedup Guidance
179
+
180
+ Speedup scales with cache interval and depth:
181
+
182
+ | Model | Cache Interval | Expected Behavior |
183
+ |-------|---------------|-------------------|
184
+ | SD1.5 | 2 | Moderate speedup, minimal quality loss |
185
+ | SD1.5 | 3 | Good speedup, slight quality loss |
186
+ | SD1.5 | 5 | High speedup, noticeable quality loss |
187
+ | SDXL | 3 | Good speedup, slight quality loss |
188
+ | Flux-style caching paths | implementation-specific | Depends on the integration path |
189
+
190
+ **Performance varies based on:**
191
+ - GPU architecture
192
+ - Model size
193
+ - Resolution
194
+ - Sampler choice
195
+ - Number of steps
196
+
197
+ **Recommendation:** Start with `interval=3` and adjust based on your quality requirements.
+
+ ### VRAM Impact
198
+
199
+ Caching increases VRAM usage slightly (50-200MB depending on resolution):
200
+
201
+ | Model | Baseline VRAM | + DeepCache | Increase |
202
+ |-------|--------------|-------------|----------|
203
+ | SD1.5 (768×512) | 3.2 GB | 3.4 GB | +200 MB |
204
+ | SDXL (1024×1024) | 6.8 GB | 7.0 GB | +200 MB |
205
+ | Flux (832×1216) | 12.5 GB | 12.6 GB | +100 MB |
206
+
207
+ ## Stacking with Other Optimizations
208
+
209
+ WaveSpeed is **fully compatible** with SageAttention, SpargeAttn and Stable-Fast:
210
+
211
+ ### DeepCache + SageAttention
212
+
213
+ ```yaml
214
+ deepcache_enabled: true
215
+ deepcache_interval: 3
216
+ # SageAttention auto-detected
217
+ ```
218
+
219
+ **Result:** 2.2x (DeepCache) × 1.15 (SageAttention) = **~2.5x total speedup**
220
+
221
+ ### DeepCache + SpargeAttn
222
+
223
+ ```yaml
224
+ deepcache_enabled: true
225
+ deepcache_interval: 3
226
+ # SpargeAttn auto-detected
227
+ ```
228
+
229
+ **Result:** Enhanced speedup from caching and sparse attention
230
+
231
+ ### DeepCache + Stable-Fast + SpargeAttn
232
+
233
+ ```yaml
234
+ stable_fast: true
235
+ deepcache_enabled: true
236
+ deepcache_interval: 3
237
+ # SpargeAttn auto-detected
238
+ ```
239
+
240
+ **Result:** Maximum combined speedup (all optimizations active, batch operations only)
241
+
242
+ ## Compatibility
243
+
244
+ ### DeepCache Compatible With
245
+
246
+ - ✅ Stable Diffusion 1.5
247
+ - ✅ Stable Diffusion 2.1
248
+ - ✅ SDXL
249
+ - ✅ All samplers (Euler, DPM++, etc.)
250
+ - ✅ LoRA adapters
251
+ - ✅ Textual inversion embeddings
252
+ - ✅ HiresFix
253
+ - ✅ ADetailer
254
+ - ✅ Multi-scale diffusion
255
+ - ✅ SageAttention/SpargeAttn
256
+ - ✅ Stable-Fast
257
+
258
+ ### DeepCache NOT Compatible With
259
+
260
+ - ❌ Flux models (use FBCache instead)
261
+ - ❌ Img2Img mode (can cause artifacts)
262
+
263
+ ### FBCache Compatible With
264
+
265
+ - ✅ Flux models
266
+ - ✅ SageAttention/SpargeAttn
267
+ - ✅ All Flux-compatible features
268
+
269
+ ### FBCache NOT Compatible With
270
+
271
+ - ❌ SD1.5/SDXL (use DeepCache instead)
272
+ - ❌ Stable-Fast (Flux not supported by Stable-Fast)
273
+
274
+ ## Troubleshooting
275
+
276
+ ### No Speedup Observed
277
+
278
+ **Causes:**
279
+ 1. DeepCache disabled or not applied to correct model type
280
+ 2. Cache interval too low (interval=1 provides no caching)
281
+ 3. Model loaded incorrectly
282
+
283
+ **Fixes:**
284
+ ```bash
285
+ # Check logs for DeepCache activation
286
+ cat logs/server.log | grep -i "deepcache\|cache"
287
+
288
+ # Verify UI toggle is enabled
289
+ # Streamlit: Check "Enable DeepCache" checkbox
290
+ # API: Ensure "deepcache_enabled": true in payload
291
+
292
+ # Try higher interval
293
+ deepcache_interval: 3 # Instead of 1 or 2
294
+ ```
295
+
296
+ ### Quality Degradation
297
+
298
+ **Symptoms:**
299
+ - Blurry details
300
+ - Smoothed textures
301
+ - Loss of fine patterns
302
+
303
+ **Causes:**
304
+ 1. Cache interval too high
305
+ 2. Cache depth too aggressive
306
+ 3. Wrong model type (Flux using DeepCache)
307
+
308
+ **Fixes:**
309
+ ```yaml
310
+ # Reduce cache interval
311
+ deepcache_interval: 2 # Down from 5
312
+
313
+ # Reduce cache depth
314
+ deepcache_depth: 1 # Down from 3
315
+
316
+ # Disable caching for critical phases
317
+ deepcache_start_step: 200 # Skip early structure formation
318
+ deepcache_end_step: 800 # Skip late detail refinement
319
+ ```
320
+
321
+ ### Artifacts in Img2Img
322
+
323
+ **Symptom:** Visible seams, inconsistent styles when using DeepCache with Img2Img.
324
+
325
+ **Cause:** Img2Img starts from a noisy input image, which violates DeepCache's assumptions about feature consistency.
326
+
327
+ **Fix:** Disable DeepCache for Img2Img:
328
+ ```yaml
329
+ deepcache_enabled: false # When img2img_enabled: true
330
+ ```
331
+
332
+ ### VRAM Increase
333
+
334
+ **Symptom:** OOM errors after enabling DeepCache.
335
+
336
+ **Cause:** Cached features consume additional VRAM.
337
+
338
+ **Fixes:**
339
+ 1. Reduce batch size
340
+ 2. Lower resolution
341
+ 3. Disable other VRAM-heavy features (Stable-Fast CUDA graphs)
342
+ 4. Use lower cache depth:
343
+ ```yaml
344
+ deepcache_depth: 1 # Minimal caching
345
+ ```
346
+
347
+ ### Flux FBCache Not Working
348
+
349
+ **Symptom:** No speedup with Flux generation.
350
+
351
+ **Cause:** FBCache only skips work when the first-block residual stays below the threshold, so cache hits may be rare; check the logs for the cache hit rate.
352
+
353
+ **Debugging:**
354
+ ```bash
355
+ # Enable debug logging
356
+ export LD_SERVER_LOGLEVEL=DEBUG
357
+
358
+ # Check cache statistics
359
+ cat logs/server.log | grep "cache"
360
+ ```
361
+
362
+ If no cache hits, try adjusting threshold:
363
+ ```python
364
+ # In pipeline.py
365
+ residual_diff_threshold=0.1 # Increase from 0.05 for more cache reuse
366
+ ```
367
+
368
+ ## Quality Comparison
369
+
370
+ Visual impact of different cache intervals:
371
+
372
+ | Interval | Speed | Visual Difference |
373
+ |----------|-------|-------------------|
374
+ | Disabled | Baseline | Baseline (100% quality) |
375
+ | 2 | Faster | Virtually identical |
376
+ | 3 | Much faster | Very subtle smoothing |
377
+ | 5 | Very fast | Noticeable detail loss |
378
+ | 7+ | Fastest | Obvious quality degradation |
379
+
380
+ **Recommendation:** Start with `interval=3` and adjust based on visual results.
381
+
382
+ ## Technical Details
383
+
384
+ ### DeepCache Implementation
385
+
386
+ Simplified pseudocode:
387
+
388
+ ```python
389
+ class DeepCacheWrapper:
390
+ def __init__(self, model, interval, depth):
391
+ self.model = model
392
+ self.interval = interval
393
+         self.depth = depth  # depth selects how much of the model is cached; kept for parity with the real implementation
+         self.cached_output = None
394
+ self.current_step = 0
395
+
396
+ def forward(self, x, timestep):
397
+ is_cache_step = (self.current_step % self.interval == 0)
398
+
399
+ if is_cache_step:
400
+ # Run full model, cache output
401
+ output = self.model(x, timestep)
402
+ self.cached_output = output.clone()
403
+ else:
404
+ # Reuse cached output (skip expensive computation)
405
+ output = self.cached_output
406
+
407
+ self.current_step += 1
408
+ return output
409
+ ```
410
+
411
+ Actual implementation in `src/WaveSpeed/deepcache_nodes.py` includes:
412
+ - Proper timestep tracking
413
+ - Cache invalidation on batch changes
414
+ - Error handling and fallback to full forward
415
+
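+ A short usage sketch of the wrapper above (the dummy model stands in for the real denoiser; names follow the pseudocode, not the actual module):
+
+ ```python
+ import torch
+
+ def dummy_model(x, timestep):
+     # Stand-in for the expensive denoiser forward pass
+     return x * 0.9
+
+ wrapper = DeepCacheWrapper(dummy_model, interval=3, depth=2)
+ x = torch.randn(1, 4, 64, 64)
+ for t in range(6):
+     x = wrapper.forward(x, t)  # steps 0 and 3 recompute; 1-2 and 4-5 reuse the cache
+ ```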
416
+ ### FBCache Residual Comparison
417
+
418
+ ```python
419
+ # Illustrative pseudocode: first_transformer_block, apply_cached_residual,
+ # run_remaining_blocks, and cache_residual stand in for the real helpers.
+ # Compute this step's first-block output
420
+ first_output = first_transformer_block(hidden_states)
421
+
422
+ # Compare to previous step
423
+ residual = first_output - previous_first_output
424
+ residual_norm = residual.abs().mean() / first_output.abs().mean()
425
+
426
+ if residual_norm < threshold:
427
+ # Feature change is small — reuse cached blocks
428
+ hidden_states = apply_cached_residual(first_output)
429
+ else:
430
+ # Feature change is large — recompute all blocks
431
+ hidden_states = run_remaining_blocks(first_output)
432
+ cache_residual(hidden_states)
433
+ ```
434
+
435
+ ## Best Practices
436
+
437
+ ### For Everyday Use
438
+
439
+ 1. **Enable DeepCache** with default settings (`interval=3`, `depth=2`)
440
+ 2. **Stack with SageAttention** for 2.5x+ total speedup
441
+ 3. **Disable for final client renders** if absolute quality is critical
442
+
443
+ ### For Batch Processing
444
+
445
+ 1. **Use aggressive caching** (`interval=5`, `depth=3`)
446
+ 2. **Pre-generate previews** at high speed, re-render winners at full quality
447
+ 3. **Disable TAESD previews** to avoid overhead (set `enable_preview=false`)
448
+
449
+ ### For Low VRAM
450
+
451
+ 1. **Use conservative caching** (`interval=2`, `depth=1`)
452
+ 2. **Avoid stacking** with Stable-Fast CUDA graphs
453
+ 3. **Monitor VRAM** via `/api/telemetry` endpoint
454
+
455
+ ## Citation
456
+
457
+ If you use WaveSpeed/DeepCache in your work:
458
+
459
+ ```bibtex
460
+ @inproceedings{ma2023deepcache,
461
+ title={DeepCache: Accelerating Diffusion Models for Free},
462
+ author={Ma, Xinyin and Fang, Gongfan and Wang, Xinchao},
463
+ booktitle={CVPR},
464
+ year={2024}
465
+ }
466
+ ```
467
+
468
+ ## Resources
469
+
470
+ - [DeepCache Paper](https://arxiv.org/abs/2312.00858)
471
+ - [DeepCache Repository](https://github.com/horseee/DeepCache)
472
+ - [ComfyUI DeepCache Implementation](https://gist.github.com/laksjdjf/435c512bc19636e9c9af4ee7bea9eb86) (reference for LightDiffusion-Next)
473
+ - [First Block Cache Discussion](https://github.com/comfyanonymous/ComfyUI/discussions/3491)
download_flux.py ADDED
@@ -0,0 +1,21 @@
1
+ import os
2
+ import sys
3
+ from pathlib import Path
4
+
5
+ # Add project root to path
6
+ project_root = Path(__file__).resolve().parent
7
+ sys.path.insert(0, str(project_root))
8
+
9
+ try:
10
+ from src.FileManaging import Downloader
11
+ print("Initializing Flux2 Klein download...")
12
+ Downloader.CheckAndDownloadFlux2()
13
+ print("\nDownload process finished.")
14
+ print("Models should be located in:")
15
+ print(" - ./include/diffusion_model/ (Diffusion Model)")
16
+ print(" - ./include/text_encoder/ (Text Encoder)")
17
+ print(" - ./include/vae/ (VAE)")
18
+ except ImportError as e:
19
+ print(f"Error: Could not import Downloader. Make sure you are running this from the project root. {e}")
20
+ except Exception as e:
21
+ print(f"An unexpected error occurred: {e}")
frontend/README.md ADDED
@@ -0,0 +1,73 @@
1
+ # React + TypeScript + Vite
2
+
3
+ This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.
4
+
5
+ Currently, two official plugins are available:
6
+
7
+ - [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react) uses [Babel](https://babeljs.io/) (or [oxc](https://oxc.rs) when used in [rolldown-vite](https://vite.dev/guide/rolldown)) for Fast Refresh
8
+ - [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react-swc) uses [SWC](https://swc.rs/) for Fast Refresh
9
+
10
+ ## React Compiler
11
+
12
+ The React Compiler is not enabled on this template because of its impact on dev & build performance. To add it, see [this documentation](https://react.dev/learn/react-compiler/installation).
13
+
14
+ ## Expanding the ESLint configuration
15
+
16
+ If you are developing a production application, we recommend updating the configuration to enable type-aware lint rules:
17
+
18
+ ```js
19
+ export default defineConfig([
20
+ globalIgnores(['dist']),
21
+ {
22
+ files: ['**/*.{ts,tsx}'],
23
+ extends: [
24
+ // Other configs...
25
+
26
+ // Remove tseslint.configs.recommended and replace with this
27
+ tseslint.configs.recommendedTypeChecked,
28
+ // Alternatively, use this for stricter rules
29
+ tseslint.configs.strictTypeChecked,
30
+ // Optionally, add this for stylistic rules
31
+ tseslint.configs.stylisticTypeChecked,
32
+
33
+ // Other configs...
34
+ ],
35
+ languageOptions: {
36
+ parserOptions: {
37
+ project: ['./tsconfig.node.json', './tsconfig.app.json'],
38
+ tsconfigRootDir: import.meta.dirname,
39
+ },
40
+ // other options...
41
+ },
42
+ },
43
+ ])
44
+ ```
45
+
46
+ You can also install [eslint-plugin-react-x](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-x) and [eslint-plugin-react-dom](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-dom) for React-specific lint rules:
47
+
48
+ ```js
49
+ // eslint.config.js
50
+ import reactX from 'eslint-plugin-react-x'
51
+ import reactDom from 'eslint-plugin-react-dom'
52
+
53
+ export default defineConfig([
54
+ globalIgnores(['dist']),
55
+ {
56
+ files: ['**/*.{ts,tsx}'],
57
+ extends: [
58
+ // Other configs...
59
+ // Enable lint rules for React
60
+ reactX.configs['recommended-typescript'],
61
+ // Enable lint rules for React DOM
62
+ reactDom.configs.recommended,
63
+ ],
64
+ languageOptions: {
65
+ parserOptions: {
66
+ project: ['./tsconfig.node.json', './tsconfig.app.json'],
67
+ tsconfigRootDir: import.meta.dirname,
68
+ },
69
+ // other options...
70
+ },
71
+ },
72
+ ])
73
+ ```
frontend/dist/assets/index-7kNA4Hm-.js ADDED
The diff for this file is too large to render. See raw diff
 
frontend/dist/assets/index-CAwyaxYh.css ADDED
@@ -0,0 +1 @@
1
+ @import"https://fonts.googleapis.com/css2?family=Fraunces:opsz,wght@9..144,500;9..144,600;9..144,700&family=Instrument+Sans:wght@400;500;600;700&display=swap";@layer components;@layer properties{@supports (((-webkit-hyphens:none)) and (not (margin-trim:inline))) or ((-moz-orient:inline) and (not (color:rgb(from red r g b)))){*,:before,:after,::backdrop{--tw-scale-x:1;--tw-scale-y:1;--tw-scale-z:1;--tw-rotate-x:initial;--tw-rotate-y:initial;--tw-rotate-z:initial;--tw-skew-x:initial;--tw-skew-y:initial;--tw-pan-x:initial;--tw-pan-y:initial;--tw-pinch-zoom:initial;--tw-space-y-reverse:0;--tw-space-x-reverse:0;--tw-divide-x-reverse:0;--tw-border-style:solid;--tw-divide-y-reverse:0;--tw-leading:initial;--tw-font-weight:initial;--tw-tracking:initial;--tw-ordinal:initial;--tw-slashed-zero:initial;--tw-numeric-figure:initial;--tw-numeric-spacing:initial;--tw-numeric-fraction:initial;--tw-shadow:0 0 #0000;--tw-shadow-color:initial;--tw-shadow-alpha:100%;--tw-inset-shadow:0 0 #0000;--tw-inset-shadow-color:initial;--tw-inset-shadow-alpha:100%;--tw-ring-color:initial;--tw-ring-shadow:0 0 #0000;--tw-inset-ring-color:initial;--tw-inset-ring-shadow:0 0 #0000;--tw-ring-inset:initial;--tw-ring-offset-width:0px;--tw-ring-offset-color:#fff;--tw-ring-offset-shadow:0 0 #0000;--tw-outline-style:solid;--tw-blur:initial;--tw-brightness:initial;--tw-contrast:initial;--tw-grayscale:initial;--tw-hue-rotate:initial;--tw-invert:initial;--tw-opacity:initial;--tw-saturate:initial;--tw-sepia:initial;--tw-drop-shadow:initial;--tw-drop-shadow-color:initial;--tw-drop-shadow-alpha:100%;--tw-drop-shadow-size:initial;--tw-backdrop-blur:initial;--tw-backdrop-brightness:initial;--tw-backdrop-contrast:initial;--tw-backdrop-grayscale:initial;--tw-backdrop-hue-rotate:initial;--tw-backdrop-invert:initial;--tw-backdrop-opacity:initial;--tw-backdrop-saturate:initial;--tw-backdrop-sepia:initial;--tw-duration:initial;--tw-ease:initial;--tw-translate-x:0;--tw-translate-y:0;--tw-translate-z:0}}}@layer theme{:root,:host{--font-sans:"Instrument Sans", ui-sans-serif, sans-serif;--font-serif:"Fraunces", ui-serif, serif;--font-mono:ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace;--spacing:.25rem;--container-sm:24rem;--container-lg:32rem;--container-3xl:48rem;--container-4xl:56rem;--text-xs:.75rem;--text-xs--line-height:calc(1 / .75);--text-sm:.875rem;--text-sm--line-height:calc(1.25 / .875);--text-lg:1.125rem;--text-lg--line-height:calc(1.75 / 1.125);--font-weight-medium:500;--font-weight-semibold:600;--leading-tight:1.25;--radius-2xl:1rem;--radius-3xl:1.5rem;--ease-out:cubic-bezier(0, 0, .2, 1);--animate-spin:spin 1s linear infinite;--default-transition-duration:.15s;--default-transition-timing-function:cubic-bezier(.4, 0, .2, 1);--default-font-family:var(--font-sans);--default-mono-font-family:var(--font-mono);--color-canvas:oklch(97.8% .012 78);--color-paper:oklch(99.2% .008 82);--color-oat:oklch(96.7% .018 79);--color-sand:oklch(94% .018 76);--color-stone:oklch(83% .016 73);--color-line:oklch(88% .012 76);--color-ink:oklch(25.5% .02 55);--color-muted:oklch(54% .015 67);--color-clay:oklch(64% .15 41);--color-clay-strong:oklch(56% .16 39);--animate-accordion-down:accordion-down .22s cubic-bezier(.16, 1, .3, 1);--animate-accordion-up:accordion-up .18s cubic-bezier(.16, 1, .3, 1)}}@layer base{*,:after,:before,::backdrop{box-sizing:border-box;border:0 solid;margin:0;padding:0}::file-selector-button{box-sizing:border-box;border:0 
solid;margin:0;padding:0}html,:host{-webkit-text-size-adjust:100%;tab-size:4;line-height:1.5;font-family:var(--default-font-family,ui-sans-serif, system-ui, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji");font-feature-settings:var(--default-font-feature-settings,normal);font-variation-settings:var(--default-font-variation-settings,normal);-webkit-tap-highlight-color:transparent}hr{height:0;color:inherit;border-top-width:1px}abbr:where([title]){-webkit-text-decoration:underline dotted;text-decoration:underline dotted}h1,h2,h3,h4,h5,h6{font-size:inherit;font-weight:inherit}a{color:inherit;-webkit-text-decoration:inherit;text-decoration:inherit}b,strong{font-weight:bolder}code,kbd,samp,pre{font-family:var(--default-mono-font-family,ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace);font-feature-settings:var(--default-mono-font-feature-settings,normal);font-variation-settings:var(--default-mono-font-variation-settings,normal);font-size:1em}small{font-size:80%}sub,sup{vertical-align:baseline;font-size:75%;line-height:0;position:relative}sub{bottom:-.25em}sup{top:-.5em}table{text-indent:0;border-color:inherit;border-collapse:collapse}:-moz-focusring{outline:auto}progress{vertical-align:baseline}summary{display:list-item}ol,ul,menu{list-style:none}img,svg,video,canvas,audio,iframe,embed,object{vertical-align:middle;display:block}img,video{max-width:100%;height:auto}button,input,select,optgroup,textarea{font:inherit;font-feature-settings:inherit;font-variation-settings:inherit;letter-spacing:inherit;color:inherit;opacity:1;background-color:#0000;border-radius:0}::file-selector-button{font:inherit;font-feature-settings:inherit;font-variation-settings:inherit;letter-spacing:inherit;color:inherit;opacity:1;background-color:#0000;border-radius:0}:where(select:is([multiple],[size])) optgroup{font-weight:bolder}:where(select:is([multiple],[size])) optgroup option{padding-inline-start:20px}::file-selector-button{margin-inline-end:4px}::placeholder{opacity:1}@supports (not ((-webkit-appearance:-apple-pay-button))) or (contain-intrinsic-size:1px){::placeholder{color:currentColor}@supports (color:color-mix(in lab,red,red)){::placeholder{color:color-mix(in oklab,currentcolor 
50%,transparent)}}}textarea{resize:vertical}::-webkit-search-decoration{-webkit-appearance:none}::-webkit-date-and-time-value{min-height:1lh;text-align:inherit}::-webkit-datetime-edit{display:inline-flex}::-webkit-datetime-edit-fields-wrapper{padding:0}::-webkit-datetime-edit{padding-block:0}::-webkit-datetime-edit-year-field{padding-block:0}::-webkit-datetime-edit-month-field{padding-block:0}::-webkit-datetime-edit-day-field{padding-block:0}::-webkit-datetime-edit-hour-field{padding-block:0}::-webkit-datetime-edit-minute-field{padding-block:0}::-webkit-datetime-edit-second-field{padding-block:0}::-webkit-datetime-edit-millisecond-field{padding-block:0}::-webkit-datetime-edit-meridiem-field{padding-block:0}::-webkit-calendar-picker-indicator{line-height:1}:-moz-ui-invalid{box-shadow:none}button,input:where([type=button],[type=reset],[type=submit]){appearance:button}::file-selector-button{appearance:button}::-webkit-inner-spin-button{height:auto}::-webkit-outer-spin-button{height:auto}[hidden]:where(:not([hidden=until-found])){display:none!important}:root{color:var(--color-ink);background:var(--color-canvas);font-synthesis:none;text-rendering:optimizelegibility;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}*{border-color:#dfdad3}@supports (color:color-mix(in lab,red,red)){*{border-color:color-mix(in oklab,var(--color-line) 92%,white)}}html,body,#root{min-height:100%}body{font-family:var(--font-sans);color:var(--color-ink);background:radial-gradient(circle at top,#fbf3e79e,#0000 31rem),linear-gradient(#fffcf7,#fdf8f0 23rem);margin:0}@supports (color:color-mix(in lab,red,red)){body{background:radial-gradient(circle at top,color-mix(in oklab,var(--color-oat) 62%,transparent),transparent 31rem),linear-gradient(180deg,color-mix(in oklab,var(--color-paper) 99%,white),color-mix(in oklab,var(--color-canvas) 92%,white) 23rem)}}button,input,textarea{font:inherit}img{max-width:100%;display:block}::selection{background:#f6ded4}@supports (color:color-mix(in lab,red,red)){::selection{background:color-mix(in oklab,var(--color-clay) 22%,white)}}}@layer utilities{.pointer-events-none{pointer-events:none}.collapse{visibility:collapse}.invisible{visibility:hidden}.visible{visibility:visible}.sr-only{clip-path:inset(50%);white-space:nowrap;border-width:0;width:1px;height:1px;margin:-1px;padding:0;position:absolute;overflow:hidden}.not-sr-only{clip-path:none;white-space:normal;width:auto;height:auto;margin:0;padding:0;position:static;overflow:visible}.absolute{position:absolute}.fixed{position:fixed}.relative{position:relative}.static{position:static}.sticky{position:sticky}.inset-0{inset:calc(var(--spacing) * 0)}.inset-x-0{inset-inline:calc(var(--spacing) * 0)}.inset-x-3{inset-inline:calc(var(--spacing) * 3)}.inset-x-4{inset-inline:calc(var(--spacing) * 4)}.inset-y-3{inset-block:calc(var(--spacing) * 3)}.start{inset-inline-start:var(--spacing)}.end{inset-inline-end:var(--spacing)}.top-0{top:calc(var(--spacing) * 0)}.top-3{top:calc(var(--spacing) * 3)}.top-4{top:calc(var(--spacing) * 4)}.top-5{top:calc(var(--spacing) * 5)}.right-3{right:calc(var(--spacing) * 3)}.right-5{right:calc(var(--spacing) * 5)}.bottom-3{bottom:calc(var(--spacing) * 3)}.bottom-4{bottom:calc(var(--spacing) * 4)}.left-3{left:calc(var(--spacing) * 
3)}.isolate{isolation:isolate}.isolation-auto{isolation:auto}.z-10{z-index:10}.z-50{z-index:50}.container{width:100%}@media(min-width:40rem){.container{max-width:40rem}}@media(min-width:48rem){.container{max-width:48rem}}@media(min-width:64rem){.container{max-width:64rem}}@media(min-width:80rem){.container{max-width:80rem}}@media(min-width:96rem){.container{max-width:96rem}}.-mx-1{margin-inline:calc(var(--spacing) * -1)}.mx-auto{margin-inline:auto}.my-1{margin-block:calc(var(--spacing) * 1)}.-mt-2{margin-top:calc(var(--spacing) * -2)}.mt-1{margin-top:calc(var(--spacing) * 1)}.mt-2\.5{margin-top:calc(var(--spacing) * 2.5)}.mt-3{margin-top:calc(var(--spacing) * 3)}.mt-4{margin-top:calc(var(--spacing) * 4)}.mb-2{margin-bottom:calc(var(--spacing) * 2)}.block{display:block}.contents{display:contents}.flex{display:flex}.flow-root{display:flow-root}.grid{display:grid}.hidden{display:none}.inline{display:inline}.inline-block{display:inline-block}.inline-flex{display:inline-flex}.inline-grid{display:inline-grid}.inline-table{display:inline-table}.list-item{display:list-item}.table{display:table}.table-caption{display:table-caption}.table-cell{display:table-cell}.table-column{display:table-column}.table-column-group{display:table-column-group}.table-footer-group{display:table-footer-group}.table-header-group{display:table-header-group}.table-row{display:table-row}.table-row-group{display:table-row-group}.h-1\.5{height:calc(var(--spacing) * 1.5)}.h-2\.5{height:calc(var(--spacing) * 2.5)}.h-3\.5{height:calc(var(--spacing) * 3.5)}.h-4{height:calc(var(--spacing) * 4)}.h-5{height:calc(var(--spacing) * 5)}.h-6{height:calc(var(--spacing) * 6)}.h-9{height:calc(var(--spacing) * 9)}.h-10{height:calc(var(--spacing) * 10)}.h-11{height:calc(var(--spacing) * 11)}.h-12{height:calc(var(--spacing) * 12)}.h-14{height:calc(var(--spacing) * 14)}.h-16{height:calc(var(--spacing) * 16)}.h-28{height:calc(var(--spacing) * 28)}.h-40{height:calc(var(--spacing) * 40)}.h-52{height:calc(var(--spacing) * 52)}.h-96{height:calc(var(--spacing) * 96)}.h-\[4\.25rem\]{height:4.25rem}.h-\[calc\(100\%-4rem\)\]{height:calc(100% - 4rem)}.h-\[calc\(100vh-2rem\)\]{height:calc(100vh - 2rem)}.h-\[min\(88vh\,860px\)\]{height:min(88vh,860px)}.h-\[var\(--radix-select-trigger-height\)\]{height:var(--radix-select-trigger-height)}.h-auto{height:auto}.h-full{height:100%}.h-px{height:1px}.max-h-80{max-height:calc(var(--spacing) * 80)}.max-h-\[calc\(100vh-10rem\)\]{max-height:calc(100vh - 10rem)}.min-h-0{min-height:calc(var(--spacing) * 0)}.min-h-36{min-height:calc(var(--spacing) * 36)}.min-h-40{min-height:calc(var(--spacing) * 40)}.min-h-\[108px\]{min-height:108px}.min-h-\[124px\]{min-height:124px}.min-h-\[172px\]{min-height:172px}.min-h-\[460px\]{min-height:460px}.min-h-screen{min-height:100vh}.w-2\.5{width:calc(var(--spacing) * 2.5)}.w-3\.5{width:calc(var(--spacing) * 3.5)}.w-3\/4{width:75%}.w-4{width:calc(var(--spacing) * 4)}.w-5{width:calc(var(--spacing) * 5)}.w-6{width:calc(var(--spacing) * 6)}.w-9{width:calc(var(--spacing) * 9)}.w-10{width:calc(var(--spacing) * 10)}.w-11{width:calc(var(--spacing) * 11)}.w-12{width:calc(var(--spacing) * 12)}.w-16{width:calc(var(--spacing) * 
16)}.w-\[4\.25rem\]{width:4.25rem}.w-\[26rem\]{width:26rem}.w-auto{width:auto}.w-full{width:100%}.w-px{width:1px}.max-w-3xl{max-width:var(--container-3xl)}.max-w-4xl{max-width:var(--container-4xl)}.max-w-\[1200px\]{max-width:1200px}.max-w-\[1320px\]{max-width:1320px}.max-w-full{max-width:100%}.max-w-lg{max-width:var(--container-lg)}.min-w-\[8rem\]{min-width:8rem}.min-w-\[var\(--radix-select-trigger-width\)\]{min-width:var(--radix-select-trigger-width)}.flex-1{flex:1}.shrink{flex-shrink:1}.shrink-0{flex-shrink:0}.grow{flex-grow:1}.border-collapse{border-collapse:collapse}.translate-none{translate:none}.scale-3d{scale:var(--tw-scale-x) var(--tw-scale-y) var(--tw-scale-z)}.transform{transform:var(--tw-rotate-x,) var(--tw-rotate-y,) var(--tw-rotate-z,) var(--tw-skew-x,) var(--tw-skew-y,)}.animate-spin{animation:var(--animate-spin)}.cursor-default{cursor:default}.cursor-pointer{cursor:pointer}.touch-pinch-zoom{--tw-pinch-zoom:pinch-zoom;touch-action:var(--tw-pan-x,) var(--tw-pan-y,) var(--tw-pinch-zoom,)}.touch-none{touch-action:none}.resize{resize:both}.flex-col{flex-direction:column}.flex-col-reverse{flex-direction:column-reverse}.flex-wrap{flex-wrap:wrap}.items-center{align-items:center}.justify-between{justify-content:space-between}.justify-center{justify-content:center}.justify-end{justify-content:flex-end}.gap-1\.5{gap:calc(var(--spacing) * 1.5)}.gap-2{gap:calc(var(--spacing) * 2)}.gap-3{gap:calc(var(--spacing) * 3)}.gap-4{gap:calc(var(--spacing) * 4)}.gap-5{gap:calc(var(--spacing) * 5)}.gap-6{gap:calc(var(--spacing) * 6)}:where(.space-y-1>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 1) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 1) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-1\.5>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 1.5) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 1.5) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-2>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 2) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 2) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-2\.5>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 2.5) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 2.5) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-3>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 3) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 3) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-4>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 4) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 4) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-5>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 5) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 5) * calc(1 - var(--tw-space-y-reverse)))}:where(.space-y-reverse>:not(:last-child)){--tw-space-y-reverse:1}:where(.space-x-reverse>:not(:last-child)){--tw-space-x-reverse:1}:where(.divide-x>:not(:last-child)){--tw-divide-x-reverse:0;border-inline-style:var(--tw-border-style);border-inline-start-width:calc(1px * var(--tw-divide-x-reverse));border-inline-end-width:calc(1px * calc(1 - 
var(--tw-divide-x-reverse)))}:where(.divide-y>:not(:last-child)){--tw-divide-y-reverse:0;border-bottom-style:var(--tw-border-style);border-top-style:var(--tw-border-style);border-top-width:calc(1px * var(--tw-divide-y-reverse));border-bottom-width:calc(1px * calc(1 - var(--tw-divide-y-reverse)))}:where(.divide-y-reverse>:not(:last-child)){--tw-divide-y-reverse:1}.self-end{align-self:flex-end}.truncate{text-overflow:ellipsis;white-space:nowrap;overflow:hidden}.overflow-hidden{overflow:hidden}.rounded-2xl{border-radius:var(--radius-2xl)}.rounded-3xl{border-radius:var(--radius-3xl)}.rounded-\[1\.2rem\]{border-radius:1.2rem}.rounded-\[1\.4rem\]{border-radius:1.4rem}.rounded-\[1\.5rem\]{border-radius:1.5rem}.rounded-\[1\.7rem\]{border-radius:1.7rem}.rounded-\[1\.9rem\]{border-radius:1.9rem}.rounded-\[1\.15rem\]{border-radius:1.15rem}.rounded-\[1\.35rem\]{border-radius:1.35rem}.rounded-\[1\.75rem\]{border-radius:1.75rem}.rounded-\[1rem\]{border-radius:1rem}.rounded-\[2\.1rem\]{border-radius:2.1rem}.rounded-\[2rem\]{border-radius:2rem}.rounded-\[inherit\]{border-radius:inherit}.rounded-full{border-radius:3.40282e38px}.rounded-s{border-start-start-radius:.25rem;border-end-start-radius:.25rem}.rounded-ss{border-start-start-radius:.25rem}.rounded-e{border-start-end-radius:.25rem;border-end-end-radius:.25rem}.rounded-se{border-start-end-radius:.25rem}.rounded-ee{border-end-end-radius:.25rem}.rounded-es{border-end-start-radius:.25rem}.rounded-t{border-top-left-radius:.25rem;border-top-right-radius:.25rem}.rounded-t-\[2\.25rem\]{border-top-left-radius:2.25rem;border-top-right-radius:2.25rem}.rounded-l{border-top-left-radius:.25rem;border-bottom-left-radius:.25rem}.rounded-tl{border-top-left-radius:.25rem}.rounded-r{border-top-right-radius:.25rem;border-bottom-right-radius:.25rem}.rounded-tr{border-top-right-radius:.25rem}.rounded-b{border-bottom-right-radius:.25rem;border-bottom-left-radius:.25rem}.rounded-b-\[2rem\]{border-bottom-right-radius:2rem;border-bottom-left-radius:2rem}.rounded-br{border-bottom-right-radius:.25rem}.rounded-bl{border-bottom-left-radius:.25rem}.border{border-style:var(--tw-border-style);border-width:1px}.border-x{border-inline-style:var(--tw-border-style);border-inline-width:1px}.border-y{border-block-style:var(--tw-border-style);border-block-width:1px}.border-s{border-inline-start-style:var(--tw-border-style);border-inline-start-width:1px}.border-e{border-inline-end-style:var(--tw-border-style);border-inline-end-width:1px}.border-bs{border-block-start-style:var(--tw-border-style);border-block-start-width:1px}.border-be{border-block-end-style:var(--tw-border-style);border-block-end-width:1px}.border-t{border-top-style:var(--tw-border-style);border-top-width:1px}.border-t-0{border-top-style:var(--tw-border-style);border-top-width:0}.border-r{border-right-style:var(--tw-border-style);border-right-width:1px}.border-b{border-bottom-style:var(--tw-border-style);border-bottom-width:1px}.border-b-0{border-bottom-style:var(--tw-border-style);border-bottom-width:0}.border-l{border-left-style:var(--tw-border-style);border-left-width:1px}.border-dashed{--tw-border-style:dashed;border-style:dashed}.border-clay{border-color:var(--color-clay)}.border-clay-strong{border-color:var(--color-clay-strong)}.border-line{border-color:var(--color-line)}.border-line\/65{border-color:#dcd7cfa6}@supports (color:color-mix(in lab,red,red)){.border-line\/65{border-color:color-mix(in oklab,var(--color-line) 65%,transparent)}}.border-line\/70{border-color:#dcd7cfb3}@supports (color:color-mix(in 
lab,red,red)){.border-line\/70{border-color:color-mix(in oklab,var(--color-line) 70%,transparent)}}.border-line\/75{border-color:#dcd7cfbf}@supports (color:color-mix(in lab,red,red)){.border-line\/75{border-color:color-mix(in oklab,var(--color-line) 75%,transparent)}}.border-line\/80{border-color:#dcd7cfcc}@supports (color:color-mix(in lab,red,red)){.border-line\/80{border-color:color-mix(in oklab,var(--color-line) 80%,transparent)}}.border-transparent{border-color:#0000}.border-t-transparent{border-top-color:#0000}.border-l-transparent{border-left-color:#0000}.bg-canvas{background-color:var(--color-canvas)}.bg-canvas\/34{background-color:#fcf7ef57}@supports (color:color-mix(in lab,red,red)){.bg-canvas\/34{background-color:color-mix(in oklab,var(--color-canvas) 34%,transparent)}}.bg-canvas\/48{background-color:#fcf7ef7a}@supports (color:color-mix(in lab,red,red)){.bg-canvas\/48{background-color:color-mix(in oklab,var(--color-canvas) 48%,transparent)}}.bg-clay{background-color:var(--color-clay)}.bg-clay\/8{background-color:#d6683d14}@supports (color:color-mix(in lab,red,red)){.bg-clay\/8{background-color:color-mix(in oklab,var(--color-clay) 8%,transparent)}}.bg-clay\/10{background-color:#d6683d1a}@supports (color:color-mix(in lab,red,red)){.bg-clay\/10{background-color:color-mix(in oklab,var(--color-clay) 10%,transparent)}}.bg-ink{background-color:var(--color-ink)}.bg-ink\/14{background-color:#2b201a24}@supports (color:color-mix(in lab,red,red)){.bg-ink\/14{background-color:color-mix(in oklab,var(--color-ink) 14%,transparent)}}.bg-ink\/\[0\.04\]{background-color:#2b201a0a}@supports (color:color-mix(in lab,red,red)){.bg-ink\/\[0\.04\]{background-color:color-mix(in oklab,var(--color-ink) 4%,transparent)}}.bg-line{background-color:var(--color-line)}.bg-oat\/32{background-color:#fbf3e752}@supports (color:color-mix(in lab,red,red)){.bg-oat\/32{background-color:color-mix(in oklab,var(--color-oat) 32%,transparent)}}.bg-oat\/42{background-color:#fbf3e76b}@supports (color:color-mix(in lab,red,red)){.bg-oat\/42{background-color:color-mix(in oklab,var(--color-oat) 42%,transparent)}}.bg-oat\/45{background-color:#fbf3e773}@supports (color:color-mix(in lab,red,red)){.bg-oat\/45{background-color:color-mix(in oklab,var(--color-oat) 45%,transparent)}}.bg-oat\/55{background-color:#fbf3e78c}@supports (color:color-mix(in lab,red,red)){.bg-oat\/55{background-color:color-mix(in oklab,var(--color-oat) 55%,transparent)}}.bg-oat\/60{background-color:#fbf3e799}@supports (color:color-mix(in lab,red,red)){.bg-oat\/60{background-color:color-mix(in oklab,var(--color-oat) 60%,transparent)}}.bg-paper{background-color:var(--color-paper)}.bg-paper\/62{background-color:#fffcf79e}@supports (color:color-mix(in lab,red,red)){.bg-paper\/62{background-color:color-mix(in oklab,var(--color-paper) 62%,transparent)}}.bg-paper\/76{background-color:#fffcf7c2}@supports (color:color-mix(in lab,red,red)){.bg-paper\/76{background-color:color-mix(in oklab,var(--color-paper) 76%,transparent)}}.bg-paper\/90{background-color:#fffcf7e6}@supports (color:color-mix(in lab,red,red)){.bg-paper\/90{background-color:color-mix(in oklab,var(--color-paper) 90%,transparent)}}.bg-paper\/92{background-color:#fffcf7eb}@supports (color:color-mix(in lab,red,red)){.bg-paper\/92{background-color:color-mix(in oklab,var(--color-paper) 92%,transparent)}}.bg-paper\/94{background-color:#fffcf7f0}@supports (color:color-mix(in lab,red,red)){.bg-paper\/94{background-color:color-mix(in oklab,var(--color-paper) 
94%,transparent)}}.bg-paper\/97{background-color:#fffcf7f7}@supports (color:color-mix(in lab,red,red)){.bg-paper\/97{background-color:color-mix(in oklab,var(--color-paper) 97%,transparent)}}.bg-paper\/98{background-color:#fffcf7fa}@supports (color:color-mix(in lab,red,red)){.bg-paper\/98{background-color:color-mix(in oklab,var(--color-paper) 98%,transparent)}}.bg-sand{background-color:var(--color-sand)}.bg-sand\/45{background-color:#f2eade73}@supports (color:color-mix(in lab,red,red)){.bg-sand\/45{background-color:color-mix(in oklab,var(--color-sand) 45%,transparent)}}.bg-stone\/60{background-color:#cec6bc99}@supports (color:color-mix(in lab,red,red)){.bg-stone\/60{background-color:color-mix(in oklab,var(--color-stone) 60%,transparent)}}.bg-\[radial-gradient\(circle_at_top_left\,color-mix\(in_oklab\,var\(--color-oat\)_86\%\,transparent\)\,transparent_66\%\)\]{background-image:radial-gradient(circle at 0 0,#fbf3e7db,#0000 66%)}@supports (color:color-mix(in lab,red,red)){.bg-\[radial-gradient\(circle_at_top_left\,color-mix\(in_oklab\,var\(--color-oat\)_86\%\,transparent\)\,transparent_66\%\)\]{background-image:radial-gradient(circle at top left,color-mix(in oklab,var(--color-oat) 86%,transparent),transparent 66%)}}.bg-repeat{background-repeat:repeat}.mask-no-clip{-webkit-mask-clip:no-clip;mask-clip:no-clip}.mask-repeat{-webkit-mask-repeat:repeat;mask-repeat:repeat}.object-contain{object-fit:contain}.object-cover{object-fit:cover}.p-1\.5{padding:calc(var(--spacing) * 1.5)}.p-2{padding:calc(var(--spacing) * 2)}.p-3{padding:calc(var(--spacing) * 3)}.p-4{padding:calc(var(--spacing) * 4)}.p-5{padding:calc(var(--spacing) * 5)}.p-6{padding:calc(var(--spacing) * 6)}.p-\[1px\]{padding:1px}.px-1{padding-inline:calc(var(--spacing) * 1)}.px-3{padding-inline:calc(var(--spacing) * 3)}.px-3\.5{padding-inline:calc(var(--spacing) * 3.5)}.px-4{padding-inline:calc(var(--spacing) * 4)}.px-5{padding-inline:calc(var(--spacing) * 5)}.px-6{padding-inline:calc(var(--spacing) * 6)}.px-8{padding-inline:calc(var(--spacing) * 8)}.py-1{padding-block:calc(var(--spacing) * 1)}.py-1\.5{padding-block:calc(var(--spacing) * 1.5)}.py-2{padding-block:calc(var(--spacing) * 2)}.py-2\.5{padding-block:calc(var(--spacing) * 2.5)}.py-3{padding-block:calc(var(--spacing) * 3)}.py-3\.5{padding-block:calc(var(--spacing) * 3.5)}.py-4{padding-block:calc(var(--spacing) * 4)}.py-5{padding-block:calc(var(--spacing) * 5)}.py-6{padding-block:calc(var(--spacing) * 6)}.pt-0\.5{padding-top:calc(var(--spacing) * .5)}.pt-2{padding-top:calc(var(--spacing) * 2)}.pt-3{padding-top:calc(var(--spacing) * 3)}.pt-4{padding-top:calc(var(--spacing) * 4)}.pr-1{padding-right:calc(var(--spacing) * 1)}.pr-3{padding-right:calc(var(--spacing) * 3)}.pr-10{padding-right:calc(var(--spacing) * 10)}.pb-2{padding-bottom:calc(var(--spacing) * 2)}.pb-3{padding-bottom:calc(var(--spacing) * 3)}.pb-4{padding-bottom:calc(var(--spacing) * 4)}.pb-7{padding-bottom:calc(var(--spacing) * 7)}.pb-10{padding-bottom:calc(var(--spacing) * 10)}.pl-8{padding-left:calc(var(--spacing) * 
8)}.text-center{text-align:center}.text-left{text-align:left}.font-serif{font-family:var(--font-serif)}.text-lg{font-size:var(--text-lg);line-height:var(--tw-leading,var(--text-lg--line-height))}.text-sm{font-size:var(--text-sm);line-height:var(--tw-leading,var(--text-sm--line-height))}.text-xs{font-size:var(--text-xs);line-height:var(--tw-leading,var(--text-xs--line-height))}.text-\[1\.05rem\]{font-size:1.05rem}.text-\[1\.35rem\]{font-size:1.35rem}.text-\[11px\]{font-size:11px}.text-\[15px\]{font-size:15px}.text-\[16px\]{font-size:16px}.text-\[clamp\(1\.8rem\,3vw\,2\.5rem\)\]{font-size:clamp(1.8rem,3vw,2.5rem)}.text-\[clamp\(2\.75rem\,5\.2vw\,5rem\)\]{font-size:clamp(2.75rem,5.2vw,5rem)}.leading-5{--tw-leading:calc(var(--spacing) * 5);line-height:calc(var(--spacing) * 5)}.leading-6{--tw-leading:calc(var(--spacing) * 6);line-height:calc(var(--spacing) * 6)}.leading-7{--tw-leading:calc(var(--spacing) * 7);line-height:calc(var(--spacing) * 7)}.leading-\[0\.92\]{--tw-leading:.92;line-height:.92}.leading-tight{--tw-leading:var(--leading-tight);line-height:var(--leading-tight)}.font-medium{--tw-font-weight:var(--font-weight-medium);font-weight:var(--font-weight-medium)}.font-semibold{--tw-font-weight:var(--font-weight-semibold);font-weight:var(--font-weight-semibold)}.tracking-\[-0\.03em\]{--tw-tracking:-.03em;letter-spacing:-.03em}.tracking-\[-0\.025em\]{--tw-tracking:-.025em;letter-spacing:-.025em}.tracking-\[-0\.035em\]{--tw-tracking:-.035em;letter-spacing:-.035em}.tracking-\[-0\.055em\]{--tw-tracking:-.055em;letter-spacing:-.055em}.tracking-\[0\.16em\]{--tw-tracking:.16em;letter-spacing:.16em}.text-wrap{text-wrap:wrap}.text-clip{text-overflow:clip}.text-ellipsis{text-overflow:ellipsis}.whitespace-nowrap{white-space:nowrap}.text-clay{color:var(--color-clay)}.text-clay-strong{color:var(--color-clay-strong)}.text-ink{color:var(--color-ink)}.text-muted{color:var(--color-muted)}.text-paper{color:var(--color-paper)}.capitalize{text-transform:capitalize}.lowercase{text-transform:lowercase}.normal-case{text-transform:none}.uppercase{text-transform:uppercase}.italic{font-style:italic}.not-italic{font-style:normal}.diagonal-fractions{--tw-numeric-fraction:diagonal-fractions;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.lining-nums{--tw-numeric-figure:lining-nums;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.oldstyle-nums{--tw-numeric-figure:oldstyle-nums;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.ordinal{--tw-ordinal:ordinal;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.proportional-nums{--tw-numeric-spacing:proportional-nums;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.slashed-zero{--tw-slashed-zero:slashed-zero;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.stacked-fractions{--tw-numeric-fraction:stacked-fractions;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) 
var(--tw-numeric-fraction,)}.tabular-nums{--tw-numeric-spacing:tabular-nums;font-variant-numeric:var(--tw-ordinal,) var(--tw-slashed-zero,) var(--tw-numeric-figure,) var(--tw-numeric-spacing,) var(--tw-numeric-fraction,)}.normal-nums{font-variant-numeric:normal}.line-through{text-decoration-line:line-through}.no-underline{text-decoration-line:none}.overline{text-decoration-line:overline}.underline{text-decoration-line:underline}.antialiased{-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.subpixel-antialiased{-webkit-font-smoothing:auto;-moz-osx-font-smoothing:auto}.opacity-70{opacity:.7}.shadow{--tw-shadow:0 1px 3px 0 var(--tw-shadow-color,#0000001a), 0 1px 2px -1px var(--tw-shadow-color,#0000001a);box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_1px_2px_color-mix\(in_oklab\,var\(--color-ink\)_12\%\,transparent\)\]{--tw-shadow:0 1px 2px var(--tw-shadow-color,#2b201a1f)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_1px_2px_color-mix\(in_oklab\,var\(--color-ink\)_12\%\,transparent\)\]{--tw-shadow:0 1px 2px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 12%,transparent))}}.shadow-\[0_1px_2px_color-mix\(in_oklab\,var\(--color-ink\)_12\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_10px_20px_-18px_color-mix\(in_oklab\,var\(--color-clay\)_28\%\,transparent\)\]{--tw-shadow:0 10px 20px -18px var(--tw-shadow-color,#d6683d47)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_10px_20px_-18px_color-mix\(in_oklab\,var\(--color-clay\)_28\%\,transparent\)\]{--tw-shadow:0 10px 20px -18px var(--tw-shadow-color,color-mix(in oklab,var(--color-clay) 28%,transparent))}}.shadow-\[0_10px_20px_-18px_color-mix\(in_oklab\,var\(--color-clay\)_28\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_10px_30px_-18px_color-mix\(in_oklab\,var\(--color-ink\)_48\%\,transparent\)\]{--tw-shadow:0 10px 30px -18px var(--tw-shadow-color,#2b201a7a)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_10px_30px_-18px_color-mix\(in_oklab\,var\(--color-ink\)_48\%\,transparent\)\]{--tw-shadow:0 10px 30px -18px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 48%,transparent))}}.shadow-\[0_10px_30px_-18px_color-mix\(in_oklab\,var\(--color-ink\)_48\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_12px_30px_-22px_color-mix\(in_oklab\,var\(--color-clay\)_50\%\,transparent\)\]{--tw-shadow:0 12px 30px -22px var(--tw-shadow-color,#d6683d80)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_12px_30px_-22px_color-mix\(in_oklab\,var\(--color-clay\)_50\%\,transparent\)\]{--tw-shadow:0 12px 30px -22px var(--tw-shadow-color,color-mix(in oklab,var(--color-clay) 50%,transparent))}}.shadow-\[0_12px_30px_-22px_color-mix\(in_oklab\,var\(--color-clay\)_50\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_16px_30px_-24px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{--tw-shadow:0 16px 30px -24px var(--tw-shadow-color,#2b201a2e)}@supports (color:color-mix(in 
lab,red,red)){.shadow-\[0_16px_30px_-24px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{--tw-shadow:0 16px 30px -24px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 18%,transparent))}}.shadow-\[0_16px_30px_-24px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_18px_42px_-36px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{--tw-shadow:0 18px 42px -36px var(--tw-shadow-color,#2b201a2e)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_18px_42px_-36px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{--tw-shadow:0 18px 42px -36px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 18%,transparent))}}.shadow-\[0_18px_42px_-36px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_20px_48px_-36px_color-mix\(in_oklab\,var\(--color-ink\)_16\%\,transparent\)\]{--tw-shadow:0 20px 48px -36px var(--tw-shadow-color,#2b201a29)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_20px_48px_-36px_color-mix\(in_oklab\,var\(--color-ink\)_16\%\,transparent\)\]{--tw-shadow:0 20px 48px -36px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 16%,transparent))}}.shadow-\[0_20px_48px_-36px_color-mix\(in_oklab\,var\(--color-ink\)_16\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_22px_46px_-28px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{--tw-shadow:0 22px 46px -28px var(--tw-shadow-color,#2b201a2e)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_22px_46px_-28px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{--tw-shadow:0 22px 46px -28px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 18%,transparent))}}.shadow-\[0_22px_46px_-28px_color-mix\(in_oklab\,var\(--color-ink\)_18\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[0_28px_80px_-28px_color-mix\(in_oklab\,var\(--color-ink\)_20\%\,transparent\)\]{--tw-shadow:0 28px 80px -28px var(--tw-shadow-color,#2b201a33)}@supports (color:color-mix(in lab,red,red)){.shadow-\[0_28px_80px_-28px_color-mix\(in_oklab\,var\(--color-ink\)_20\%\,transparent\)\]{--tw-shadow:0 28px 80px -28px var(--tw-shadow-color,color-mix(in oklab,var(--color-ink) 20%,transparent))}}.shadow-\[0_28px_80px_-28px_color-mix\(in_oklab\,var\(--color-ink\)_20\%\,transparent\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-\[inset_0_1px_0_color-mix\(in_oklab\,var\(--color-paper\)_40\%\,white\)\]{--tw-shadow:inset 0 1px 0 var(--tw-shadow-color,#fffefc)}@supports (color:color-mix(in lab,red,red)){.shadow-\[inset_0_1px_0_color-mix\(in_oklab\,var\(--color-paper\)_40\%\,white\)\]{--tw-shadow:inset 0 1px 0 var(--tw-shadow-color,color-mix(in oklab,var(--color-paper) 40%,white))}}.shadow-\[inset_0_1px_0_color-mix\(in_oklab\,var\(--color-paper\)_40\%\,white\)\]{box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.shadow-none{--tw-shadow:0 0 
#0000;box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.ring-0{--tw-ring-shadow:var(--tw-ring-inset,) 0 0 0 calc(0px + var(--tw-ring-offset-width)) var(--tw-ring-color,currentcolor);box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.inset-ring{--tw-inset-ring-shadow:inset 0 0 0 1px var(--tw-inset-ring-color,currentcolor);box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.outline{outline-style:var(--tw-outline-style);outline-width:1px}.blur{--tw-blur:blur(8px);filter:var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,)}.drop-shadow{--tw-drop-shadow-size:drop-shadow(0 1px 2px var(--tw-drop-shadow-color,#0000001a)) drop-shadow(0 1px 1px var(--tw-drop-shadow-color,#0000000f));--tw-drop-shadow:drop-shadow(0 1px 2px #0000001a) drop-shadow(0 1px 1px #0000000f);filter:var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,)}.filter{filter:var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,)}.filter\!{filter:var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,)!important}.backdrop-blur{--tw-backdrop-blur:blur(8px);-webkit-backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,)}.backdrop-blur-\[1\.5px\]{--tw-backdrop-blur:blur(1.5px);-webkit-backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,)}.backdrop-grayscale{--tw-backdrop-grayscale:grayscale(100%);-webkit-backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) 
var(--tw-backdrop-sepia,)}.backdrop-invert{--tw-backdrop-invert:invert(100%);-webkit-backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,)}.backdrop-sepia{--tw-backdrop-sepia:sepia(100%);-webkit-backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,)}.backdrop-filter{-webkit-backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);backdrop-filter:var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,)}.transition{transition-property:color,background-color,border-color,outline-color,text-decoration-color,fill,stroke,--tw-gradient-from,--tw-gradient-via,--tw-gradient-to,opacity,box-shadow,transform,translate,scale,rotate,filter,-webkit-backdrop-filter,backdrop-filter,display,content-visibility,overlay,pointer-events;transition-timing-function:var(--tw-ease,var(--default-transition-timing-function));transition-duration:var(--tw-duration,var(--default-transition-duration))}.transition-\[color\,background-color\,border-color\,box-shadow\,transform\]{transition-property:color,background-color,border-color,box-shadow,transform;transition-timing-function:var(--tw-ease,var(--default-transition-timing-function));transition-duration:var(--tw-duration,var(--default-transition-duration))}.transition-\[width\]{transition-property:width;transition-timing-function:var(--tw-ease,var(--default-transition-timing-function));transition-duration:var(--tw-duration,var(--default-transition-duration))}.transition-colors{transition-property:color,background-color,border-color,outline-color,text-decoration-color,fill,stroke,--tw-gradient-from,--tw-gradient-via,--tw-gradient-to;transition-timing-function:var(--tw-ease,var(--default-transition-timing-function));transition-duration:var(--tw-duration,var(--default-transition-duration))}.transition-transform{transition-property:transform,translate,scale,rotate;transition-timing-function:var(--tw-ease,var(--default-transition-timing-function));transition-duration:var(--tw-duration,var(--default-transition-duration))}.duration-200{--tw-duration:.2s;transition-duration:.2s}.duration-300{--tw-duration:.3s;transition-duration:.3s}.ease-out{--tw-ease:var(--ease-out);transition-timing-function:var(--ease-out)}.ou
tline-none{--tw-outline-style:none;outline-style:none}.select-none{-webkit-user-select:none;user-select:none}:where(.divide-x-reverse>:not(:last-child)){--tw-divide-x-reverse:1}.ring-inset{--tw-ring-inset:inset}.group-data-\[state\=open\]\:rotate-180:is(:where(.group)[data-state=open] *){rotate:180deg}.placeholder\:text-muted::placeholder{color:var(--color-muted)}@media(hover:hover){.hover\:-translate-y-0\.5:hover{--tw-translate-y:calc(var(--spacing) * -.5);translate:var(--tw-translate-x) var(--tw-translate-y)}.hover\:border-clay\/35:hover{border-color:#d6683d59}@supports (color:color-mix(in lab,red,red)){.hover\:border-clay\/35:hover{border-color:color-mix(in oklab,var(--color-clay) 35%,transparent)}}.hover\:border-clay\/40:hover{border-color:#d6683d66}@supports (color:color-mix(in lab,red,red)){.hover\:border-clay\/40:hover{border-color:color-mix(in oklab,var(--color-clay) 40%,transparent)}}.hover\:border-clay\/45:hover{border-color:#d6683d73}@supports (color:color-mix(in lab,red,red)){.hover\:border-clay\/45:hover{border-color:color-mix(in oklab,var(--color-clay) 45%,transparent)}}.hover\:bg-clay-strong:hover{background-color:var(--color-clay-strong)}.hover\:bg-ink\/92:hover{background-color:#2b201aeb}@supports (color:color-mix(in lab,red,red)){.hover\:bg-ink\/92:hover{background-color:color-mix(in oklab,var(--color-ink) 92%,transparent)}}.hover\:bg-oat:hover{background-color:var(--color-oat)}.hover\:bg-oat\/75:hover{background-color:#fbf3e7bf}@supports (color:color-mix(in lab,red,red)){.hover\:bg-oat\/75:hover{background-color:color-mix(in oklab,var(--color-oat) 75%,transparent)}}.hover\:bg-paper:hover{background-color:var(--color-paper)}.hover\:bg-sand:hover{background-color:var(--color-sand)}.hover\:text-clay:hover{color:var(--color-clay)}.hover\:text-ink:hover{color:var(--color-ink)}}.focus\:ring-2:focus{--tw-ring-shadow:var(--tw-ring-inset,) 0 0 0 calc(2px + var(--tw-ring-offset-width)) var(--tw-ring-color,currentcolor);box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.focus\:ring-clay\/20:focus{--tw-ring-color:#d6683d33}@supports (color:color-mix(in lab,red,red)){.focus\:ring-clay\/20:focus{--tw-ring-color:color-mix(in oklab, var(--color-clay) 20%, transparent)}}.focus\:outline-none:focus{--tw-outline-style:none;outline-style:none}.focus-visible\:ring-2:focus-visible{--tw-ring-shadow:var(--tw-ring-inset,) 0 0 0 calc(2px + var(--tw-ring-offset-width)) var(--tw-ring-color,currentcolor);box-shadow:var(--tw-inset-shadow),var(--tw-inset-ring-shadow),var(--tw-ring-offset-shadow),var(--tw-ring-shadow),var(--tw-shadow)}.focus-visible\:ring-clay\/20:focus-visible{--tw-ring-color:#d6683d33}@supports (color:color-mix(in lab,red,red)){.focus-visible\:ring-clay\/20:focus-visible{--tw-ring-color:color-mix(in oklab, var(--color-clay) 20%, transparent)}}.focus-visible\:ring-clay\/25:focus-visible{--tw-ring-color:#d6683d40}@supports (color:color-mix(in lab,red,red)){.focus-visible\:ring-clay\/25:focus-visible{--tw-ring-color:color-mix(in oklab, var(--color-clay) 25%, 
transparent)}}.focus-visible\:outline-none:focus-visible{--tw-outline-style:none;outline-style:none}.disabled\:pointer-events-none:disabled{pointer-events:none}.disabled\:cursor-not-allowed:disabled{cursor:not-allowed}.disabled\:opacity-45:disabled{opacity:.45}.disabled\:opacity-50:disabled{opacity:.5}.data-\[disabled\]\:pointer-events-none[data-disabled]{pointer-events:none}.data-\[disabled\]\:opacity-40[data-disabled]{opacity:.4}.data-\[highlighted\]\:bg-sand[data-highlighted]{background-color:var(--color-sand)}.data-\[side\=bottom\]\:translate-y-1[data-side=bottom]{--tw-translate-y:calc(var(--spacing) * 1);translate:var(--tw-translate-x) var(--tw-translate-y)}.data-\[side\=top\]\:-translate-y-1[data-side=top]{--tw-translate-y:calc(var(--spacing) * -1);translate:var(--tw-translate-x) var(--tw-translate-y)}.data-\[state\=checked\]\:translate-x-5[data-state=checked]{--tw-translate-x:calc(var(--spacing) * 5);translate:var(--tw-translate-x) var(--tw-translate-y)}.data-\[state\=checked\]\:bg-clay[data-state=checked]{background-color:var(--color-clay)}.data-\[state\=closed\]\:animate-accordion-up[data-state=closed]{animation:var(--animate-accordion-up)}.data-\[state\=open\]\:animate-accordion-down[data-state=open]{animation:var(--animate-accordion-down)}.data-\[state\=unchecked\]\:translate-x-0[data-state=unchecked]{--tw-translate-x:calc(var(--spacing) * 0);translate:var(--tw-translate-x) var(--tw-translate-y)}@media(min-width:40rem){.sm\:inset-x-6{inset-inline:calc(var(--spacing) * 6)}.sm\:h-\[4\.9rem\]{height:4.9rem}.sm\:min-h-\[680px\]{min-height:680px}.sm\:w-\[4\.9rem\]{width:4.9rem}.sm\:max-w-none{max-width:none}.sm\:max-w-sm{max-width:var(--container-sm)}.sm\:grid-cols-2{grid-template-columns:repeat(2,minmax(0,1fr))}.sm\:grid-cols-\[minmax\(0\,1fr\)_auto\]{grid-template-columns:minmax(0,1fr) auto}.sm\:flex-row{flex-direction:row}.sm\:justify-end{justify-content:flex-end}:where(.sm\:space-y-5>:not(:last-child)){--tw-space-y-reverse:0;margin-block-start:calc(calc(var(--spacing) * 5) * var(--tw-space-y-reverse));margin-block-end:calc(calc(var(--spacing) * 5) * calc(1 - var(--tw-space-y-reverse)))}.sm\:p-3{padding:calc(var(--spacing) * 3)}.sm\:p-4{padding:calc(var(--spacing) * 4)}.sm\:px-5{padding-inline:calc(var(--spacing) * 5)}.sm\:px-6{padding-inline:calc(var(--spacing) * 6)}.sm\:px-7{padding-inline:calc(var(--spacing) * 7)}.sm\:py-7{padding-block:calc(var(--spacing) * 7)}}@media(min-width:64rem){.lg\:grid-cols-\[minmax\(0\,1\.2fr\)_minmax\(0\,1fr\)\]{grid-template-columns:minmax(0,1.2fr) minmax(0,1fr)}.lg\:grid-cols-\[minmax\(0\,1fr\)_15rem\]{grid-template-columns:minmax(0,1fr) 15rem}.lg\:items-start{align-items:flex-start}.lg\:items-stretch{align-items:stretch}.lg\:justify-between{justify-content:space-between}.lg\:p-5{padding:calc(var(--spacing) * 5)}}.\[\&_svg\]\:pointer-events-none svg{pointer-events:none}.\[\&_svg\]\:shrink-0 svg{flex-shrink:0}.\[\&\>span\]\:line-clamp-1>span{-webkit-line-clamp:1;-webkit-box-orient:vertical;display:-webkit-box;overflow:hidden}.page-fade{animation:.28s cubic-bezier(.22,1,.36,1) page-fade}.studio-panel{background:#fffcf7}@supports (color:color-mix(in lab,red,red)){.studio-panel{background:color-mix(in oklab,var(--color-paper) 94%,white)}}.studio-panel{box-shadow:0 12px 32px -30px #2b201a1f}@supports (color:color-mix(in lab,red,red)){.studio-panel{box-shadow:0 12px 32px -30px color-mix(in oklab,var(--color-ink) 12%,transparent)}}.studio-grid{background:radial-gradient(circle at top,#fbf3e761,#0000 62%),linear-gradient(#fffcf7,#fdfbf6)}@supports 
(color:color-mix(in lab,red,red)){.studio-grid{background:radial-gradient(circle at top,color-mix(in oklab,var(--color-oat) 38%,transparent),transparent 62%),linear-gradient(180deg,color-mix(in oklab,var(--color-paper) 97%,white),color-mix(in oklab,var(--color-oat) 36%,white))}}.soft-scroll{scrollbar-width:thin;scrollbar-color:#cec6bca6 transparent}@supports (color:color-mix(in lab,red,red)){.soft-scroll{scrollbar-color:color-mix(in oklab,var(--color-stone) 65%,transparent) transparent}}.page-halo{background:radial-gradient(circle at top,#fbf3e7c7,#0000 66%),linear-gradient(#fffcf7f5,#0000 80%)}@supports (color:color-mix(in lab,red,red)){.page-halo{background:radial-gradient(circle at top,color-mix(in oklab,var(--color-oat) 78%,transparent),transparent 66%),linear-gradient(180deg,color-mix(in oklab,var(--color-paper) 96%,transparent),transparent 80%)}}}@keyframes accordion-down{0%{height:0}to{height:var(--radix-accordion-content-height)}}@keyframes accordion-up{0%{height:var(--radix-accordion-content-height)}to{height:0}}@keyframes page-fade{0%{opacity:0;transform:translateY(12px)}to{opacity:1;transform:translateY(0)}}@keyframes section-rise{0%{opacity:0;transform:translateY(18px)}to{opacity:1;transform:translateY(0)}}@property --tw-scale-x{syntax:"*";inherits:false;initial-value:1}@property --tw-scale-y{syntax:"*";inherits:false;initial-value:1}@property --tw-scale-z{syntax:"*";inherits:false;initial-value:1}@property --tw-rotate-x{syntax:"*";inherits:false}@property --tw-rotate-y{syntax:"*";inherits:false}@property --tw-rotate-z{syntax:"*";inherits:false}@property --tw-skew-x{syntax:"*";inherits:false}@property --tw-skew-y{syntax:"*";inherits:false}@property --tw-pan-x{syntax:"*";inherits:false}@property --tw-pan-y{syntax:"*";inherits:false}@property --tw-pinch-zoom{syntax:"*";inherits:false}@property --tw-space-y-reverse{syntax:"*";inherits:false;initial-value:0}@property --tw-space-x-reverse{syntax:"*";inherits:false;initial-value:0}@property --tw-divide-x-reverse{syntax:"*";inherits:false;initial-value:0}@property --tw-border-style{syntax:"*";inherits:false;initial-value:solid}@property --tw-divide-y-reverse{syntax:"*";inherits:false;initial-value:0}@property --tw-leading{syntax:"*";inherits:false}@property --tw-font-weight{syntax:"*";inherits:false}@property --tw-tracking{syntax:"*";inherits:false}@property --tw-ordinal{syntax:"*";inherits:false}@property --tw-slashed-zero{syntax:"*";inherits:false}@property --tw-numeric-figure{syntax:"*";inherits:false}@property --tw-numeric-spacing{syntax:"*";inherits:false}@property --tw-numeric-fraction{syntax:"*";inherits:false}@property --tw-shadow{syntax:"*";inherits:false;initial-value:0 0 #0000}@property --tw-shadow-color{syntax:"*";inherits:false}@property --tw-shadow-alpha{syntax:"<percentage>";inherits:false;initial-value:100%}@property --tw-inset-shadow{syntax:"*";inherits:false;initial-value:0 0 #0000}@property --tw-inset-shadow-color{syntax:"*";inherits:false}@property --tw-inset-shadow-alpha{syntax:"<percentage>";inherits:false;initial-value:100%}@property --tw-ring-color{syntax:"*";inherits:false}@property --tw-ring-shadow{syntax:"*";inherits:false;initial-value:0 0 #0000}@property --tw-inset-ring-color{syntax:"*";inherits:false}@property --tw-inset-ring-shadow{syntax:"*";inherits:false;initial-value:0 0 #0000}@property --tw-ring-inset{syntax:"*";inherits:false}@property --tw-ring-offset-width{syntax:"<length>";inherits:false;initial-value:0}@property --tw-ring-offset-color{syntax:"*";inherits:false;initial-value:#fff}@property 
--tw-ring-offset-shadow{syntax:"*";inherits:false;initial-value:0 0 #0000}@property --tw-outline-style{syntax:"*";inherits:false;initial-value:solid}@property --tw-blur{syntax:"*";inherits:false}@property --tw-brightness{syntax:"*";inherits:false}@property --tw-contrast{syntax:"*";inherits:false}@property --tw-grayscale{syntax:"*";inherits:false}@property --tw-hue-rotate{syntax:"*";inherits:false}@property --tw-invert{syntax:"*";inherits:false}@property --tw-opacity{syntax:"*";inherits:false}@property --tw-saturate{syntax:"*";inherits:false}@property --tw-sepia{syntax:"*";inherits:false}@property --tw-drop-shadow{syntax:"*";inherits:false}@property --tw-drop-shadow-color{syntax:"*";inherits:false}@property --tw-drop-shadow-alpha{syntax:"<percentage>";inherits:false;initial-value:100%}@property --tw-drop-shadow-size{syntax:"*";inherits:false}@property --tw-backdrop-blur{syntax:"*";inherits:false}@property --tw-backdrop-brightness{syntax:"*";inherits:false}@property --tw-backdrop-contrast{syntax:"*";inherits:false}@property --tw-backdrop-grayscale{syntax:"*";inherits:false}@property --tw-backdrop-hue-rotate{syntax:"*";inherits:false}@property --tw-backdrop-invert{syntax:"*";inherits:false}@property --tw-backdrop-opacity{syntax:"*";inherits:false}@property --tw-backdrop-saturate{syntax:"*";inherits:false}@property --tw-backdrop-sepia{syntax:"*";inherits:false}@property --tw-duration{syntax:"*";inherits:false}@property --tw-ease{syntax:"*";inherits:false}@property --tw-translate-x{syntax:"*";inherits:false;initial-value:0}@property --tw-translate-y{syntax:"*";inherits:false;initial-value:0}@property --tw-translate-z{syntax:"*";inherits:false;initial-value:0}@keyframes spin{to{transform:rotate(360deg)}}
frontend/dist/index.html ADDED
@@ -0,0 +1,14 @@
+ <!doctype html>
+ <html lang="en">
+   <head>
+     <meta charset="UTF-8" />
+     <link rel="icon" type="image/svg+xml" href="/vite.svg" />
+     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+     <title>LightDiffusion Next</title>
+     <script type="module" crossorigin src="/assets/index-7kNA4Hm-.js"></script>
+     <link rel="stylesheet" crossorigin href="/assets/index-CAwyaxYh.css">
+   </head>
+   <body>
+     <div id="root"></div>
+   </body>
+ </html>
frontend/dist/vite.svg ADDED
frontend/eslint.config.js ADDED
@@ -0,0 +1,23 @@
+ import js from '@eslint/js'
+ import globals from 'globals'
+ import reactHooks from 'eslint-plugin-react-hooks'
+ import reactRefresh from 'eslint-plugin-react-refresh'
+ import tseslint from 'typescript-eslint'
+ import { defineConfig, globalIgnores } from 'eslint/config'
+
+ export default defineConfig([
+   globalIgnores(['dist']),
+   {
+     files: ['**/*.{ts,tsx}'],
+     extends: [
+       js.configs.recommended,
+       tseslint.configs.recommended,
+       reactHooks.configs.flat.recommended,
+       reactRefresh.configs.vite,
+     ],
+     languageOptions: {
+       ecmaVersion: 2020,
+       globals: globals.browser,
+     },
+   },
+ ])
frontend/index.html ADDED
@@ -0,0 +1,13 @@
+ <!doctype html>
+ <html lang="en">
+   <head>
+     <meta charset="UTF-8" />
+     <link rel="icon" type="image/svg+xml" href="/vite.svg" />
+     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+     <title>LightDiffusion Next</title>
+   </head>
+   <body>
+     <div id="root"></div>
+     <script type="module" src="/src/main.tsx"></script>
+   </body>
+ </html>
frontend/package-lock.json ADDED
The diff for this file is too large to render. See raw diff
 
frontend/package.json ADDED
@@ -0,0 +1,49 @@
+ {
+   "name": "frontend",
+   "private": true,
+   "version": "0.0.0",
+   "type": "module",
+   "scripts": {
+     "dev": "vite",
+     "build": "tsc -b && vite build",
+     "lint": "eslint .",
+     "preview": "vite preview"
+   },
+   "dependencies": {
+     "@radix-ui/react-accordion": "^1.2.12",
+     "@radix-ui/react-collapsible": "^1.1.12",
+     "@radix-ui/react-dialog": "^1.1.15",
+     "@radix-ui/react-label": "^2.1.8",
+     "@radix-ui/react-scroll-area": "^1.2.10",
+     "@radix-ui/react-select": "^2.2.6",
+     "@radix-ui/react-separator": "^1.1.8",
+     "@radix-ui/react-slot": "^1.2.4",
+     "@radix-ui/react-switch": "^1.2.6",
+     "axios": "^1.13.4",
+     "class-variance-authority": "^0.7.1",
+     "clsx": "^2.1.1",
+     "lucide-react": "^1.8.0",
+     "react": "^19.2.0",
+     "react-dom": "^19.2.0",
+     "react-dropzone": "^14.4.0",
+     "react-use-websocket": "^4.13.0",
+     "tailwind-merge": "^3.5.0",
+     "zustand": "^5.0.11"
+   },
+   "devDependencies": {
+     "@eslint/js": "^9.39.1",
+     "@tailwindcss/vite": "^4.2.2",
+     "@types/node": "^24.10.1",
+     "@types/react": "^19.2.5",
+     "@types/react-dom": "^19.2.3",
+     "@vitejs/plugin-react": "^5.1.1",
+     "eslint": "^9.39.1",
+     "eslint-plugin-react-hooks": "^7.0.1",
+     "eslint-plugin-react-refresh": "^0.4.24",
+     "globals": "^16.5.0",
+     "tailwindcss": "^4.2.2",
+     "typescript": "~5.9.3",
+     "typescript-eslint": "^8.46.4",
+     "vite": "^7.2.4"
+   }
+ }
frontend/public/vite.svg ADDED
frontend/src/App.tsx ADDED
@@ -0,0 +1,57 @@
+ import { useState } from 'react';
+ import { GenerationComposer } from './components/GenerationComposer';
+ import { GenerationSettings } from './components/GenerationSettings';
+ import { Gallery } from './components/Gallery';
+ import { ImagePreview } from './components/ImagePreview';
+ import {
+   Sheet,
+   SheetContent,
+   SheetDescription,
+   SheetHeader,
+   SheetTitle,
+ } from './components/ui/sheet';
+ import { useGenerationBootstrap } from './hooks/use-generation-bootstrap';
+ import { useMediaQuery } from './hooks/use-media-query';
+
+ export default function App() {
+   useGenerationBootstrap();
+
+   const [controlsOpen, setControlsOpen] = useState(false);
+   const isDesktop = useMediaQuery('(min-width: 1024px)');
+   const controlSide = isDesktop ? 'right' : 'bottom';
+
+   return (
+     <div className="min-h-screen bg-canvas text-ink">
+       <div className="page-halo pointer-events-none absolute inset-x-0 top-0 h-96" />
+
+       <main className="page-fade relative mx-auto flex min-h-screen w-full max-w-[1320px] flex-col px-4 pb-10 pt-4 sm:px-6">
+         <section className="mx-auto min-h-0 w-full max-w-[1200px] space-y-4 sm:space-y-5">
+           <GenerationComposer onOpenAdvanced={() => setControlsOpen(true)} />
+           <ImagePreview />
+           <Gallery />
+         </section>
+       </main>
+
+       <Sheet open={controlsOpen} onOpenChange={setControlsOpen}>
+         <SheetContent
+           side={controlSide}
+           className={
+             isDesktop
+               ? 'h-[calc(100vh-2rem)] w-[26rem] overflow-hidden sm:max-w-none'
+               : 'h-[min(88vh,860px)] overflow-hidden'
+           }
+         >
+           <SheetHeader>
+             <SheetTitle>Advanced controls</SheetTitle>
+             <SheetDescription>
+               Sampling, conditioning, optimization, and history for the next run.
+             </SheetDescription>
+           </SheetHeader>
+           <div className="mt-4 h-[calc(100%-4rem)] min-h-0">
+             <GenerationSettings />
+           </div>
+         </SheetContent>
+       </Sheet>
+     </div>
+   );
+ }
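App.tsx consumes `useGenerationBootstrap` and `useMediaQuery` from `./hooks/`, neither of which appears in this excerpt. For orientation only, a minimal sketch of what a `useMediaQuery` hook with this call signature typically looks like — an assumption about the implementation, not the repository's actual hook:

// use-media-query.ts — hypothetical sketch of the hook App.tsx imports.
import { useEffect, useState } from 'react';

export function useMediaQuery(query: string): boolean {
  // Initialize from the current match state; guard for non-browser environments.
  const [matches, setMatches] = useState(
    () => typeof window !== 'undefined' && window.matchMedia(query).matches,
  );

  useEffect(() => {
    const mql = window.matchMedia(query);
    const onChange = (event: MediaQueryListEvent) => setMatches(event.matches);
    mql.addEventListener('change', onChange);
    setMatches(mql.matches); // Re-sync if the query string changed between renders.
    return () => mql.removeEventListener('change', onChange);
  }, [query]);

  return matches;
}

With `'(min-width: 1024px)'`, this returns `true` on desktop-width viewports, which App.tsx uses to dock the controls sheet to the right instead of the bottom.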
frontend/src/api/client.ts ADDED
@@ -0,0 +1,70 @@
+ import axios from 'axios';
+ import type {
+   GenerationSettings,
+   GenerationResponse,
+   ImageMetadata,
+   ModelInfo,
+   SettingsPreferences,
+   SettingsSnapshot,
+ } from '../types';
+
+ const api = axios.create({
+   baseURL: '/api', // Proxy handles redirection to localhost:7861
+ });
+
+ export const listModels = async (): Promise<ModelInfo[]> => {
+   const res = await api.get<ModelInfo[]>('/models');
+   return res.data;
+ };
+
+ export const listControlNets = async (): Promise<{ models: string[] }> => {
+   const res = await api.get<{ models: string[] }>('/controlnets');
+   return res.data;
+ };
+
+ export const generateImage = async (settings: GenerationSettings): Promise<GenerationResponse> => {
+   const res = await api.post<GenerationResponse>('/generate', settings);
+   console.log("Generation response:", res.data);
+   return res.data;
+ };
+
+ export const interruptGeneration = async (): Promise<void> => {
+   await api.post('/interrupt');
+ };
+
+ export const getLastSeed = async (): Promise<{ seed: number | null }> => {
+   const res = await api.get('/settings/last');
+   return res.data;
+ };
+
+ export const getSettingsHistory = async (): Promise<{ history: SettingsSnapshot[] }> => {
+   const res = await api.get('/settings/history');
+   return res.data;
+ };
+
+ export const getSettingsPreferences = async (): Promise<SettingsPreferences> => {
+   const res = await api.get('/settings/preferences');
+   return res.data;
+ };
+
+ export const postSettingsPreferences = async (preferences: SettingsPreferences): Promise<SettingsPreferences> => {
+   const res = await api.post('/settings/preferences', preferences);
+   return res.data;
+ };
+
+ export const postSettingsSnapshot = async (settings: GenerationSettings, include_prompt: boolean = false): Promise<{ snapshot: SettingsSnapshot }> => {
+   const res = await api.post('/settings/history', { settings, include_prompt });
+   return res.data;
+ };
+
+ export const getImageMetadata = async (imageB64: string): Promise<{ metadata: ImageMetadata }> => {
+   const res = await api.post('/images/metadata', { image: imageB64 });
+   return res.data;
+ };
+
+ export const getTelemetry = async (): Promise<Record<string, unknown>> => {
+   const res = await api.get('/telemetry');
+   return res.data;
+ };
+
+ export default api;
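The client's `baseURL: '/api'` comment says a proxy forwards these requests to the backend on port 7861, but the proxy configuration itself is not part of this excerpt. A minimal sketch of what such an entry could look like in a hypothetical `vite.config.ts` (the `server.proxy` mapping shown is an assumption, not the repository's actual config):

// vite.config.ts — hypothetical sketch; the real file is not shown in this snapshot.
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

export default defineConfig({
  plugins: [react()],
  server: {
    proxy: {
      // Forward /api/* from the dev server to the backend the client comment references.
      '/api': 'http://localhost:7861',
    },
  },
})

This keeps the frontend origin-agnostic in development: components call relative `/api` paths and the dev server handles the cross-port hop, so no CORS setup is needed on the backend.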
frontend/src/assets/react.svg ADDED
frontend/src/components/Gallery.tsx ADDED
@@ -0,0 +1,62 @@
+ import { ScrollArea, ScrollBar } from './ui/scroll-area';
+ import { useStore } from '../store/useStore';
+ import { cn } from '../lib/utils';
+ import { useShallow } from 'zustand/react/shallow';
+
+ export function Gallery() {
+   const { currentImage, gallery, setCurrentImage } = useStore(useShallow((state) => ({
+     currentImage: state.currentImage,
+     gallery: state.gallery,
+     setCurrentImage: state.setCurrentImage,
+   })));
+
+   return (
+     <section className="-mt-2 overflow-hidden rounded-b-[2rem] border border-line border-t-0 bg-paper/62 px-4 pb-3 pt-2 sm:px-5">
+       <div className="flex items-center justify-between gap-3">
+         <h2 className="font-serif text-[1.05rem] tracking-[-0.025em] text-ink">Recent</h2>
+         <p className="text-xs text-muted">
+           {gallery.length === 0 ? 'No saved frames yet' : `${gallery.length} saved`}
+         </p>
+       </div>
+
+       {gallery.length === 0 ? (
+         <div className="mt-4 rounded-[1.4rem] border border-dashed border-line bg-oat/45 px-4 py-6 text-sm text-muted">
+           Generated images will collect here for quick comparison.
+         </div>
+       ) : (
+         <ScrollArea className="mt-2.5 w-full whitespace-nowrap">
+           <div className="flex gap-2 pb-2">
+             {gallery.map((image, index) => {
+               const isSelected = image === currentImage;
+
+               return (
+                 <button
+                   key={`${index}-${image.slice(0, 28)}`}
+                   type="button"
+                   onClick={() => setCurrentImage(image)}
+                   className={cn(
+                     'group relative w-[4.25rem] shrink-0 overflow-hidden rounded-[1rem] border bg-paper text-left transition sm:w-[4.9rem]',
+                     isSelected
+                       ? 'border-clay shadow-[0_10px_20px_-18px_color-mix(in_oklab,var(--color-clay)_28%,transparent)]'
+                       : 'border-line hover:-translate-y-0.5 hover:border-clay/35',
+                   )}
+                   aria-label={`Open image ${index + 1}`}
+                 >
+                   <img
+                     src={image}
+                     alt={`Generated frame ${index + 1}`}
+                     loading="lazy"
+                     decoding="async"
+                     className="h-[4.25rem] w-full object-cover sm:h-[4.9rem]"
+                   />
+                   {isSelected ? <div className="absolute right-3 top-3 h-2.5 w-2.5 rounded-full bg-clay" /> : null}
+                 </button>
+               );
+             })}
+           </div>
+           <ScrollBar orientation="horizontal" />
+         </ScrollArea>
+       )}
+     </section>
+   );
+ }
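Gallery reads `currentImage`, `gallery`, and `setCurrentImage` from the zustand store through `useShallow` (so the component only re-renders when one of those three fields changes, not on every store update). The store module (`../store/useStore`) is not included in this excerpt; a minimal sketch of the slice those selectors imply — field names taken from the component, everything else an assumption:

// store/useStore.ts — hypothetical sketch covering only the fields Gallery.tsx uses.
import { create } from 'zustand';

interface GalleryState {
  currentImage: string | null; // Image currently shown in the preview, or null before the first run.
  gallery: string[];           // Previously generated images (data URLs or paths; ordering assumed).
  setCurrentImage: (image: string) => void;
}

export const useStore = create<GalleryState>((set) => ({
  currentImage: null,
  gallery: [],
  setCurrentImage: (image) => set({ currentImage: image }),
}));

The real store presumably carries the rest of the generation state (settings, progress, websocket status) alongside this slice; the sketch only shows enough to type-check Gallery's selector.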