LTX-2.3 Distilled GGUF Workflows Collection
This repository provides a comprehensive collection of optimized ComfyUI workflows tailored for the lightricks/ltx-2.3 base model (specifically using Distilled GGUF variants). These workflows leverage distilled, quantized capabilities to enable high-speed, high-quality video generation, upscaling, multi-subject synthesis, and audio generation directly within ComfyUI.
Workflow Files & Overview
The repository contains the following experimental and production-ready workflow files.
(Note: YYMM.x in the file names represents the release year, month, and version number, e.g., 2606.1)
| # | File Name | Workflow Target | Key Features & Quick Notes |
|---|---|---|---|
| 1 | LTX-2.3_i2v-javanoYYMM.x.json | Image-to-Video (Long Gen) | Standard Image/Text-to-Video. Supports Single/Extend mode, Double-Frame mode for smooth motion, and prompt enhancement. |
| 2 | LTX-2.3_v2v-javanoYYMM.x.json | Video-to-Video | Modifies existing videos via Motion Track (OpenPose/Canny/Depth) or Inpaint Edit ([Add], [Remove], [Replace]) modes. |
| 3 | LTX-2.3_Director-javanoYYMM.x.json | Director Control (Multi-Key-Frame) | Keyframe-driven composition. Visually sequence multiple images, adjust timing, and control dynamic camera choreography. |
| 4 | LTX-2.3_Detailer-javanoYYMM.x.json | Detailer / Upscaler | High-fidelity latent/spatial upscaling. Eliminates artifacts, improves facial consistency, and smooths noise. |
| 5 | LTX-2.3_MSR-javanoYYMM.x.json | Multiple Subject Reference | Synthesizes videos by simultaneously referencing up to 5 individual elements (1-4 subjects + 1 background). |
| 6 | LTX-2.3_MF-javanoYYMM.x.json | Multi Key-Frame Gen | Visually schedules multiple keyframes. Utilizes an optimized 2-pass generation and hybrid sampling pipeline for visual continuity. |
| 7 | LTX-2.3_TTS-javanoYYMM.x.json | Text-to-Speech (Audio Only) | Bypasses video rendering completely to harness LTX-2.3's native acoustic capabilities for fast speech synthesis. |
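If you keep several releases side by side, the `YYMM.x` convention can be parsed to sort or filter the files. The following is a minimal sketch, not part of the repository: the regular expression is inferred from the file names in the table above, and `WorkflowFile` and `parse_workflow_name` are hypothetical names introduced here for illustration.

```python
import re
from typing import NamedTuple


class WorkflowFile(NamedTuple):
    kind: str     # workflow label, e.g. "i2v" or "Director"
    year: int     # two-digit release year (YY)
    month: int    # release month (MM)
    version: int  # version number (x)


# Pattern inferred from the table above (an assumption, not an official spec).
_PATTERN = re.compile(
    r"^LTX-2\.3_(?P<kind>[A-Za-z0-9]+)-javano"
    r"(?P<yy>\d{2})(?P<mm>\d{2})\.(?P<ver>\d+)\.json$"
)


def parse_workflow_name(name: str) -> WorkflowFile:
    """Split a workflow file name into its label and release fields."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised workflow file name: {name!r}")
    month = int(match["mm"])
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {name!r}")
    return WorkflowFile(match["kind"], int(match["yy"]), month, int(match["ver"]))


info = parse_workflow_name("LTX-2.3_i2v-javano2606.1.json")
print(info)
```

Because `WorkflowFile` is a tuple, sorting a list of parsed names by `(year, month, version)` gives the newest release last.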
Detailed Workflow Breakdown
1. Image-to-Video / Long Video (i2v)
Designed for extended video creation using fast Distilled GGUF processing based on the lightricks/ltx-2.3 architecture.
- Core Modes: Switch between generating a single video in one pass and iteratively extending an existing sequence.
- Prompt Enhancement: Offers three prompt modes: Ollama-driven enhancement, native LTX prompt enhancement, and plain text. If you do not use Ollama, you can safely bypass or delete the Ollama SubGraph node.
- Stability Features: Includes a toggleable preview during sampling, audio-driven syncing, and Double-Frame mode to eliminate facial/structural distortion during high-motion scenes.
2. Video-to-Video (v2v)
Specialized in modifying existing video content or transferring motion from a reference source.
- Motion Track Mode: Extract pose and structural data via Depth, Canny, or OpenPose (default) from a source video to animate a target starting image.
- Inpaint Edit Mode: Target specific elements to add, remove, or replace objects using descriptive action tags (e.g., `[Add]`, `[Remove]`, `[Replace]`).
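When preparing many Inpaint Edit prompts, a small helper can keep the action tags consistent. This is only a sketch: it assumes the tag is placed at the start of the prompt text, which the description above does not confirm, so check the notes inside the workflow before relying on it. `tagged_prompt` is a hypothetical helper, not a node or function shipped with the workflow.

```python
# The three action tags named in the v2v description.
ALLOWED_TAGS = ("Add", "Remove", "Replace")


def tagged_prompt(action: str, description: str) -> str:
    """Prefix a description with an action tag such as [Add].

    Assumption: the workflow expects the tag at the start of the prompt.
    """
    tag = action.strip().capitalize()
    if tag not in ALLOWED_TAGS:
        raise ValueError(f"unknown action {action!r}; expected one of {ALLOWED_TAGS}")
    return f"[{tag}] {description.strip()}"


print(tagged_prompt("remove", "the red car parked on the left"))
```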
3 & 6. Director Control & Multi Key-Frame Generation (Director / MF)
These two workflows share a very similar core framework powered by WhatDreamsCost custom node suites (keyframeflf/flf), focusing on advanced image ordering and keyframe scheduling.
- Visual Timeline Control: Both workflows allow you to visually specify image sequences, precise rendering order, and frame timing directly on a visual layout.
- Camera & Sampling (Director/MF): Tailor complex camera panning and transitions.
4. Detailer / Upscaler (Detailer)
A refinement pipeline dedicated to enhancing clarity and correcting visual issues in generated video outputs.
- Latent/Spatial Upscaling: High-fidelity detailing that enhances facial consistency and overall texture resolution.
5. Multiple Subject Reference (MSR)
Generates coherent videos by synthesizing multiple distinct visual components simultaneously.
- Multi-Subject Matrix: Reference up to 5 individual assets (1–4 distinct subjects and 1 background image) to create a single unified scene.
- Dynamic Prompt Pairing: Since explicit image indexing isn't natively supported, optimal blending is achieved by describing the unique visual characteristics of each reference image within the global text configuration.
- Voice Cloning Support: Features integrated ID-LoRA (`id-lora-talkvid-3k`, `id-lora-celebvhq-3k`) compatibility for specialized audio/voice cloning synchronized with the generation.
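Since references cannot be addressed by index, the global prompt has to describe each one by its appearance. The sketch below shows one way to assemble such a prompt from per-reference descriptions. The sentence layout (subjects separated by semicolons, then a `Background:` clause, then the action) is an illustrative assumption rather than a format the workflow requires, and `build_msr_prompt` is a hypothetical helper.

```python
from typing import Sequence


def build_msr_prompt(subjects: Sequence[str], background: str, action: str) -> str:
    """Combine 1-4 subject descriptions, a background, and an action into one prompt."""
    if not 1 <= len(subjects) <= 4:
        raise ValueError("MSR references 1-4 subjects plus 1 background")
    # Normalise whitespace and trailing periods so the pieces join cleanly.
    clean = [s.strip().rstrip(".") for s in subjects]
    bg = background.strip().rstrip(".")
    return f"{'; '.join(clean)}. Background: {bg}. {action.strip()}"


prompt = build_msr_prompt(
    ["a woman with short silver hair in a red coat", "a golden retriever"],
    "a snowy street at dusk",
    "They walk toward the camera.",
)
print(prompt)
```

Describing distinguishing traits (hair, clothing, colour) rather than "subject 1" or "image 2" follows the pairing advice above.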
7. Text-to-Speech (TTS)
An experimental audio-only workflow focused strictly on leveraging LTX-2.3’s native acoustic engine.
- Pure Audio Processing: Bypasses the video rendering path entirely, which reduces VRAM use and generation time.
- Multilingual & Conditioning: Handles script and language definitions inside the main prompt. Features optional reference image loading to condition and clone the speaker's vocal characteristics.
Prerequisites & Installation
To ensure all nodes load correctly without errors, please update your environment to the latest versions:
- ComfyUI Core: Update ComfyUI to v0.24.0 / Frontend 1.45.15 or newer.
- Custom Nodes:
  - Update `ComfyUI-KJNodes` and `ComfyUI-LTXVideo` via the ComfyUI Manager.
  - Install specific keyframe/director custom node suites via ComfyUI Manager -> Install Missing Custom Nodes if any node appears red upon loading.
- Required Models:
  - Base Model / GGUF: Download the appropriate `lightricks/ltx-2.3` Distilled GGUF checkpoints.
  - Upscaler: Download `ltx-2.3-spatial-upscaler-x2-1.1` for workflows using the 2-pass detailer system.
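The minimum ComfyUI versions listed above are easy to misjudge by eye, because `1.45.9` looks newer than `1.45.15` when compared as text. The following sketch compares version strings numerically. How you obtain the installed version strings is left out, and `parse_version` and `meets_minimum` are hypothetical helpers, not part of ComfyUI.

```python
from typing import Tuple

# Minimums taken from the prerequisites list above.
MIN_CORE = "0.24.0"
MIN_FRONTEND = "1.45.15"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.24.0' or 'v0.24.0' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """Return True when the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


# Example values only; substitute the versions shown by your own installation.
print(meets_minimum("v0.24.1", MIN_CORE))
print(meets_minimum("1.45.9", MIN_FRONTEND))
```

This handles plain dotted numbers only; a version with a suffix such as `-rc1` would raise `ValueError` and needs extra handling.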