LTX-2.3 Distilled GGUF Workflows Collection

This repository provides a collection of ComfyUI workflows built for the lightricks/ltx-2.3 base model, specifically its Distilled GGUF variants. The workflows use these distilled, quantized checkpoints for fast video generation, upscaling, multi-subject synthesis, and audio generation directly within ComfyUI.

Workflow Files & Overview

The repository contains the following experimental and production-ready workflow files.
(Note: YYMM.x in the file names represents the release year, month, and version number, e.g., 2606.1)

| # | File Name | Workflow Target | Key Features & Quick Notes |
|---|-----------|-----------------|----------------------------|
| 1 | LTX-2.3_i2v-javanoYYMM.x.json | Image-to-Video (Long Gen) | Standard Image/Text-to-Video. Supports Single/Extend mode, Double-Frame mode for smooth motion, and prompt enhancement. |
| 2 | LTX-2.3_v2v-javanoYYMM.x.json | Video-to-Video | Modifies existing videos via Motion Track (OpenPose/Canny/Depth) or Inpaint Edit (`[Add]`, `[Remove]`, `[Replace]`) modes. |
| 3 | LTX-2.3_Director-javanoYYMM.x.json | Director Control (Multi-Key-Frame) | Keyframe-driven composition. Visually sequence multiple images, adjust timing, and control dynamic camera choreography. |
| 4 | LTX-2.3_Detailer-javanoYYMM.x.json | Detailer / Upscaler | High-fidelity latent/spatial upscaling. Eliminates artifacts, improves facial consistency, and smooths noise. |
| 5 | LTX-2.3_MSR-javanoYYMM.x.json | Multiple Subject Reference | Synthesizes videos by simultaneously referencing up to 5 individual elements (1-4 subjects + 1 background). |
| 6 | LTX-2.3_MF-javanoYYMM.x.json | Multi Key-Frame Gen | Visually schedules multiple keyframes. Utilizes an optimized 2-pass generation and hybrid sampling pipeline for visual continuity. |
| 7 | LTX-2.3_TTS-javanoYYMM.x.json | Text-to-Speech (Audio Only) | Bypasses video rendering completely to harness LTX-2.3's native acoustic capabilities for fast speech synthesis. |
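
If you keep several releases side by side, the YYMM.x naming convention noted above can be parsed with a small script. The following is a minimal sketch, not part of the repository: the regular expression is inferred from the file names listed in the table, and `parse_workflow_name` and `WorkflowName` are hypothetical helper names.

```python
import re
from typing import NamedTuple


class WorkflowName(NamedTuple):
    target: str   # workflow tag, e.g. "i2v", "Director", "MSR"
    year: int     # two-digit release year (YY)
    month: int    # release month (MM)
    version: int  # version number (x)


# Pattern inferred from the listed file names; an assumption, not an official spec.
_NAME_RE = re.compile(
    r"^LTX-2\.3_(?P<target>[A-Za-z0-9]+)-javano"
    r"(?P<yy>\d{2})(?P<mm>\d{2})\.(?P<ver>\d+)\.json$"
)


def parse_workflow_name(filename: str) -> WorkflowName:
    """Split a workflow file name into target, release year, month, and version."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a recognised workflow file name: {filename!r}")
    month = int(match["mm"])
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {filename!r}")
    return WorkflowName(match["target"], int(match["yy"]), month, int(match["ver"]))


# Illustrative file name built from the "2606.1" example in the note above.
info = parse_workflow_name("LTX-2.3_i2v-javano2606.1.json")
print(info.target, info.year, info.month, info.version)
```

Because `WorkflowName` is a tuple, sorting a list of parsed names by `(year, month, version)` is enough to find the newest release of each workflow.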

Detailed Workflow Breakdown

1. Image-to-Video / Long Video (`i2v`)

Designed for extended video creation using fast Distilled GGUF processing based on the lightricks/ltx-2.3 architecture.

Core Modes: Switch between one-time video generation and iterative extension of an existing video sequence.
Prompt Enhancement: Offers three prompt modes: Ollama-driven enhancement, native LTX prompt enhancement, and plain text. If you do not use Ollama, you can safely bypass or delete the Ollama SubGraph node.
Stability Features: Includes a toggleable preview during sampling, audio-driven syncing, and Double-Frame mode to eliminate facial/structural distortion during high-motion scenes.

2. Video-to-Video (`v2v`)

Specialized in modifying existing video content or transferring motion from a reference source.

Motion Track Mode: Extract pose and structural data via Depth, Canny, or OpenPose (default) from a source video to animate a target starting image.
Inpaint Edit Mode: Target specific elements to add, remove, or replace objects using descriptive action tags (e.g., `[Add]`, `[Remove]`, `[Replace]`).

3 & 6. Director Control & Multi Key-Frame Generation (`Director` / `MF`)

These two workflows share a very similar core framework powered by WhatDreamsCost custom node suites (keyframeflf/flf), focusing on advanced image ordering and keyframe scheduling.

Visual Timeline Control: Both workflows allow you to visually specify image sequences, precise rendering order, and frame timing directly on a visual layout.
Camera & Sampling (Director/MF): Tailor complex camera panning and transitions.

4. Detailer / Upscaler (`Detailer`)

A refinement pipeline dedicated to enhancing clarity and correcting visual issues in generated video outputs.

Latent/Spatial Upscaling: High-fidelity detailing that enhances facial consistency and overall texture resolution.

5. Multiple Subject Reference (`MSR`)

Generates coherent videos by synthesizing multiple distinct visual components simultaneously.

Multi-Subject Matrix: Reference up to 5 individual assets (1–4 distinct subjects and 1 background image) to create a single unified scene.
Dynamic Prompt Pairing: Since explicit image indexing isn't natively supported, optimal blending is achieved by describing the unique visual characteristics of each reference image within the global text configuration.
Voice Cloning Support: Features integrated ID-LoRA (id-lora-talkvid-3k, id-lora-celebvhq-3k) compatibility for specialized audio/voice cloning synchronized with the generation.
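
Since the references are matched through the prompt text rather than by index, it can help to assemble the global prompt from one description per reference image. The sketch below is only an illustration of that idea: `build_msr_prompt` is a hypothetical helper, the sentence layout is an assumption rather than a format the workflow requires, and treating the background description as mandatory is also an assumption.

```python
MAX_SUBJECTS = 4  # from the text above: 1-4 subjects plus 1 background


def build_msr_prompt(subjects: list[str], background: str, action: str) -> str:
    """Join per-reference descriptions into one global prompt (hypothetical helper)."""
    # Drop empty entries and trailing periods so each description becomes one sentence.
    cleaned = [s.strip().rstrip(".") for s in subjects if s.strip()]
    if not 1 <= len(cleaned) <= MAX_SUBJECTS:
        raise ValueError(
            f"expected 1-{MAX_SUBJECTS} subject descriptions, got {len(cleaned)}"
        )
    bg = background.strip().rstrip(".")
    if not bg:
        raise ValueError("a background description is required")

    sentences = [f"{desc}." for desc in cleaned]
    sentences.append(f"The background is {bg}.")
    sentences.append(action.strip())
    return " ".join(sentences)


prompt = build_msr_prompt(
    subjects=["A woman in a red coat", "A small white dog"],
    background="a snowy street at dusk",
    action="The woman walks the dog toward the camera.",
)
print(prompt)
```

Paste the resulting string into the workflow's prompt field; the more distinctive each description is, the easier it should be for the model to tell the references apart.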

7. Text-to-Speech (`TTS`)

An experimental audio-only workflow focused strictly on leveraging LTX-2.3’s native acoustic engine.

Pure Audio Processing: Completely bypasses the video rendering channels, drastically saving VRAM and generation time.
Multilingual & Conditioning: Handles script and language definitions inside the main prompt. Features optional reference image loading to condition and clone the speaker's vocal characteristics.

Prerequisites & Installation

To ensure all nodes load correctly without errors, please update your environment to the latest versions:

ComfyUI Core: Update ComfyUI to v0.24.0 / Frontend 1.45.15 or newer.
Custom Nodes:
- Update ComfyUI-KJNodes and ComfyUI-LTXVideo via the ComfyUI Manager.
- Install specific keyframe/director custom node suites via ComfyUI Manager -> Install Missing Custom Nodes if any node appears red upon loading.
Required Models:
- Base Model / GGUF: Download the appropriate lightricks/ltx-2.3 Distilled GGUF checkpoints.
- Upscaler: Download ltx-2.3-spatial-upscaler-x2-1.1 for workflows using the 2-pass detailer system.
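
Before loading a workflow, you can list the node types it references to get a rough idea of which custom node packs it needs. This is a minimal sketch, assuming the usual ComfyUI JSON layouts: a UI export with a top-level `nodes` list whose entries carry a `type` field, or an API export mapping node ids to objects with a `class_type` field. The helper names are hypothetical, and nodes nested inside subgraph definitions are not covered.

```python
import json


def collect_node_types(workflow: dict) -> set[str]:
    """Return the set of node type names referenced by a workflow dict."""
    nodes = workflow.get("nodes")
    if isinstance(nodes, list):
        # UI export: {"nodes": [{"id": 1, "type": "..."}, ...], ...}
        return {n["type"] for n in nodes if isinstance(n, dict) and "type" in n}
    # API export: {"1": {"class_type": "...", "inputs": {...}}, ...}
    return {
        n["class_type"]
        for n in workflow.values()
        if isinstance(n, dict) and "class_type" in n
    }


def load_node_types(path: str) -> set[str]:
    """Read a workflow JSON file and return its node type names."""
    with open(path, encoding="utf-8") as handle:
        return collect_node_types(json.load(handle))


# Tiny in-memory example; the node names here are placeholders, not real nodes.
example = {"nodes": [{"id": 1, "type": "ExampleLoader"}, {"id": 2, "type": "ExampleSampler"}]}
print(sorted(collect_node_types(example)))
```

Comparing this list against the nodes your ComfyUI installation provides gives an early hint of what the Manager's Install Missing Custom Nodes step will need to fetch.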
