LTX-2.3 Distilled GGUF Workflows Collection

This repository provides a collection of ComfyUI workflows built for the lightricks/ltx-2.3 base model, specifically its Distilled GGUF variants. Using distilled, quantized checkpoints, the workflows cover fast video generation, upscaling, multi-subject synthesis, and audio generation directly within ComfyUI.


Workflow Files & Overview

The repository contains the following experimental and production-ready workflow files.
(Note: YYMM.x in the file names represents the release year, month, and version number, e.g., 2606.1)

| # | File Name | Workflow Target | Key Features & Quick Notes |
|---|-----------|-----------------|----------------------------|
| 1 | LTX-2.3_i2v-javanoYYMM.x.json | Image-to-Video (Long Gen) | Standard Image/Text-to-Video. Supports Single/Extend mode, Double-Frame mode for smooth motion, and prompt enhancement. |
| 2 | LTX-2.3_v2v-javanoYYMM.x.json | Video-to-Video | Modifies existing videos via Motion Track (OpenPose/Canny/Depth) or Inpaint Edit ([Add], [Remove], [Replace]) modes. |
| 3 | LTX-2.3_Director-javanoYYMM.x.json | Director Control (Multi-Key-Frame) | Keyframe-driven composition. Visually sequence multiple images, adjust timing, and control dynamic camera choreography. |
| 4 | LTX-2.3_Detailer-javanoYYMM.x.json | Detailer / Upscaler | High-fidelity latent/spatial upscaling. Eliminates artifacts, improves facial consistency, and smooths noise. |
| 5 | LTX-2.3_MSR-javanoYYMM.x.json | Multiple Subject Reference | Synthesizes videos by simultaneously referencing up to 5 individual elements (1-4 subjects + 1 background). |
| 6 | LTX-2.3_MF-javanoYYMM.x.json | Multi Key-Frame Gen | Visually schedules multiple keyframes. Utilizes an optimized 2-pass generation and hybrid sampling pipeline for visual continuity. |
| 7 | LTX-2.3_TTS-javanoYYMM.x.json | Text-to-Speech (Audio Only) | Bypasses video rendering completely to harness LTX-2.3's native acoustic capabilities for fast speech synthesis. |
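
The naming convention can be checked programmatically, for example when sorting several downloaded releases. The following is a minimal Python sketch, not part of the repository: the `parse_workflow_filename` helper and the `WorkflowFile` type are hypothetical names, and the sketch assumes that the two-digit year means 20YY and that the version after the dot is a plain integer.

```python
import re
from typing import NamedTuple


class WorkflowFile(NamedTuple):
    workflow: str  # e.g. "i2v", "Director", "TTS"
    year: int      # four-digit year (assumption: YY means 20YY)
    month: int     # 1-12
    version: int   # the ".x" part of YYMM.x


# Pattern derived from the file names listed in the table:
# LTX-2.3_<workflow>-javanoYYMM.x.json
_PATTERN = re.compile(
    r"^LTX-2\.3_(?P<workflow>[A-Za-z0-9]+)-javano"
    r"(?P<yy>\d{2})(?P<mm>\d{2})\.(?P<ver>\d+)\.json$"
)


def parse_workflow_filename(name: str) -> WorkflowFile:
    """Split a workflow file name into its workflow tag and release info."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized workflow file name: {name!r}")
    month = int(match["mm"])
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in file name: {name!r}")
    return WorkflowFile(
        workflow=match["workflow"],
        year=2000 + int(match["yy"]),
        month=month,
        version=int(match["ver"]),
    )


if __name__ == "__main__":
    print(parse_workflow_filename("LTX-2.3_i2v-javano2606.1.json"))
```

Because `WorkflowFile` is a tuple, sorting a list of parsed results with `key=lambda f: (f.year, f.month, f.version)` orders releases from oldest to newest.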

Detailed Workflow Breakdown

1. Image-to-Video / Long Video (i2v)

Designed for extended video creation using fast Distilled GGUF processing based on the lightricks/ltx-2.3 architecture.

  • Core Modes: Switch between one-time video generation and iteratively extending the video sequence.
  • Prompt Enhancement: Features three prompt modes: Ollama-driven enhancement, native LTX prompt enhancement, and plain text. If you do not use Ollama, you can safely bypass or delete the Ollama SubGraph node.
  • Stability Features: Includes a toggleable preview during sampling, audio-driven syncing, and Double-Frame mode to eliminate facial/structural distortion during high-motion scenes.

2. Video-to-Video (v2v)

Specialized in modifying existing video content or transferring motion from a reference source.

  • Motion Track Mode: Extract pose and structural data via Depth, Canny, or OpenPose (default) from a source video to animate a target starting image.
  • Inpaint Edit Mode: Target specific elements to add, remove, or replace objects utilizing descriptive action tags (e.g., [Add], [Remove], [Replace]).

3 & 6. Director Control & Multi Key-Frame Generation (Director / MF)

These two workflows share a very similar core framework built on the WhatDreamsCost custom node suites (keyframeflf/flf) and focus on advanced image ordering and keyframe scheduling.

  • Visual Timeline Control: Both workflows allow you to visually specify image sequences, precise rendering order, and frame timing directly on a visual layout.
  • Camera & Sampling (Director/MF): Tailor complex camera panning and transitions.

4. Detailer / Upscaler (Detailer)

A refinement pipeline dedicated to enhancing clarity and correcting visual issues in generated video outputs.

  • Latent/Spatial Upscaling: High-fidelity detailing that enhances facial consistency and overall texture resolution.

5. Multiple Subject Reference (MSR)

Generates coherent videos by synthesizing multiple distinct visual components simultaneously.

  • Multi-Subject Matrix: Reference up to 5 individual assets (1–4 distinct subjects and 1 background image) to create a single unified scene.
  • Dynamic Prompt Pairing: Since explicit image indexing isn't natively supported, optimal blending is achieved by describing the unique visual characteristics of each reference image within the global text configuration.
  • Voice Cloning Support: Features integrated ID-LoRA (id-lora-talkvid-3k, id-lora-celebvhq-3k) compatibility for specialized audio/voice cloning synchronized with the generation.
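
Since the references are matched through the prompt text and not through an index, it can help to assemble the prompt from one description per reference image. The sketch below is only an illustration under stated assumptions: the `Reference` type, the `build_msr_prompt` helper, and the "Subjects: ... Background: ..." layout are hypothetical and are not a format required by the workflow. It also assumes the background image is optional.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MAX_SUBJECTS = 4  # from the workflow description: 1-4 subjects + 1 background


@dataclass(frozen=True)
class Reference:
    image_path: str   # path of the reference image loaded in the workflow
    description: str  # the visual characteristics that identify this image


def build_msr_prompt(
    action: str,
    subjects: Sequence[Reference],
    background: Optional[Reference] = None,
) -> str:
    """Combine per-reference descriptions into one global prompt string."""
    if not 1 <= len(subjects) <= MAX_SUBJECTS:
        raise ValueError(f"expected 1-{MAX_SUBJECTS} subjects, got {len(subjects)}")
    descriptions = [ref.description.strip() for ref in subjects]
    if any(not text for text in descriptions):
        raise ValueError("every subject needs a non-empty description")

    # Hypothetical layout: subjects first, then background, then the action.
    parts = ["Subjects: " + "; ".join(descriptions) + "."]
    if background is not None:
        parts.append("Background: " + background.description.strip() + ".")
    parts.append(action.strip())
    return " ".join(parts)


if __name__ == "__main__":
    prompt = build_msr_prompt(
        "They walk side by side.",
        [
            Reference("woman.png", "a woman in a red coat"),
            Reference("dog.png", "a golden retriever"),
        ],
        Reference("park.png", "a snowy park at dusk"),
    )
    print(prompt)
```

The resulting string is meant to be pasted into the workflow's prompt field; the wording that blends best is something to find by experiment.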

7. Text-to-Speech (TTS)

An experimental audio-only workflow focused strictly on leveraging LTX-2.3’s native acoustic engine.

  • Pure Audio Processing: Completely bypasses the video rendering channels, drastically saving VRAM and generation time.
  • Multilingual & Conditioning: Handles script and language definitions inside the main prompt. Features optional reference image loading to condition and clone the speaker's vocal characteristics.

Prerequisites & Installation

To ensure all nodes load correctly without errors, please update your environment to the latest versions:

  1. ComfyUI Core: Update ComfyUI to v0.24.0 / Frontend 1.45.15 or newer.
  2. Custom Nodes:
    • Update ComfyUI-KJNodes and ComfyUI-LTXVideo via the ComfyUI Manager.
    • If any node appears red upon loading, install the required keyframe/director custom node suites via ComfyUI Manager -> Install Missing Custom Nodes.
  3. Required Models:
    • Base Model / GGUF: Download the appropriate lightricks/ltx-2.3 Distilled GGUF checkpoints.
    • Upscaler: Download ltx-2.3-spatial-upscaler-x2-1.1 for workflows using the 2-pass detailer system.
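
Before opening a workflow, it can be useful to see which node types it references, so that missing custom nodes are easier to identify. The following Python sketch is an assumption-laden helper, not part of the repository. It assumes the common ComfyUI export layouts: a UI export with a top-level "nodes" list whose entries carry a "type" field (plus optional subgraph definitions under "definitions" -> "subgraphs"), or an API export mapping node ids to objects with a "class_type" field. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Set


def collect_node_types(workflow: Dict[str, Any]) -> Set[str]:
    """Return the node type names referenced by a parsed workflow."""
    found: Set[str] = set()

    # UI export (assumed layout): top-level "nodes" plus nodes inside subgraphs.
    node_lists = [workflow.get("nodes") or []]
    definitions = workflow.get("definitions") or {}
    if isinstance(definitions, dict):
        for subgraph in definitions.get("subgraphs") or []:
            if isinstance(subgraph, dict):
                node_lists.append(subgraph.get("nodes") or [])
    for nodes in node_lists:
        for node in nodes:
            if isinstance(node, dict) and isinstance(node.get("type"), str):
                found.add(node["type"])

    # API export (assumed layout): {"<id>": {"class_type": "...", ...}, ...}
    for value in workflow.values():
        if isinstance(value, dict) and isinstance(value.get("class_type"), str):
            found.add(value["class_type"])

    return found


def load_workflow(path: str) -> Dict[str, Any]:
    """Read a workflow JSON file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    import sys

    # Usage: python list_nodes.py <workflow.json>
    for name in sorted(collect_node_types(load_workflow(sys.argv[1]))):
        print(name)
```

The printed names can be compared with the nodes available in your ComfyUI installation. Anything unfamiliar is a candidate for Install Missing Custom Nodes.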