LTX-2.3 Sync-LoRA (3d1t, rank 256)

An In-Context LoRA (IC-LoRA) for LTX-2.3 (22B) that performs first-frame-driven video editing.

Given:

  • a reference video (the motion / identity to preserve), and
  • an edited first frame (a single image showing the desired edit applied to frame 0),

the model generates the full edited video: the edit from the first frame is propagated through all frames and kept in sync with the reference video's motion.

This is the rank-256 3d1t variant of the Sync-LoRA.

⚑ The prompt: use the token 3d1t

This LoRA was trained with a constant caption (the literal token 3d1t) for every sample. The edit is driven entirely by the first-frame image and the reference video, not by a text description. So at inference:

Always set the text prompt to exactly 3d1t.

Do not write a descriptive prompt (e.g. "a person with red hair"); describe the edit by supplying the edited first frame instead. 3d1t acts as a special edit token: it tells the model to perform the edit using the first frame and the reference video.
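Because the caption is fixed, it can be worth pinning it as a constant in your own scripts and rejecting any other prompt before starting a long 22B run. This is a minimal sketch; `SYNC_LORA_PROMPT` and `check_prompt` are hypothetical helpers, not part of ltx-pipelines.

```python
# Hypothetical helper (not an ltx-pipelines API): the Sync-LoRA only ever saw this caption.
SYNC_LORA_PROMPT = "3d1t"


def check_prompt(prompt: str) -> str:
    """Return the prompt unchanged if it is exactly the edit token, otherwise raise."""
    # No stripping or case-folding: the card asks for exactly "3d1t".
    if prompt != SYNC_LORA_PROMPT:
        raise ValueError(
            f"Sync-LoRA expects the prompt {SYNC_LORA_PROMPT!r}, got {prompt!r}; "
            "describe the edit with the edited first frame instead."
        )
    return prompt
```

Usage: pass `check_prompt("3d1t")` wherever the pipeline takes `prompt`; a descriptive prompt raises `ValueError` immediately.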

How it works (conditioning)

Each input is wired as follows:

  • Edited first frame (image): image conditioning at latent index 0 (VideoConditionByLatentIndex, strength 1.0); it replaces frame 0.
  • Reference video: IC-LoRA reference conditioning (VideoConditionByReferenceLatent, strength 1.0).
  • Text prompt: the constant token 3d1t.

Video-only (no audio).
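The three inputs can be bundled into one object that produces the argument shapes used by the `ICLoraPipeline` call in the Python section of this card: `images=[(path, index, strength)]` and `video_conditioning=[(path, strength)]`. This is a minimal sketch; `SyncEditInputs` is a hypothetical container, not an ltx-pipelines type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncEditInputs:
    """Hypothetical container for one Sync-LoRA edit (not an ltx-pipelines type)."""

    reference_video: str      # motion / identity to preserve
    edited_first_frame: str   # frame 0 with the desired edit already applied
    strength: float = 1.0     # both conditionings use strength 1.0 as described above

    def pipeline_kwargs(self) -> dict:
        """Map the inputs to the keyword-argument shapes of the card's Python example."""
        return {
            "prompt": "3d1t",  # constant edit token, never a description
            # (path, index, strength): the edited image is pinned at index 0
            "images": [(self.edited_first_frame, 0, self.strength)],
            # (path, strength): IC-LoRA reference conditioning
            "video_conditioning": [(self.reference_video, self.strength)],
        }
```

Keeping the token inside the helper means callers cannot accidentally pass a descriptive prompt.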

Training details

  • Base model: LTX-2.3 (22B), dev checkpoint
  • Type: IC-LoRA, rank = alpha = 256
  • Resolution / length (training): 512×512, 81 frames, 25 fps
  • Caption: constant token 3d1t (text conditioning effectively removed)
  • File: ltx-2.3-sync-lora-3d1t-r256.safetensors (ComfyUI-style keys, diffusion_model. prefix), step 5000
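For intuition on rank = alpha = 256: in the standard LoRA formulation a weight W receives an update (alpha / rank) · B·A, with A of shape rank × in and B of shape out × rank. When alpha equals rank the scale factor is exactly 1, so the load strength (1.0 in the examples in this card) is the only multiplier, assuming the loader scales each delta by its strength. Stacking a second adapter, such as the distillation LoRA used at inference, adds its own delta. The NumPy sketch below uses toy sizes and random matrices; it shows generic LoRA arithmetic, not how ltx-core actually fuses weights.

```python
import numpy as np


def lora_delta(A: np.ndarray, B: np.ndarray, alpha: float, strength: float = 1.0) -> np.ndarray:
    """Generic LoRA update: strength * (alpha / rank) * B @ A."""
    rank = A.shape[0]  # A: (rank, in_features), B: (out_features, rank)
    return strength * (alpha / rank) * (B @ A)


rng = np.random.default_rng(0)
out_f, in_f, rank = 64, 64, 8  # toy sizes; the real Sync-LoRA uses rank = alpha = 256
W = rng.standard_normal((out_f, in_f))

# Two random stand-ins for adapters, e.g. an edit LoRA and a distillation LoRA.
A1, B1 = rng.standard_normal((rank, in_f)), rng.standard_normal((out_f, rank))
A2, B2 = rng.standard_normal((rank, in_f)), rng.standard_normal((out_f, rank))

# alpha == rank, so alpha / rank == 1 and only the strength scales each update.
d1 = lora_delta(A1, B1, alpha=rank, strength=1.0)
d2 = lora_delta(A2, B2, alpha=rank, strength=1.0)

# Merged-weight stacking is additive, so adapter order does not change the result.
W_stacked = W + d1 + d2
```

Each delta has rank at most `rank`, which is why a rank-256 adapter is far smaller than the 22B base weights it modifies.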

Inference

Use the two-stage distilled IC-LoRA pipeline from LTX-2's ltx-pipelines (stage 1 at half resolution → ×2 spatial upscale → stage 2 refinement).

Important for LTX-2.3: distillation is shipped as a LoRA, so stack the LTX-2.3 distilled-lora-384 together with this Sync-LoRA on both stages (8-step stage 1 + 3-step stage 2).
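The schedule described in the two paragraphs above can be written down as plain data, which makes it easy to check which resolution and LoRAs each stage uses. This is a descriptive sketch, not a pipeline API; `Stage` and `two_stage_plan` are hypothetical names, and the step counts are taken from the text.

```python
from dataclasses import dataclass

SYNC = "ltx-2.3-sync-lora-3d1t-r256.safetensors"
DISTILLED = "ltx-2.3-22b-distilled-lora-384-1.1.safetensors"


@dataclass(frozen=True)
class Stage:
    """Hypothetical description of one denoising stage."""

    name: str
    height: int
    width: int
    steps: int
    loras: tuple[str, ...]


def two_stage_plan(height: int, width: int) -> list[Stage]:
    """Half-resolution stage 1, x2 spatial upscale, then a full-resolution stage 2."""
    loras = (SYNC, DISTILLED)  # on LTX-2.3 both LoRAs are stacked on *both* stages
    return [
        Stage("stage 1", height // 2, width // 2, steps=8, loras=loras),
        Stage("stage 2 (after x2 upscale)", height, width, steps=3, loras=loras),
    ]
```

For a 1024×1024 request, `two_stage_plan(1024, 1024)` yields an 8-step pass at 512×512 followed by a 3-step refinement at 1024×1024, both with the same two LoRAs.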

You will need (from the LTX-2.3 release):

  • ltx-2.3-22b-dev.safetensors (base)
  • ltx-2.3-22b-distilled-lora-384-1.1.safetensors (distillation LoRA)
  • ltx-2.3-spatial-upscaler-x2-1.1.safetensors (stage-2 upscaler)
  • the Gemma text encoder
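A small pre-flight check can save a failed start when one of these downloads is missing. This sketch assumes all `.safetensors` files sit in one directory and that the Gemma encoder is a directory passed as the gemma root; `missing_weights` is a hypothetical helper.

```python
from pathlib import Path

# File names from the list above plus this LoRA; the single-directory layout is an assumption.
REQUIRED_FILES = (
    "ltx-2.3-22b-dev.safetensors",
    "ltx-2.3-22b-distilled-lora-384-1.1.safetensors",
    "ltx-2.3-spatial-upscaler-x2-1.1.safetensors",
    "ltx-2.3-sync-lora-3d1t-r256.safetensors",
)


def missing_weights(model_dir: str, gemma_root: str) -> list[str]:
    """Return the required paths that do not exist (an empty list means ready)."""
    root = Path(model_dir)
    missing = [str(root / name) for name in REQUIRED_FILES if not (root / name).is_file()]
    if not Path(gemma_root).is_dir():  # assumed to be a directory, as --gemma-root suggests
        missing.append(gemma_root)
    return missing
```

Usage: `missing = missing_weights("weights", "path/to/gemma")`, then abort with the list if it is non-empty.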

CLI (sketch)

python -m ltx_pipelines.ic_lora \
  --distilled-checkpoint-path  ltx-2.3-22b-dev.safetensors \
  --spatial-upsampler-path     ltx-2.3-spatial-upscaler-x2-1.1.safetensors \
  --gemma-root                 path/to/gemma \
  --lora ltx-2.3-sync-lora-3d1t-r256.safetensors 1.0 \
  --lora ltx-2.3-22b-distilled-lora-384-1.1.safetensors 1.0 \
  --prompt "3d1t" \
  --video-conditioning reference.mp4 1.0 \
  --images edited_first_frame.png 0 1.0 \
  --height 1024 --width 1024 --num-frames 81 --frame-rate 25 --seed 42 \
  --output-path out.mp4

Notes:

  • --prompt "3d1t" (the token) is required.
  • --images <png> 0 1.0 puts the edited frame at index 0; --video-conditioning <mp4> 1.0 is the reference.
  • Stage 1 runs at half the requested resolution, so --height/--width 1024 gives a 512×512 stage-1 pass (the training resolution). Height and width must be divisible by 64, and the frame count must satisfy frames % 8 == 1.
  • To match an input clip's duration, set --num-frames/--frame-rate accordingly (e.g. a 5.1 s, 30 fps clip → --num-frames 153 --frame-rate 30). Non-square aspect ratios (e.g. portrait 768×1024) work and avoid cropping a portrait input.
  • On LTX-2.3, stack the distilled-lora-384 on both stages. The stock pipeline leaves stage 2 LoRA-free because it assumes an already-fused distilled checkpoint.
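The frame and resolution constraints in these notes can be checked, or snapped to, before launching a run. This is a minimal sketch with hypothetical helper names. It rounds the frame count *down* to the nearest valid value so the generated video never asks for more frames than the reference clip has; that choice is an assumption, not a pipeline requirement.

```python
def frames_for_clip(duration_s: float, fps: float) -> int:
    """Largest frame count with frames % 8 == 1 that fits in the clip."""
    n = round(duration_s * fps)
    if n < 1:
        raise ValueError("clip too short")
    return 8 * ((n - 1) // 8) + 1


def snap_resolution(side: int, multiple: int = 64) -> int:
    """Round a side length down to the nearest multiple of 64."""
    snapped = (side // multiple) * multiple
    if snapped == 0:
        raise ValueError(f"side {side} is smaller than {multiple}")
    return snapped


def check_request(height: int, width: int, num_frames: int) -> None:
    """Raise if a request violates the divisibility or frame-count rules."""
    if height % 64 or width % 64:
        raise ValueError("height and width must be divisible by 64")
    if num_frames % 8 != 1:
        raise ValueError("num_frames must satisfy num_frames % 8 == 1")


print(frames_for_clip(5.1, 30))                      # 153, the 5.1 s / 30 fps example
print(frames_for_clip(81 / 25, 25))                  # 81, the training length (3.24 s at 25 fps)
print(snap_resolution(1080), snap_resolution(1920))  # 1024 1920
```

Snapping only computes valid sizes; resizing or cropping the reference video and first frame to match is left to your preprocessing.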

Python (building blocks)

from ltx_core.loader import LTXV_LORA_COMFY_RENAMING_MAP, LoraPathStrengthAndSDOps
from ltx_pipelines.ic_lora import ICLoraPipeline

sync = LoraPathStrengthAndSDOps("ltx-2.3-sync-lora-3d1t-r256.safetensors", 1.0, LTXV_LORA_COMFY_RENAMING_MAP)
distilled = LoraPathStrengthAndSDOps("ltx-2.3-22b-distilled-lora-384-1.1.safetensors", 1.0, LTXV_LORA_COMFY_RENAMING_MAP)

pipe = ICLoraPipeline(
    distilled_checkpoint_path="ltx-2.3-22b-dev.safetensors",
    spatial_upsampler_path="ltx-2.3-spatial-upscaler-x2-1.1.safetensors",
    gemma_root="path/to/gemma",
    loras=[sync, distilled],
)
video, _ = pipe(
    prompt="3d1t",                                   # the token
    seed=42, height=1024, width=1024, num_frames=81, frame_rate=25,
    images=[("edited_first_frame.png", 0, 1.0)],     # edit at frame 0
    video_conditioning=[("reference.mp4", 1.0)],     # reference video
)
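Loading the 22B model is the expensive step, so when editing several clips it is natural to construct `pipe` once and call it repeatedly. This is a minimal sketch: `edit_clips` is a hypothetical helper that assumes `pipe` is the `ICLoraPipeline` instance above and forwards only the keyword arguments already shown there. It does not save anything, since the type of the returned `video` is not documented in this card.

```python
from typing import Any, Callable, Iterable


def edit_clips(
    pipe: Callable[..., Any],
    jobs: Iterable[tuple[str, str]],
    *,
    height: int = 1024,
    width: int = 1024,
    num_frames: int = 81,
    frame_rate: float = 25,
    seed: int = 42,
) -> list[Any]:
    """Run (reference_video, edited_first_frame) jobs through one loaded pipeline."""
    results = []
    for reference, first_frame in jobs:
        video, _ = pipe(
            prompt="3d1t",                          # constant edit token for every job
            seed=seed, height=height, width=width,
            num_frames=num_frames, frame_rate=frame_rate,
            images=[(first_frame, 0, 1.0)],         # edit pinned at frame 0
            video_conditioning=[(reference, 1.0)],  # reference video
        )
        results.append(video)
    return results
```

Usage: `videos = edit_clips(pipe, [("reference.mp4", "edited_first_frame.png")])`.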

Limitations

  • Trained at 512×512 / 81 frames; other resolutions and lengths work, but they fall outside the training distribution and quality may degrade.
  • The text branch is intentionally inert: 3d1t is the only caption seen during training.

Model tree: SagiPolaczek/LTX-2.3-Sync-LoRA is an adapter of the base model Lightricks/LTX-2.