LTX-2.3 22B IC-LoRA Reference Sheet Control
This IC-LoRA, trained on top of LTX-2.3-22B, conditions video generation on a reference sheet (a single composite image inventorying the characters, props, and location of a scene) so that generated videos keep those elements visually consistent.
It is based on the LTX-2.3 foundation model.
Model Files
ltx-2.3-22b-ic-lora-ingredients-0.9.safetensors
Model Details
- Base Model: LTX-2.3-22B (dev)
- Training Type: IC-LoRA (in-context LoRA)
- Control Type: Reference-sheet conditioning (character / prop / location identity carried into the generated video)
- Reference Downscale Factor: 1 (the reference is provided at the same resolution as the output)
- Pipeline details: The reference sheet is supplied as a static video (the still sheet looped to the output's length and frame rate). The model is trained with a video_to_video strategy over reference latents; no extra color/space transforms are applied at inference.
Intended Use & Out-of-Scope
Intended use: Generating short video clips that stay faithful to a supplied reference sheet, keeping recurring characters (face and costume), handled props, and the set/location consistent with the sheet while following an action described in the prompt.
Out of scope: This is not a general text-to-video model; it expects a reference sheet as conditioning. It was trained at a single resolution / length bucket (768×448, 121 frames, 24 fps); other resolutions, much longer clips, or use without a reference sheet are out of distribution. It does not reproduce identities that are absent from the supplied sheet.
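The scope limits above can be turned into a small pre-flight check. The sketch below is illustrative only: `ClipRequest` and `out_of_distribution_reasons` are hypothetical names, not part of any LTX API, and treating any frame count other than 121 as out of distribution is a deliberately strict reading of the card.

```python
from dataclasses import dataclass

# Trained bucket as stated in this card.
TRAINED_WIDTH, TRAINED_HEIGHT = 768, 448
TRAINED_FRAMES, TRAINED_FPS = 121, 24


@dataclass(frozen=True)
class ClipRequest:
    """Hypothetical description of a generation request (illustration only)."""
    width: int
    height: int
    num_frames: int
    fps: int
    has_reference_sheet: bool


def out_of_distribution_reasons(req: ClipRequest) -> list:
    """Return human-readable reasons why a request leaves the trained bucket."""
    reasons = []
    if not req.has_reference_sheet:
        reasons.append("no reference sheet supplied")
    if (req.width, req.height) != (TRAINED_WIDTH, TRAINED_HEIGHT):
        reasons.append(f"resolution {req.width}x{req.height} is not 768x448")
    # Strict assumption: anything other than the trained length is flagged.
    if req.num_frames != TRAINED_FRAMES:
        reasons.append(f"{req.num_frames} frames is not the trained 121")
    if req.fps != TRAINED_FPS:
        reasons.append(f"{req.fps} fps is not the trained 24")
    return reasons


if __name__ == "__main__":
    request = ClipRequest(1280, 720, 241, 24, has_reference_sheet=True)
    for reason in out_of_distribution_reasons(request):
        print("warning:", reason)
```

An empty list means the request sits inside the trained bucket; a non-empty list is a warning, not a hard error, since the model may still produce usable output outside it.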
Control Signal Requirements
- Control signal type: Reference sheet, a single composite image with one clean panel per distinct visual element (each character as a face close-up + body turnaround, each prop as a product-style render, and one clean location panel), laid out on a black background with no text.
- Expected input: A static video built from the reference sheet, looped to match the output clip's length and frame rate, at the output resolution (downscale factor 1).
- Preprocessing: Author the reference sheet with the element-driven reference-sheet generator, then loop the still into a static video. Frame count must be ≥ 121 so the reference-encoding / 121-frame read bucket is satisfied; all targets in training were ≥ 121 frames.
- Alignment: The reference video should match the output resolution and frame rate; its frame count must be at least the output length (clamped to ≥ 121).
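One way to satisfy these requirements is to loop the still with ffmpeg. The card does not prescribe a tool, and an IC-LoRA workflow may do the looping itself, so the following is only a sketch: the helper names and file paths are hypothetical, and it assumes the sheet was authored at the output aspect ratio (the `scale` filter would otherwise distort it).

```python
import subprocess

MIN_REFERENCE_FRAMES = 121  # minimum reference length stated in this card


def reference_frame_count(output_frames: int) -> int:
    """Reference length: at least the output length, clamped to >= 121."""
    return max(output_frames, MIN_REFERENCE_FRAMES)


def static_video_command(sheet_png, out_mp4, output_frames=121,
                         width=768, height=448, fps=24):
    """Build an ffmpeg command that loops one image into a static video."""
    frames = reference_frame_count(output_frames)
    return [
        "ffmpeg", "-y",
        "-loop", "1",                      # repeat the single input image
        "-framerate", str(fps),            # match the output frame rate
        "-i", sheet_png,
        "-vf", f"scale={width}:{height}",  # match the output resolution
        "-frames:v", str(frames),          # stop after the required length
        "-pix_fmt", "yuv420p",
        out_mp4,
    ]


if __name__ == "__main__":
    # Hypothetical paths; requires ffmpeg on PATH and an existing sheet.png.
    cmd = static_video_command("sheet.png", "reference.mp4")
    subprocess.run(cmd, check=True)
```

Building the command as a list keeps the frame-count clamp testable without invoking ffmpeg.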
How It Works
The prompt is split into two labeled parts, matching how the model was trained:
Reference sheet: <description of the panels in the sheet (characters, props, location)>
Generated video: <description of the action / shot you want generated>
At inference, the reference sheet (as a static video) supplies what things look like, and the Generated video: portion of the prompt supplies what happens. The model reads the reference latents in-context and renders a new clip whose characters, props, and setting match the sheet.
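The two-part prompt can be assembled programmatically. In this minimal sketch, `build_prompt` is a hypothetical helper, the newline between the two parts is an assumption (the card shows the parts on separate lines but does not state the exact separator), and the example descriptions are invented for illustration.

```python
def build_prompt(sheet_description: str, action_description: str) -> str:
    """Join the two labeled parts used by this LoRA's prompt format."""
    sheet = sheet_description.strip()
    action = action_description.strip()
    if not sheet or not action:
        raise ValueError("both the sheet description and the action are required")
    # Assumption: the two parts are separated by a single newline.
    return f"Reference sheet: {sheet}\nGenerated video: {action}"


if __name__ == "__main__":
    prompt = build_prompt(
        "a woman in a red coat (face close-up and turnaround), "
        "a brass lantern, a foggy harbor at night",
        "the woman lifts the lantern and walks along the pier",
    )
    print(prompt)
```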
Usage
ComfyUI
- Copy the LoRA weights into models/loras.
- Load the LTX-2.3-22B base model and add lora_weights_step_12000.safetensors as the LoRA.
- Start at strength 1.0 and adjust to taste.
- Use an IC-LoRA / reference workflow from the LTX-2 ComfyUI repository, which already wires the reference (control) input. Connect the reference-sheet static video as the control/reference input; a generic LoRA loader that ignores the reference path will not apply the conditioning. See the IC-LoRA docs.
Recommended Settings
- LoRA strength / weight: 1.4
- Inference steps: 30
- Guidance scale: 4.0
- Resolution & frames: 768×448, 121 frames, 24 fps (the trained bucket; best results here)
- Prompting: Use the two-part Reference sheet: … / Generated video: … structure above. The Reference sheet: text should describe the panels present; the Generated video: text drives the action. Suggested negative prompt: worst quality, inconsistent motion, blurry, jittery, distorted. Validation used spatiotemporal guidance (STG, mode stg_v, block 29, scale 1.0), which can help motion stability.
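For scripted runs, the recommended values can be kept in one place. The field names in this sketch are hypothetical and do not correspond to a specific pipeline or ComfyUI node; only the values come from this card.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RecommendedSettings:
    """Values from this card; field names are illustrative, not an API."""
    lora_strength: float = 1.4
    num_inference_steps: int = 30
    guidance_scale: float = 4.0
    width: int = 768
    height: int = 448
    num_frames: int = 121
    fps: int = 24
    negative_prompt: str = (
        "worst quality, inconsistent motion, blurry, jittery, distorted"
    )
    # Spatiotemporal guidance as used during validation.
    stg_mode: str = "stg_v"
    stg_block: int = 29
    stg_scale: float = 1.0


SETTINGS = RecommendedSettings()

# Duration of a clip in the trained bucket, in seconds.
clip_seconds = SETTINGS.num_frames / SETTINGS.fps

if __name__ == "__main__":
    for name, value in asdict(SETTINGS).items():
        print(f"{name}: {value}")
    print(f"clip length: {clip_seconds:.2f} s")
```

At 121 frames and 24 fps the trained bucket corresponds to clips of roughly five seconds.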
References
- Code: GitHub Repository
- IC-LoRA docs: docs.ltx.video (IC-LoRA usage guide)
Tips & Troubleshooting
- Bigger panels carry over better: The more space an element takes up in the reference image, the more faithfully it carries over into the generated video. Give important characters/props larger, more prominent panels rather than small or crowded ones.
- Identity drift: If a character's face or costume drifts, make sure the reference sheet has a clean, front-facing close-up and full turnaround for that character, and that its panel isn't cluttered or text-laden.
- Element not appearing: The model only reproduces elements present on the sheet; add a dedicated panel for any prop/character you need to persist, and describe it in the Reference sheet: portion of the prompt.
- Reference too short: The reference static video must be ≥ 121 frames; shorter references break the reference-encoding bucket.
Dataset
The model was trained using a proprietary dataset of video clips paired with generated reference sheets.
Training
- Technique: IC-LoRA (rank 128, alpha 128, dropout 0.0) on the DiT transformer: attn1/attn2 q/k/v/out projections and the feed-forward layers.
- Hyperparameters: bf16 mixed precision, AdamW-8bit, gradient checkpointing, batch size 1, gradient accumulation 1, max grad norm 1.0, seed 42. Learning rate: 1.3e-4 (linear scheduler) for the first 6,000 steps, then a low constant 1.3e-5 for the continuation to 12,000.
- Strategy: video_to_video over reference latents, first_frame_conditioning_p 0.0, reference downscale factor 1.
- Steps: 12,000 (recommended checkpoint: step 12,000).
- Infrastructure: LTX-2 Community Trainer, 8× GPU DDP.
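A little arithmetic on the numbers above gives a feel for the training scale. This sketch rests on two assumptions that the card does not confirm: that the trainer applies the conventional LoRA scaling factor alpha / rank, and that "batch size 1" is per GPU under DDP.

```python
RANK, ALPHA = 128, 128
NUM_GPUS, PER_GPU_BATCH, GRAD_ACCUM = 8, 1, 1
TOTAL_STEPS = 12_000

# Conventional LoRA scaling: the low-rank update is multiplied by alpha / rank.
lora_scale = ALPHA / RANK

# Assumption: batch size 1 is per GPU, so each optimizer step sees 8 samples.
effective_batch = NUM_GPUS * PER_GPU_BATCH * GRAD_ACCUM

# Sample draws over the whole run (not necessarily unique clips).
samples_seen = effective_batch * TOTAL_STEPS

if __name__ == "__main__":
    print(f"LoRA scale: {lora_scale}")
    print(f"effective batch size: {effective_batch}")
    print(f"samples seen over {TOTAL_STEPS} steps: {samples_seen}")
```

Under these assumptions the LoRA update is applied at unit scale during training, and the 12,000-step run draws 96,000 samples in total.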
License
See the LTX-2-community-license for full terms.
Acknowledgments
- Base model by Lightricks
- Training infrastructure: LTX-2 Community Trainer