LTX-2.3 22B IC-LoRA Deblur (v2)
This is a Deblur IC-LoRA trained on top of LTX-2.3-22B, which restores sharpness to out-of-focus / defocused video by conditioning on the blurry clip and regenerating it in sharp focus while preserving the original subject, framing, and scene geometry.
It is based on the LTX-2.3 foundation model.
Model Files
ltx-2.3-22b-ic-lora-deblur-0.9.safetensors
The shipped checkpoint is step 1000. This run was planned for 1500 steps and stopped early at 1000; the quality sweet spot is in the ~800–1000 range, and step 1000 is the recommended default. Earlier checkpoints (steps 100–900) are available from the training run if you want to trade restoration strength for a gentler effect.
Model Details
- Base Model: LTX-2.3-22B Video
- Training Type: IC-LoRA (video-to-video, paired reference→target)
- Control Type: Defocus/out-of-focus blur (the model conditions on a blurry reference video and outputs the sharp version)
- Reference Downscale Factor: 1 (the reference is processed at the same resolution as the output)
- Pipeline details: No special pre/post color transform. Reference (blurry) and target (sharp) share identical content; only focus/sharpness differ.
Intended Use & Out-of-Scope
Intended use: Recovering sharpness from genuinely out-of-focus or softly defocused footage: landscape and portrait, mixed real-world content (people, wildlife, nature, cities, food, night). Designed to be driven by the production IC-LoRA video-to-video inference pipeline at native 1080p.
Out of scope: Motion-blur removal (the dataset contains no temporal/motion blur), heavy compression-artifact repair, denoising, or super-resolution of already-sharp footage. Extreme blur where the underlying content is essentially destroyed will be hallucinated rather than faithfully reconstructed.
Control Signal Requirements
- Control signal type: Spatial defocus blur (the degradation the model inverts).
- Expected input: A single video clip (the blurry footage) supplied as the IC-LoRA reference.
- Preprocessing: None. Feed the blurry video directly; no extractor, mask, or normalization is required.
- Alignment: The reference drives content directly. Best results when the reference is run through the standard IC-LoRA pipeline at the trained bucket (960×544, 121 frames @ 24 fps); the production pipeline handles resolution and length bucketing.
- Mask support: Not supported; the effect is applied to the whole frame.
How It Works
The IC-LoRA conditions on the reference (blurry) video's latents together with a dual-panel "DEBLUR" prompt that describes the scene and asks for the same scene in sharp focus. Because the reference stays attached for the entire denoise (stage-1-only inference, see Usage), subject identity, framing, and background geometry are preserved while focus and sharpness are restored. The trained convention is a two-part caption:
Reference shows <scene description>, heavily out of focus with soft defocused blur and no fine detail. Edited shows the same scene in sharp focus with crisp detail and clean edges. DEBLUR <scene description> Subject identity, framing, and background geometry are identical to the reference; only focus and sharpness differ between reference and edited.
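The two-part caption above can be assembled mechanically from a single scene description. The following is a minimal sketch, not part of the released tooling; the function name `build_deblur_prompt` is hypothetical, and the template text is copied from the convention shown above.

```python
def build_deblur_prompt(scene: str) -> str:
    """Fill the trained dual-panel DEBLUR template with one scene description.

    The scene description appears twice: once in the "Reference shows"
    sentence and once after the DEBLUR trigger word.
    """
    scene = scene.strip()
    return (
        f"Reference shows {scene}, heavily out of focus with soft defocused "
        "blur and no fine detail. Edited shows the same scene in sharp focus "
        "with crisp detail and clean edges. "
        f"DEBLUR {scene} "
        "Subject identity, framing, and background geometry are identical to "
        "the reference; only focus and sharpness differ between reference "
        "and edited."
    )


if __name__ == "__main__":
    print(build_deblur_prompt("a red fox standing in tall grass at dusk"))
```

Keeping the scene description identical in both slots mirrors the template; whether the model tolerates different wording in the two slots is not stated here.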
Usage
ComfyUI
- Copy ltx-2.3-22b-ic-lora-deblur-0.9.safetensors into models/loras.
- Load the LTX-2.3-22B base model and add the LoRA.
- Use an IC-LoRA (video-to-video) workflow from the LTX-2 ComfyUI repository, which wires the reference/guide nodes correctly. Connect the blurry clip as the reference/control video.
- Start at LoRA strength 1.0 and lower toward 0.8 if the output over-sharpens (haloing/ringing).
Production pipeline (recommended)
Evaluate and ship with the IC-LoRA video-to-video pipeline (python -m ltx_pipelines.ic_lora) using the identity-safe stage-1-only native hi-res recipe: it renders on a 2× canvas and decodes the half-canvas as the final 1920×1088, keeping the reference attached for the whole denoise so both identity and sharpness hold at full resolution. The dev-trained LoRA loads cleanly onto the ltx-2.3-22b-distilled-1.1 inference base. Avoid the trainer's basic scripts/inference.py for production output, and avoid the two-stage path for identity-critical clips.
Recommended Settings
- LoRA strength / weight: 1.0 (sweep 0.5–1.0 if it over-modifies: oversaturation, baked-in artifacts, or haloing).
- Resolution & frames: Trained at 960×544 (landscape and portrait), 121 frames @ 24 fps; generates well at native 1920×1088 via the stage-1-only pipeline.
- Prompting: Follow the trained DEBLUR dual-panel convention above. The reference video does most of the work; the prompt mainly anchors the scene and the "sharp focus, crisp detail, clean edges" intent.
- Suggested negative prompt: worst quality, blurry, out of focus, defocused, soft, hazy, smeared, low detail, jittery, distorted, oversharpened, haloing, ringing (used during training validation; note the production distilled pipeline does not take a negative prompt).
References
- Code: GitHub Repository
- ComfyUI: ComfyUI-LTXVideo
- IC-LoRA docs: IC-LoRA usage guide
Tips & Troubleshooting
- Over-sharpening / ringing or halos: lower --lora-strength toward 0.8.
- Effect looks weak at 1080p: lower the native generation resolution (e.g. --width 1536 --height 896) closer to the training bucket.
- Identity drift at high res: use the stage-1-only default rather than the two-stage path; stage 2 has no reference anchor and drifts on identity-critical content.
- Motion blur not removed: this is expected; the model was trained only on spatial defocus, not temporal/motion blur.
Dataset
The model was trained on a proprietary dataset of 500 (blurry → sharp) video pairs built specifically for in-context deblur training, of which 499 remained valid after preprocessing (details below).
Dataset construction (v2)
Motivation. The v1 deblur dataset applied a single degradation recipe to every clip (boxblur + a light gblur). That cheap disc-defocus look was too synthetic: the LoRA learned to invert that specific filter rather than real optical blur and generalized poorly to genuine out-of-focus footage. v2 spans three blur families at varied strengths so the model sees the full "blurry → sharp" distribution it will be asked to invert.
Pairs are built for IC-LoRA training as:
- target (videos/): the sharp original clip
- reference (references/): the same clip degraded with one blur style + strength
Source footage. 5-second clips at native resolution: a deliberate mix of 4K and 1080p, landscape and portrait (kept native; the trainer's resolution bucketing handles downscaling). 395 clips reused from an existing stock pool plus 150 new Pexels clips across 8 themes (city, nature, ocean, people, food, wildlife, portraits, night), deduplicated, trimmed to exactly 5 s (libx264 -crf 18, audio stripped, yuv420p). Combined into a 545-clip pool; the build draws 500.
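The trim step can be sketched as a small helper that builds the ffmpeg argument list from the encoder settings quoted above (libx264, -crf 18, yuv420p, audio stripped). This is an illustration under assumptions: the function name `build_trim_cmd` is hypothetical, and the original build may have used additional options (for example a seek offset or frame-accurate cutting) that are not described here.

```python
from typing import List


def build_trim_cmd(src: str, dst: str, seconds: int = 5) -> List[str]:
    """Build an ffmpeg argv that trims a clip and re-encodes it.

    Only standard ffmpeg options are used:
      -t        limit output duration
      -c:v      video encoder
      -crf      constant rate factor (quality)
      -pix_fmt  output pixel format
      -an       drop the audio stream
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-t", str(seconds),
        "-c:v", "libx264",
        "-crf", "18",
        "-pix_fmt", "yuv420p",
        "-an",
        dst,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_trim_cmd("input.mp4", "trimmed.mp4")))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, provided ffmpeg is installed.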
Composition (500 clips).
| Style | Count | Degradation |
|---|---|---|
| box | 150 | boxblur=lr=L,gblur=sigma=1: flat disc defocus (the v1 look, retained for coverage) |
| gauss | 150 | gblur=sigma=S: plain gaussian blur |
| disk | 200 | Physically realistic lens defocus (largest share, highest fidelity) |
Within each style, clips are split evenly across four strength tiers (light / medium / heavy / extreme).
Resolution-scaled strength. A fixed pixel radius blurs a 4K frame far less (perceptually) than a 1080p one. Every strength is anchored at a 1080p long edge (1920 px) and scaled per clip by long_edge / 1920, so a light 4K clip gets ~2× the pixel radius of a light 1080p clip and the two look perceptually equivalent.
| Tier | box boxblur lr | gauss sigma | disk radius (px) |
|---|---|---|---|
| light | 6 | 3 | 8 |
| medium | 12 | 6 | 14 |
| heavy | 20 | 10 | 20 |
| extreme | 30 | 16 | 28 |
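The resolution scaling rule and the tier table can be combined into a small lookup. This is a sketch only: the names `BASE_STRENGTH` and `scaled_strength` are hypothetical, and how the original build rounded fractional values for filters that need integers is not stated, so the function returns the unrounded value.

```python
# Base strengths from the tier table, anchored at a 1920 px long edge.
BASE_STRENGTH = {
    "box":   {"light": 6, "medium": 12, "heavy": 20, "extreme": 30},   # boxblur lr
    "gauss": {"light": 3, "medium": 6,  "heavy": 10, "extreme": 16},   # gblur sigma
    "disk":  {"light": 8, "medium": 14, "heavy": 20, "extreme": 28},   # radius in px
}
REFERENCE_LONG_EDGE = 1920


def scaled_strength(style: str, tier: str, width: int, height: int) -> float:
    """Scale the base strength by long_edge / 1920 for the given clip size."""
    long_edge = max(width, height)  # works for landscape and portrait
    return BASE_STRENGTH[style][tier] * long_edge / REFERENCE_LONG_EDGE


if __name__ == "__main__":
    print(scaled_strength("disk", "light", 1920, 1080))  # 1080p landscape
    print(scaled_strength("disk", "light", 3840, 2160))  # 4K landscape
    print(scaled_strength("box", "extreme", 2160, 3840)) # 4K portrait
```

Using the long edge rather than the width keeps portrait and landscape clips of the same resolution class at the same strength.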
Realistic disk defocus. Rather than an ffmpeg filter, each frame is convolved with a uniform circular kernel (the optical circle of confusion): cropdetect excludes letterbox/pillarbox bars so they don't smear in; convolution is done in linear light (sRGB → linear → convolve → sRGB) to avoid muddy gamma-space blur; BORDER_REPLICATE avoids dark edge halos; and it is purely spatial (no temporal blur, so no motion ghost-trails). Frames are streamed raw (bgr24) out of ffmpeg, processed with NumPy/OpenCV, and piped back into libx264, preserving source resolution, frame rate, and frame count.
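The per-frame operation described above can be illustrated with a NumPy-only sketch. It is not the original implementation: the source pipeline uses OpenCV, whereas this version substitutes `np.pad(..., mode="edge")` for BORDER_REPLICATE and a plain shifted-sum loop for the convolution, and it omits the cropdetect step and the ffmpeg streaming. All function names are hypothetical. The transfer functions are the standard piecewise sRGB curves.

```python
import numpy as np


def disk_kernel(radius: int) -> np.ndarray:
    """Uniform circular kernel (circle of confusion), normalized to sum to 1."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    kernel = ((x * x + y * y) <= radius * radius).astype(np.float64)
    return kernel / kernel.sum()


def srgb_to_linear(s: np.ndarray) -> np.ndarray:
    """Standard sRGB decoding, input in [0, 1]."""
    return np.where(s <= 0.04045, s / 12.92, ((s + 0.055) / 1.055) ** 2.4)


def linear_to_srgb(l: np.ndarray) -> np.ndarray:
    """Standard sRGB encoding, input in [0, 1]."""
    return np.where(l <= 0.0031308, l * 12.92, 1.055 * np.power(l, 1 / 2.4) - 0.055)


def disk_defocus(frame_u8: np.ndarray, radius: int) -> np.ndarray:
    """Blur one H x W x 3 uint8 frame with a disk kernel in linear light."""
    kernel = disk_kernel(radius)
    linear = srgb_to_linear(frame_u8.astype(np.float64) / 255.0)
    h, w = linear.shape[:2]
    # Edge replication avoids the dark halo that zero padding would create.
    padded = np.pad(linear, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.zeros_like(linear)
    # Accumulate shifted copies; the kernel is symmetric, so correlation
    # and convolution give the same result.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            weight = kernel[dy, dx]
            if weight > 0:
                out += weight * padded[dy:dy + h, dx:dx + w]
    srgb = linear_to_srgb(np.clip(out, 0.0, 1.0))
    return np.rint(srgb * 255.0).astype(np.uint8)


if __name__ == "__main__":
    frame = np.zeros((32, 32, 3), dtype=np.uint8)
    frame[16, 16] = 255  # single bright point spreads into a disk
    print(disk_defocus(frame, radius=4)[16, 12:21, 0])
```

The shifted-sum loop is slow for large radii; it is used here only to keep the sketch dependency-light and easy to check by hand.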
Reproducibility & parity. A seeded RNG (seed 42) shuffles the pool, partitions it into per-style counts (150/150/200), and assigns the four strength tiers round-robin within each style, recording every assignment to recipes.json (resumable). All references encode with libx264 -crf 18 -preset slow -pix_fmt yuv420p, audio stripped, preserving source resolution/frame-count/pixel-format. A verification pass confirmed 500/500 pairs valid, 0 mismatches. Captions are generated on the training machine (visual-only) and merged into dataset.json before training. After preprocessing, 1 clip was filtered for insufficient frames, leaving 499 valid training pairs.
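The assignment procedure described above (seeded shuffle, partition into per-style counts, round-robin tiers) can be sketched with the standard library. This is an illustration only: the function name `assign_recipes` and the record fields are hypothetical, and because the original RNG and shuffle implementation are not specified, this sketch will not reproduce the actual recipes.json.

```python
import random
from collections import Counter

STYLE_COUNTS = {"box": 150, "gauss": 150, "disk": 200}
TIERS = ["light", "medium", "heavy", "extreme"]


def assign_recipes(clip_ids, seed=42):
    """Shuffle the pool with a fixed seed, then assign style and tier per clip."""
    rng = random.Random(seed)
    pool = list(clip_ids)
    rng.shuffle(pool)

    recipes = []
    start = 0
    for style, count in STYLE_COUNTS.items():
        # Take the next `count` clips for this style and cycle through the tiers.
        for i, clip in enumerate(pool[start:start + count]):
            recipes.append({"clip": clip, "style": style, "tier": TIERS[i % len(TIERS)]})
        start += count
    return recipes


if __name__ == "__main__":
    pool = [f"clip_{i:03d}" for i in range(545)]  # placeholder clip names
    recipes = assign_recipes(pool)
    print(Counter(r["style"] for r in recipes))
    print(Counter((r["style"], r["tier"]) for r in recipes))
```

Because 150 is not divisible by four, round-robin assignment gives the box and gauss styles 38/38/37/37 clips per tier rather than an exact quarter each, while disk gets 50 per tier.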
Training
- Technique: IC-LoRA (rank 128, alpha 128, dropout 0.05) on the DiT transformer, targeting attn1 (self-attention to_q/k/v/out.0) + FFN (ff.net.0.proj, ff.net.2); cross-attention (attn2) intentionally not targeted.
- Hyperparameters: bf16 mixed precision, AdamW, learning rate 1.5e-4, cosine schedule, max_grad_norm 1.0, gradient checkpointing on, first_frame_conditioning_p 0.15, shifted-logit-normal flow-matching timestep sampling.
- Resolution / data: preprocessed at 960×544 (landscape + portrait, 121/97/89-frame buckets), 499 valid (blurry→sharp) pairs.
- Steps: planned 1500, stopped at step 1000 (recommended checkpoint); checkpoints saved every 100 steps.
- Infrastructure: LTX-2 Community Trainer, DDP across 8× NVIDIA H100.
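The targeting rule in the Technique bullet can be expressed as a simple name filter. This is a sketch under assumptions: the full module paths in the example (such as transformer_blocks.0.attn1.to_q) are hypothetical, the shorthand to_q/k/v/out.0 is read here as to_q, to_k, to_v, and to_out.0, and the trainer's actual matching logic is not described in this card.

```python
# Suffixes for the layers the card lists as LoRA targets.
TARGET_SUFFIXES = (
    "attn1.to_q",
    "attn1.to_k",
    "attn1.to_v",
    "attn1.to_out.0",
    "ff.net.0.proj",
    "ff.net.2",
)


def is_lora_target(module_name: str) -> bool:
    """Return True if the module name ends with one of the targeted suffixes.

    Matching on a dot boundary prevents attn2.* (cross-attention) or any
    similarly named module from being picked up by accident.
    """
    return any(
        module_name == suffix or module_name.endswith("." + suffix)
        for suffix in TARGET_SUFFIXES
    )


if __name__ == "__main__":
    for name in (
        "transformer_blocks.0.attn1.to_q",   # self-attention: targeted
        "transformer_blocks.0.attn2.to_q",   # cross-attention: skipped
        "transformer_blocks.0.ff.net.2",     # feed-forward: targeted
    ):
        print(name, is_lora_target(name))
```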
License
See the LTX-2-community-license for full terms.
Acknowledgments
- Base model by Lightricks
- Training infrastructure: LTX-2 Community Trainer