LTX-2.3 22B IC-LoRA Deblur (v2)

This is a Deblur IC-LoRA trained on top of LTX-2.3-22B, which restores sharpness to out-of-focus / defocused video by conditioning on the blurry clip and regenerating it in sharp focus while preserving the original subject, framing, and scene geometry.

It is based on the LTX-2.3 foundation model.

Model Files

ltx-2.3-22b-ic-lora-deblur-0.9.safetensors

The shipped checkpoint is step 1000. This run was planned for 1500 steps and stopped early at 1000; the quality sweet spot is in the ~800–1000 range, and step 1000 is the recommended default. Earlier checkpoints (steps 100–900) are available from the training run if you want to trade restoration strength for a gentler effect.

Model Details

  • Base Model: LTX-2.3-22B Video
  • Training Type: IC-LoRA (video-to-video, paired reference→target)
  • Control Type: Defocus/out-of-focus blur. The model conditions on a blurry reference video and outputs the sharp version
  • Reference Downscale Factor: 1 (the reference is processed at the same resolution as the output)
  • Pipeline details: No special pre/post color transform. Reference (blurry) and target (sharp) share identical content; only focus/sharpness differ.

Intended Use & Out-of-Scope

Intended use: Recovering sharpness from genuinely out-of-focus or softly defocused footage, both landscape and portrait, across mixed real-world content (people, wildlife, nature, cities, food, night). Designed to be driven by the production IC-LoRA video-to-video inference pipeline at native 1080p.

Out of scope: Motion-blur removal (the dataset contains no temporal/motion blur), heavy compression-artifact repair, denoising, or super-resolution of already-sharp footage. Extreme blur where the underlying content is essentially destroyed will be hallucinated rather than faithfully reconstructed.

Control Signal Requirements

  • Control signal type: Spatial defocus blur (the degradation the model inverts).
  • Expected input: A single video clip (the blurry footage), supplied as the IC-LoRA reference.
  • Preprocessing: None. Feed the blurry video directly; no extractor, mask, or normalization is required.
  • Alignment: The reference drives content directly. Best results when the reference is run through the standard IC-LoRA pipeline at the trained bucket (960×544, 121 frames @ 24 fps); the production pipeline handles resolution and length bucketing.
  • Mask support: Not supported; the effect is applied to the whole frame.

How It Works

The IC-LoRA conditions on the reference (blurry) video's latents together with a dual-panel "DEBLUR" prompt that describes the scene and asks for the same scene in sharp focus. Because the reference stays attached for the entire denoise (stage-1-only inference, see Usage), subject identity, framing, and background geometry are preserved while focus and sharpness are restored. The trained convention is a two-part caption:

Reference shows <scene description>, heavily out of focus with soft defocused blur and no fine detail. Edited shows the same scene in sharp focus with crisp detail and clean edges. DEBLUR <scene description> Subject identity, framing, and background geometry are identical to the reference; only focus and sharpness differ between reference and edited.
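The two-part caption can be assembled programmatically so that the scene description appears identically in both halves. The following is a minimal sketch, not part of the released tooling: the function name `build_deblur_prompt` is hypothetical, and the punctuation around the scene description is kept exactly as in the template above.

```python
# Hypothetical helper: fills the trained DEBLUR dual-panel template.
# The template text is copied from the model card; the function name and
# the whitespace handling are assumptions made for illustration.
TEMPLATE = (
    "Reference shows {scene}, heavily out of focus with soft defocused blur "
    "and no fine detail. Edited shows the same scene in sharp focus with crisp "
    "detail and clean edges. DEBLUR {scene} Subject identity, framing, and "
    "background geometry are identical to the reference; only focus and "
    "sharpness differ between reference and edited."
)


def build_deblur_prompt(scene_description: str) -> str:
    """Return the dual-panel caption for one scene description."""
    scene = scene_description.strip()
    if not scene:
        raise ValueError("scene description must not be empty")
    # The same description is inserted twice, once per panel.
    return TEMPLATE.format(scene=scene)


prompt = build_deblur_prompt("a red fox walking through snow")
print(prompt)
```

Keeping the description visual-only and identical in both positions matches how the training captions are described in this card.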

Usage

🔌 ComfyUI

  1. Copy ltx-2.3-22b-ic-lora-deblur-0.9.safetensors into models/loras.
  2. Load the LTX-2.3-22B base model and add the LoRA.
  3. Use an IC-LoRA (video-to-video) workflow from the LTX-2 ComfyUI repository, which wires the reference/guide nodes correctly. Connect the blurry clip as the reference/control video.
  4. Start at LoRA strength 1.0 and lower toward 0.8 if the output over-sharpens (haloing/ringing).

Production pipeline (recommended)

Evaluate and ship with the IC-LoRA video-to-video pipeline (python -m ltx_pipelines.ic_lora) using the identity-safe stage-1-only native hi-res recipe: it renders on a 2× canvas and decodes the half-canvas as the final 1920×1088, keeping the reference attached for the whole denoise so both identity and sharpness hold at full resolution. The dev-trained LoRA loads cleanly onto the ltx-2.3-22b-distilled-1.1 inference base. Avoid the trainer's basic scripts/inference.py for production output, and avoid the two-stage path for identity-critical clips.

Recommended Settings

  • LoRA strength / weight: 1.0 (sweep 0.5–1.0 if it over-modifies: oversaturation, baked-in artifacts, or haloing).
  • Resolution & frames: Trained at 960×544 (landscape and portrait), 121 frames @ 24 fps; generates well at native 1920×1088 via the stage-1-only pipeline.
  • Prompting: Follow the trained DEBLUR dual-panel convention above. The reference video does most of the work; the prompt mainly anchors the scene and the "sharp focus, crisp detail, clean edges" intent.
  • Suggested negative prompt: worst quality, blurry, out of focus, defocused, soft, hazy, smeared, low detail, jittery, distorted, oversharpened, haloing, ringing (used during training validation; note the production distilled pipeline does not take a negative prompt).

Tips & Troubleshooting

  • Over-sharpening / ringing or halos: lower --lora-strength toward 0.8.
  • Effect looks weak at 1080p: lower the native generation resolution (e.g. --width 1536 --height 896) closer to the training bucket.
  • Identity drift at high res: use the stage-1-only default rather than the two-stage path; stage 2 has no reference anchor and drifts on identity-critical content.
  • Motion blur not removed: expected. The model was trained only on spatial defocus, not temporal/motion blur.

Dataset

The model was trained on a proprietary dataset of 500 (blurry → sharp) video pairs built specifically for in-context deblur training (details below).

Dataset construction (v2)

Motivation. The v1 deblur dataset applied a single degradation recipe to every clip (boxblur + a light gblur). That cheap disc-defocus look was too synthetic: the LoRA learned to invert that specific filter rather than real optical blur, and it generalized poorly to genuine out-of-focus footage. v2 spans three blur families at varied strengths so the model sees the full "blurry → sharp" distribution it will be asked to invert.

Pairs are built for IC-LoRA training as:

  • target (videos/): the sharp original clip
  • reference (references/): the same clip degraded with one blur style + strength

Source footage. 5-second clips at native resolution: a deliberate mix of 4K and 1080p, landscape and portrait (kept native; the trainer's resolution bucketing handles downscaling). 395 clips reused from an existing stock pool plus 150 new Pexels clips across 8 themes (city, nature, ocean, people, food, wildlife, portraits, night), deduplicated, trimmed to exactly 5 s (libx264 -crf 18, audio stripped, yuv420p). Combined into a 545-clip pool; the build draws 500.

Composition (500 clips).

| Style | Count | Degradation |
| --- | --- | --- |
| box | 150 | boxblur=lr=L,gblur=sigma=1: flat disc defocus (the v1 look, retained for coverage) |
| gauss | 150 | gblur=sigma=S: plain gaussian blur |
| disk | 200 | Physically realistic lens defocus (largest share, highest fidelity) |
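For the two ffmpeg-based styles, the filter strings in the table can be turned into a full encode command. The sketch below only builds the argument list and does not invoke ffmpeg. The helper names and file paths are hypothetical; the filter strings come from the table, and the encoder flags are the ones listed under "Reproducibility & parity".

```python
# Hypothetical helpers: build the ffmpeg arguments for a box or gauss reference.
# Only the filter strings and encoder flags are taken from this model card.
def blur_filter(style: str, strength) -> str:
    """Return the ffmpeg -vf filter string for one ffmpeg-based blur style."""
    if style == "box":
        # L in the table: boxblur luma radius, followed by a light gaussian.
        return f"boxblur=lr={strength},gblur=sigma=1"
    if style == "gauss":
        # S in the table: gaussian sigma.
        return f"gblur=sigma={strength}"
    # The disk style is not an ffmpeg filter; it is convolved per frame instead.
    raise ValueError(f"no ffmpeg filter for style {style!r}")


def reference_encode_cmd(src: str, dst: str, style: str, strength) -> list:
    """Return the argument list that degrades src into the reference clip dst."""
    return [
        "ffmpeg", "-i", src,
        "-vf", blur_filter(style, strength),
        "-an",                      # strip audio
        "-c:v", "libx264", "-crf", "18", "-preset", "slow",
        "-pix_fmt", "yuv420p",
        dst,
    ]


print(" ".join(reference_encode_cmd("videos/clip.mp4", "references/clip.mp4", "box", 12)))
```

The list can be passed to `subprocess.run` as is, which avoids shell quoting issues with the comma inside the filter string.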

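Within each style, clips are split as evenly as possible across four strength tiers (light / medium / heavy / extreme): 50 per tier for disk, and 37 or 38 per tier for box and gauss, since 150 is not divisible by four.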

Resolution-scaled strength. A fixed pixel radius blurs a 4K frame far less (perceptually) than a 1080p one. Every strength is anchored at a 1080p long edge (1920 px) and scaled per clip by long_edge / 1920, so a light 4K clip gets ~2× the pixel radius of a light 1080p clip and the two look perceptually equivalent.

| Tier | box (boxblur lr) | gauss (sigma) | disk radius (px) |
| --- | --- | --- | --- |
| light | 6 | 3 | 8 |
| medium | 12 | 6 | 14 |
| heavy | 20 | 10 | 20 |
| extreme | 30 | 16 | 28 |
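The scaling rule and the tier table above combine into a small lookup. This is a sketch under stated assumptions: the names `BASE_STRENGTH` and `scaled_strength` are hypothetical, and the result is returned unrounded because the card does not say how fractional values are rounded.

```python
# Tier values anchored at a 1080p long edge (1920 px), copied from the table.
BASE_STRENGTH = {
    "box":   {"light": 6, "medium": 12, "heavy": 20, "extreme": 30},
    "gauss": {"light": 3, "medium": 6,  "heavy": 10, "extreme": 16},
    "disk":  {"light": 8, "medium": 14, "heavy": 20, "extreme": 28},
}
ANCHOR_LONG_EDGE = 1920


def scaled_strength(style: str, tier: str, width: int, height: int) -> float:
    """Scale the 1080p-anchored strength by long_edge / 1920 for this clip."""
    long_edge = max(width, height)  # works for landscape and portrait
    return BASE_STRENGTH[style][tier] * long_edge / ANCHOR_LONG_EDGE


print(scaled_strength("disk", "light", 3840, 2160))   # 4K landscape
print(scaled_strength("disk", "light", 1080, 1920))   # 1080p portrait
```

Using the long edge rather than the width means a portrait clip receives the same strength as its landscape counterpart.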

Realistic disk defocus. Rather than an ffmpeg filter, each frame is convolved with a uniform circular kernel (the optical circle of confusion): cropdetect excludes letterbox/pillarbox bars so they don't smear in; convolution is done in linear light (sRGB → linear → convolve → sRGB) to avoid muddy gamma-space blur; BORDER_REPLICATE avoids dark edge halos; and it is purely spatial (no temporal blur, so no motion ghost-trails). Frames are streamed raw (bgr24) out of ffmpeg, processed with NumPy/OpenCV, and piped back into libx264, preserving source resolution, frame rate, and frame count.
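The per-frame step can be sketched in NumPy as follows. This is an illustration, not the dataset builder's code: it assumes the standard piecewise sRGB transfer function, replaces the OpenCV convolution with an explicit NumPy loop, uses edge padding as the equivalent of BORDER_REPLICATE, and omits cropdetect and the ffmpeg streaming. All function names are hypothetical.

```python
import numpy as np


def srgb_to_linear(s):
    """Standard piecewise sRGB decode; input in [0, 1]."""
    s = np.asarray(s, dtype=np.float64)
    return np.where(s <= 0.04045, s / 12.92, ((s + 0.055) / 1.055) ** 2.4)


def linear_to_srgb(lin):
    """Standard piecewise sRGB encode; input is clipped to [0, 1]."""
    lin = np.clip(np.asarray(lin, dtype=np.float64), 0.0, 1.0)
    return np.where(lin <= 0.0031308, lin * 12.92,
                    1.055 * lin ** (1 / 2.4) - 0.055)


def disk_kernel(radius):
    """Uniform circular kernel (circle of confusion), normalized to sum to 1."""
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = (x * x + y * y <= radius * radius).astype(np.float64)
    return k / k.sum()


def disk_defocus(frame_u8, radius):
    """Blur one HxWx3 uint8 frame with a disk kernel in linear light."""
    k = disk_kernel(radius)
    r = k.shape[0] // 2
    lin = srgb_to_linear(frame_u8.astype(np.float64) / 255.0)
    # Edge padding repeats border pixels, so no dark halo forms at the edges.
    padded = np.pad(lin, ((r, r), (r, r), (0, 0)), mode="edge")
    h, w = lin.shape[:2]
    out = np.zeros_like(lin)
    # Accumulate one shifted copy of the frame per non-zero kernel tap.
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            if k[dy, dx]:
                out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.round(linear_to_srgb(out) * 255.0).astype(np.uint8)


# Tiny demo frame: black left half, white right half.
step = np.zeros((8, 20, 3), dtype=np.uint8)
step[:, 10:] = 255
blurred = disk_defocus(step, 3)
print(blurred[4, 6:14, 0])
```

Because each channel is filtered independently, the same code applies to bgr24 frames without reordering channels. The explicit loop is slow for large radii; a library convolution would be the practical choice at full resolution.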

Reproducibility & parity. A seeded RNG (seed 42) shuffles the pool, partitions it into per-style counts (150/150/200), and assigns the four strength tiers round-robin within each style, recording every assignment to recipes.json (resumable). All references encode with libx264 -crf 18 -preset slow -pix_fmt yuv420p, audio stripped, preserving source resolution/frame-count/pixel-format. A verification pass confirmed 500/500 pairs valid, 0 mismatches. Captions are generated on the training machine (visual-only) and merged into dataset.json before training. After preprocessing, 1 clip was filtered for insufficient frames, leaving 499 valid training pairs.
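The seeded assignment described above can be sketched with the standard library. The exact RNG calls, the sort before shuffling, and the clip names are assumptions made for illustration; only the seed, the per-style counts, the round-robin tiering, and the recipes.json output come from this card.

```python
import json
import random

STYLE_COUNTS = {"box": 150, "gauss": 150, "disk": 200}
TIERS = ["light", "medium", "heavy", "extreme"]


def assign_recipes(pool, seed=42):
    """Shuffle the pool with a fixed seed, then assign a style and tier per clip."""
    rng = random.Random(seed)
    clips = sorted(pool)  # fixed starting order, so the seed alone decides the result
    rng.shuffle(clips)
    recipes = {}
    start = 0
    for style, count in STYLE_COUNTS.items():
        # Tiers cycle round-robin within each style.
        for i, clip in enumerate(clips[start:start + count]):
            recipes[clip] = {"style": style, "tier": TIERS[i % len(TIERS)]}
        start += count
    return recipes


# Hypothetical 545-clip pool; the build draws the first 500 after shuffling.
pool = [f"clip_{i:04d}.mp4" for i in range(545)]
recipes = assign_recipes(pool)
recipes_json = json.dumps(recipes, indent=2)  # content to write to recipes.json
print(len(recipes))
```

With 150 clips in a style, round-robin yields 38 light, 38 medium, 37 heavy, and 37 extreme; with 200 it yields 50 of each.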

Training

  • Technique: IC-LoRA (rank 128, alpha 128, dropout 0.05) on the DiT transformer, targeting attn1 (self-attention to_q/k/v/out.0) + FFN (ff.net.0.proj, ff.net.2); cross-attention (attn2) intentionally not targeted.
  • Hyperparameters: bf16 mixed precision, AdamW, learning rate 1.5e-4, cosine schedule, max_grad_norm 1.0, gradient checkpointing on, first_frame_conditioning_p 0.15, shifted-logit-normal flow-matching timestep sampling.
  • Resolution / data: preprocessed at 960×544 (landscape + portrait, 121/97/89-frame buckets), 499 valid (blurry→sharp) pairs.
  • Steps: planned 1500, stopped at step 1000 (recommended checkpoint); checkpoints saved every 100 steps.
  • Infrastructure: LTX-2 Community Trainer, DDP across 8× NVIDIA H100.

License

See the LTX-2-community-license for full terms.

Acknowledgments

  • Base model by Lightricks
  • Training infrastructure: LTX-2 Community Trainer