LTX-2.3 22B IC-LoRA Colorization

This is a colorization IC-LoRA trained on top of LTX-2.3-22B. It restores natural color to grayscale, monochrome, or desaturated video while leaving subject identity, framing, and geometry untouched: only the color information changes.

It is based on the LTX-2.3 foundation model.

Model Files

ltx-2.3-22b-ic-lora-colorization-0.9.safetensors — the single released checkpoint (used for the published inference samples).

Model Details

Base Model: LTX-2.3-22B Video
Training Type: IC-LoRA (video-to-video)
Control Type: Video-to-video — a grayscale/monochrome input (reference) video drives a color-restored output video.
Reference Downscale Factor: 1 (the reference is encoded at 1× the output resolution).
Pipeline details: No special pre/post color transform — the reference video is VAE-encoded as the control signal and the model predicts the colorized result.

Intended Use & Out-of-Scope

Intended use: Colorizing black-and-white, monochrome, or heavily desaturated footage — restoring plausible, natural color to a clip while preserving the original subject, composition, motion, and background geometry.

Out of scope: Any task other than colorization (it is not a deblur, denoise, decompression, or upscaling model); relighting or scene re-composition; and content far outside the training distribution. Generating far above the 960×544 training bucket can weaken the colorization effect.

Control Signal Requirements

Control signal type: Grayscale / monochrome / desaturated source video.
Expected input: A reference video (.mp4 / .mov / .mkv / .webm / .avi).
Preprocessing: None required — the reference is VAE-encoded directly. The reference is used at 1× the output resolution (downscale factor 1).
Alignment: The output matches the reference frame count, FPS, resolution, and aspect ratio. Best results at the 960×544×121 @ 24fps training bucket (both landscape 960×544 and portrait 544×960 were seen in training).
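
The input requirements above can be checked before a run. The following is a minimal sketch, not part of the released pipeline: the function names are hypothetical, and choosing the bucket by comparing width and height is an assumption about how you might pick between the landscape and portrait buckets.

```python
from pathlib import Path

# Container formats listed under "Expected input".
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm", ".avi"}

# Landscape training bucket as (width, height); portrait is the transpose.
TRAINING_BUCKET = (960, 544)


def is_supported_reference(path: str) -> bool:
    """Return True if the file extension is one of the listed video formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def training_bucket_for(width: int, height: int) -> tuple[int, int]:
    """Pick the landscape or portrait training bucket matching the clip orientation.

    Assumption: square clips are treated as landscape.
    """
    long_side, short_side = TRAINING_BUCKET
    return (long_side, short_side) if width >= height else (short_side, long_side)
```

For example, a 1920×1080 source maps to the 960×544 bucket and a 1080×1920 source maps to 544×960.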

How It Works

The model is conditioned on both the reference video latents and a text prompt that describes the grayscale source and the desired color result. The prompt convention learned in training is:

Reference shows {grayscale scene description}. Edited shows the same scene with natural colors restored. COLORIZE {vivid natural-color description of the same scene}. Subject identity, framing, and background geometry are identical to the reference; only color information differs between reference and edited.

Representative prompt from a real run:

Reference shows a small wild rabbit sitting among rough textured boulders with a fallen log and dry grass behind it, rendered in high-contrast monochrome with soft natural daylight emphasizing the fine fur and the coarse stone surfaces. Edited shows the same scene with natural colors restored. COLORIZE a young brown cottontail rabbit with warm tan and grey-brown fur, a pale cream underside and soft pink inner ears, perched on weathered grey granite boulders flecked with green and ochre lichen. Behind it a bleached driftwood log and clumps of golden dry grass catch the warm late-afternoon sun, while muted green vegetation softens the blurred background. The light is gentle and warm, giving the rocks subtle earthy browns and the whole scene a calm woodland tone. Subject identity, framing, and background geometry are identical to the reference; only color information differs between reference and edited.
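
The prompt convention can be assembled programmatically so the fixed sentences stay word-for-word identical across runs. This is a small sketch under the assumption that only the two descriptions vary; `build_colorize_prompt` and its parameter names are illustrative, not part of any LTX library.

```python
# Fixed wording taken from the prompt convention; only the two
# descriptions in braces change from clip to clip.
PROMPT_TEMPLATE = (
    "Reference shows {grayscale}. "
    "Edited shows the same scene with natural colors restored. "
    "COLORIZE {color}. "
    "Subject identity, framing, and background geometry are identical to the reference; "
    "only color information differs between reference and edited."
)


def build_colorize_prompt(grayscale_description: str, color_description: str) -> str:
    """Fill the template with a grayscale scene description and a color description."""
    # Strip a trailing period so the template's own punctuation is not doubled.
    grayscale = grayscale_description.strip().rstrip(".")
    color = color_description.strip().rstrip(".")
    if not grayscale or not color:
        raise ValueError("both descriptions must be non-empty")
    return PROMPT_TEMPLATE.format(grayscale=grayscale, color=color)


prompt = build_colorize_prompt(
    "a rabbit sitting among boulders, rendered in high-contrast monochrome",
    "a brown cottontail rabbit with warm tan fur on weathered grey granite boulders",
)
```

The color description may span several sentences, as in the representative prompt; only its final period is trimmed.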

Usage

🔌 ComfyUI

Copy the LoRA weights into models/loras.
Load the LTX-2.3-22B base model and add ltx-2.3-22b-ic-lora-colorization-0.9.safetensors as the LoRA.
Start at strength 1.0 and adjust to taste.
Use an IC-LoRA (video-to-video) workflow from the LTX-2 ComfyUI repository, which already wires the reference-video control nodes. Connect your grayscale clip as the reference video and write the prompt using the COLORIZE convention above. Because the reference downscale factor is 1, encode the reference at the output resolution; no reference downscaling step is needed.
Start at or near the 960×544 training bucket; generating far above it can weaken colorization on high-frequency detail.

Recommended Settings

LoRA strength / weight: 1.0 (the published samples used strength 1.0).
Resolution & frames: Trained and validated at 960×544×121 @ 24fps (frames satisfy (frames-1) % 8 == 0). Both landscape (960×544) and portrait (544×960) were in training. Start near this bucket for the strongest, most consistent effect.
Prompting: Follow the "Reference shows … COLORIZE … only color information differs" structure documented in How It Works. Describe the same scene in both halves and change only the color description. Keep the identity/framing/geometry sentence intact so the model alters only color.
Production inference recipe (what we used): Run the distilled ltx_pipelines.ic_lora pipeline with the identity-safe, stage-1-only native hi-res recipe: render on a 2× canvas with --skip-stage-2 --tile-reference-encode, LoRA strength 1.0, seed 42, 121 frames @ 24fps. Stage 2 is skipped so that the reference stays anchored for the whole denoise. The distilled checkpoint uses fixed sigmas, so there is no CFG / guidance scale and no negative prompt. A dev-trained LoRA loads cleanly on the distilled checkpoint.
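
The frame-count rule stated above, (frames-1) % 8 == 0, is easy to check before submitting a clip. The helpers below are an illustrative sketch with hypothetical names; snapping down (rather than up or padding) is an assumption about how you might handle a clip that does not fit.

```python
def is_valid_frame_count(frames: int) -> bool:
    """True when the count satisfies (frames - 1) % 8 == 0, e.g. 1, 9, ..., 121."""
    return frames >= 1 and (frames - 1) % 8 == 0


def nearest_valid_frame_count(frames: int) -> int:
    """Largest valid frame count that does not exceed `frames`."""
    if frames < 1:
        raise ValueError("frames must be at least 1")
    return ((frames - 1) // 8) * 8 + 1


def clip_duration_seconds(frames: int, fps: float = 24.0) -> float:
    """Duration of a clip with the given frame count at the given frame rate."""
    return frames / fps
```

With these, 121 frames is valid and lasts about 5.04 seconds at 24fps, while a 120-frame clip would be trimmed to 113 frames.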

Examples

References

Code: GitHub Repository
Inference Pipeline: ltx_pipelines.ic_lora (LTX-2 distilled IC-LoRA pipeline)

Tips & Troubleshooting

Weak or partial colorization at very high resolution: the stage-1-only recipe generates at the full output resolution, which on a 2× canvas is well above the 960×544 training bucket. If color looks washed out or incomplete, lower the generation/output resolution toward the training bucket.
Color bleed or oversaturation: drop the LoRA strength slightly (e.g. 0.8–0.9) and make the COLORIZE description more specific about the intended hues.
Identity drift: keep the stage-1-only recipe (do not use the two-stage path for identity-critical clips) so the reference stays attached for the entire denoise.
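
When lowering resolution toward the training bucket, one option is to scale the clip so its pixel count does not exceed 960×544 while keeping the aspect ratio. The sketch below is illustrative only: `scale_toward_bucket` is a hypothetical helper, and rounding each side down to a multiple of 32 is an assumption (both 960 and 544 are multiples of 32), not a documented requirement of the pipeline.

```python
import math

# Pixel count of the 960x544 training bucket (same for portrait 544x960).
TRAINING_PIXELS = 960 * 544


def scale_toward_bucket(width: int, height: int, multiple: int = 32) -> tuple[int, int]:
    """Scale (width, height) down so the area is at most the training bucket area.

    Aspect ratio is preserved up to rounding. Clips already at or below the
    bucket area are not enlarged. `multiple` is an assumed alignment.
    """
    scale = min(1.0, math.sqrt(TRAINING_PIXELS / (width * height)))

    def snap(value: int) -> int:
        # Round down to the alignment, but never below one alignment unit.
        return max(multiple, int(value * scale) // multiple * multiple)

    return snap(width), snap(height)
```

For instance, a 1920×1088 target scales to 960×544, and a clip that is already 960×544 is returned unchanged.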

Dataset

The model was trained using a proprietary dataset.

Training

Technique: IC-LoRA (rank 128, alpha 128, dropout 0.05) on the DiT transformer; target modules attn1.to_q, attn1.to_k, attn1.to_v, attn1.to_out.0, ff.net.0.proj, ff.net.2.
Hyperparameters: bf16 mixed precision, AdamW optimizer, learning rate 1.5e-4, cosine scheduler, gradient checkpointing on, batch size 1.
Steps: 1500 total training steps configured. Released checkpoint: ltx-2.3-22b-ic-lora-colorization-0.9.safetensors.
Infrastructure: LTX-2 Community Trainer (single-node, 8 GPU).
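
For reference, the listed training settings can be collected in one place. The dataclass below is only a summary sketch, not the trainer's actual configuration schema; its field names are hypothetical. The `scale` property assumes the common LoRA convention in which the low-rank update is multiplied by alpha / rank, which gives 1.0 for rank 128 and alpha 128.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraTrainingSummary:
    """Values copied from the Training section; field names are illustrative."""

    rank: int = 128
    alpha: int = 128
    dropout: float = 0.05
    learning_rate: float = 1.5e-4
    total_steps: int = 1500
    batch_size: int = 1
    target_modules: tuple[str, ...] = (
        "attn1.to_q",
        "attn1.to_k",
        "attn1.to_v",
        "attn1.to_out.0",
        "ff.net.0.proj",
        "ff.net.2",
    )

    @property
    def scale(self) -> float:
        # Assumed convention: the LoRA update is scaled by alpha / rank.
        return self.alpha / self.rank


summary = LoraTrainingSummary()
```

Under that assumed convention, setting alpha equal to rank leaves the learned update unscaled, so the inference-time LoRA strength is the only multiplier applied.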