LTX-Video 2.3 IC-LoRA: Dual-Character (English mirror)

An English-mirrored, field-tested In-Context LoRA for Lightricks/LTX-2.3 (22B distilled), tuned for two-character dialogue scenes and multi-shot cinematic video generation.


Example renders

The example episode is an 8-shot Chinese palace drama (《玉佩定情》 "The Jade Pendant Pledge" + 《暗夜阴谋》 "Dark Night Conspiracy") with three characters: 沈月华 (Shen Yuehua, heroine), 萧云霄 (Xiao Yunxiao, prince), and 慕容静 (Murong Jing, antagonist). Render config: 1280×704, 121 frames @ 24 fps, ambient audio.

Single-character identity — Shen Yuehua walks through the garden and picks up a jade pendant

Dual-character dialogue — Shen + Xiao meet (the LoRA's signature use case)

Cross-scene identity — Murong Jing in a different location (palace night chamber)

Three-character composition — the LoRA's upper limit


What this LoRA does

An In-Context LoRA (IC-LoRA) trained on top of Lightricks/LTX-2.3 (22B distilled), specifically tuned for:

  1. Two-character dialogue scenes — significantly reduces character drift when two people appear in the same frame
  2. Cinematic shot composition — reinforced for dialogue-driven framing (close-up ↔ medium ↔ wide)
  3. Multi-shot narrative continuity — better understanding of multi-segment prompts (storyboard-style descriptions)
  4. Style compatibility — works well across 古风仙侠 (ancient Chinese fantasy), 现代都市 (modern urban), and 3D 动漫 (3D anime) styles

This is an IC-LoRA (in-context LoRA), so it expects reference images to be passed through the parallel-canvas conditioning mechanism, NOT as pixel-pinned frames. The upstream pipeline is Lightricks' ltx_pipelines.ic_lora.ICLoraPipeline.


Model card

| Field | Value |
|---|---|
| Base model | Lightricks/LTX-2.3 (22B distilled) |
| LoRA type | IC-LoRA (video-to-video conditioning) |
| File | LTX2.3-IC-LORA-Dual-Character.safetensors (~313 MB) |
| License | Apache 2.0 |
| Trigger word | None; no special token required |

Field-tested production usage

The notes below are from running this LoRA in production as part of a multi-shot Chinese drama video generation pipeline. They go beyond what's in the original model card.

Strength

  • Standalone: 0.7–0.9 works well
  • When stacking with other LoRAs: drop to 0.3–0.5 so the combined strength of all loaded LoRAs stays under the commonly used ~1.5 over-baking ceiling
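A quick way to sanity-check a stack before rendering, assuming the ~1.5 ceiling refers to the summed strength of all loaded LoRAs (the card does not define it further). `check_lora_stack` and the LoRA names below are hypothetical:

```python
def check_lora_stack(strengths, ceiling=1.5):
    """Sum the strengths of a LoRA stack and report whether the total stays under `ceiling`.

    Assumption: the ~1.5 "over-baking ceiling" applies to the summed strength.
    """
    total = sum(strengths.values())
    return total, total <= ceiling


# Hypothetical stack: this LoRA at 0.4 plus two other, unnamed LoRAs.
stack = {"dual_character": 0.4, "style_lora": 0.6, "motion_lora": 0.4}
total, ok = check_lora_stack(stack)
print(f"total strength {total:.2f}, under ceiling: {ok}")
```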

Resolution

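  • Recommended: 1280×704 (≈16:9, native LTX-2.3 distilled training resolution)
  • Faster preview: 960×544 (~40% faster, slightly less detail; 544 = 17 × 32 is a multiple of 32 but not of 64, so check this size against your pipeline's divisibility rule)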
  • Avoid portrait (9:16) — this LoRA was trained on landscape; identity quality degrades noticeably in portrait orientation
  • Width and height must each be divisible by 64
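The resolution rules above can be encoded as a small pre-flight check. A minimal sketch, assuming the multiple-of-64 rule stated here; `check_resolution` and `snap_down` are hypothetical helpers, not part of `ltx_pipelines`. The usage lines show how it treats the recommended size, the preview size, and a portrait size:

```python
def check_resolution(width, height, multiple=64):
    """Return a list of problems with a (width, height) pair for this LoRA.

    Encodes this card's guidance: each side divisible by `multiple` (64 here;
    lower it if your pipeline documents a different constraint) and landscape only.
    """
    problems = []
    for name, value in (("width", width), ("height", height)):
        if value % multiple:
            problems.append(f"{name}={value} is not divisible by {multiple}")
    if height > width:
        problems.append("portrait orientation degrades identity; prefer landscape")
    return problems


def snap_down(value, multiple=64):
    """Round a dimension down to the nearest multiple (never below one multiple)."""
    return max(multiple, (value // multiple) * multiple)


print(check_resolution(1280, 704))  # []
print(check_resolution(960, 544))   # ['height=544 is not divisible by 64']
print(check_resolution(704, 1280))  # one problem: portrait orientation
print(snap_down(720))               # 704
```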

Number of frames

LTX-2.3 requires num_frames to satisfy 8k + 1 (e.g., 121, 145, 193, 241, 361). At 24 fps:

  • 5 s shot = 121 frames
  • 8 s shot = 193 frames
  • 15 s shot = 361 frames
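The mapping above amounts to computing seconds × fps and rounding up to the next value of the form 8k + 1. A minimal sketch with hypothetical helper names:

```python
import math


def num_frames_for(seconds, fps=24):
    """Smallest frame count of the form 8k + 1 that covers `seconds` at `fps`."""
    k = math.ceil(seconds * fps / 8)
    return 8 * k + 1


def is_valid_num_frames(n):
    """True if n has the form 8k + 1."""
    return n > 0 and (n - 1) % 8 == 0


for s in (5, 8, 10, 15):
    n = num_frames_for(s)
    print(f"{s:>2} s -> {n} frames ({n / 24:.2f} s actual)")
```

Rounding up (rather than to the nearest value) means the rendered clip is never shorter than the requested duration.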

Prompt structure that works well

Use a 3-block structure: [场景] (scene) / [角色] (characters) / [镜头与情节] (shots and plot).

[场景] 古风皇宫御花园桃花径,午后金色阳光透过盛开桃花斜射,
粉色花瓣随风飘落,朱红宫墙翠竹环绕。

[角色] 沈月华:年轻女子,长黑发半扎绿玉簪,鬓边一朵小白花,
柔和圆润大眼,肤色白皙。身穿浅蓝色丝绸汉服宫装,白色云鹤刺绣。
萧云霄:年轻男子,黑发束起金冠玉饰,剑眉星目。身穿深红色丝绸金线龙纹宫袍。

[镜头与情节] 中景两人画面。沈月华手中持翠绿玉佩,萧云霄从右侧朱红
宫墙转角缓步走出停下,拱手轻施一礼,目光温和注视沈月华手中玉佩。
电影级布光,浅景深虚化,35mm 双人中景,温暖色调。
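When a scene spans many shots, it may help to assemble the three blocks programmatically so each shot reuses the same character descriptions verbatim. A minimal sketch; `build_prompt` is a hypothetical helper, and the descriptions are shortened from the example above:

```python
def build_prompt(scene, characters, shot):
    """Assemble a [场景] / [角色] / [镜头与情节] prompt.

    `characters` maps each character name to a fixed appearance/costume
    description, so the same text can be reused verbatim in every shot.
    """
    char_block = "\n".join(f"{name}:{desc}" for name, desc in characters.items())
    return f"[场景] {scene}\n\n[角色] {char_block}\n\n[镜头与情节] {shot}"


# Shortened descriptions taken from the example prompt above.
CHARACTERS = {
    "沈月华": "年轻女子,长黑发半扎绿玉簪,身穿浅蓝色丝绸汉服宫装。",
    "萧云霄": "年轻男子,黑发束起金冠玉饰,身穿深红色丝绸金线龙纹宫袍。",
}
print(build_prompt("古风皇宫御花园桃花径,午后金色阳光。", CHARACTERS, "中景两人画面。萧云霄拱手轻施一礼。"))
```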

Production tips earned the hard way

These are quirks of this LoRA + the LTX-2.3 distilled backbone that aren't documented in the original model card but matter in practice:

1. Static-image reference: use a SHORT video wrap (≤ 8 frames)

If you wrap a single PNG character ref into a video for IC-LoRA conditioning, use 8 frames (≈ 0.33 s) — NOT 30 frames. Longer static wraps cause a "first second stuck on the ref image" beat at the start of every clip. The IC-LoRA's per-frame matching dominates motion onset when the static wrap is too long.
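The card does not say which tool produced the static wrap. One way to build it, assuming ffmpeg with libx264 is installed; `static_wrap_cmd` and `char_ref.png` are hypothetical names, and the function only builds the command rather than running it:

```python
import shlex


def static_wrap_cmd(png_path, out_path, num_frames=8, fps=24):
    """Build an ffmpeg command that loops one still image into a short clip."""
    return [
        "ffmpeg", "-y",
        "-loop", "1",                  # repeat the single input image
        "-framerate", str(fps),        # input frame rate for the image2 demuxer
        "-i", png_path,
        "-frames:v", str(num_frames),  # stop after this many output frames
        "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",  # libx264 + yuv420p needs even sizes
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        out_path,
    ]


cmd = static_wrap_cmd("char_ref.png", "char_ref.mp4")
print(shlex.join(cmd))
# To execute: subprocess.run(cmd, check=True)
```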

2. Repeat color tokens for dark-clothed characters

This LoRA is biased toward light-colored wuxia robes: dark outfits drift toward white at low ref_image_strength. Recipe: repeat the color token directly before each clothing noun:

BAD:  black fedora and black suit
GOOD: BLACK fedora, white shirt, BLACK suit jacket, BLACK trousers,
      ... BLACK suit, BLACK trousers throughout

Also raise ref_image_strength to 0.55 for action shots or 0.85 for medium-to-slow shots to preserve color fidelity.
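Writing the repeated tokens by hand is error-prone for long outfit lists. A minimal sketch that attaches a color token to every garment and upper-cases dark colors, following the BLACK convention in the example; `outfit_tokens` is hypothetical, and which colors count as "dark" is an assumption you can extend:

```python
def outfit_tokens(items, dark_colors=("BLACK",)):
    """Join (color, garment) pairs so each garment carries its own color token.

    Colors listed in `dark_colors` are upper-cased for emphasis.
    """
    parts = []
    for color, garment in items:
        token = color.upper() if color.upper() in dark_colors else color
        parts.append(f"{token} {garment}")
    return ", ".join(parts)


outfit = [("black", "fedora"), ("white", "shirt"), ("black", "suit jacket"), ("black", "trousers")]
print(outfit_tokens(outfit))
# BLACK fedora, white shirt, BLACK suit jacket, BLACK trousers
```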

3. Never use quoted dialogue in prompts

This LoRA was trained on Chinese drama clips with burned-in Chinese subtitles. Any quoted dialogue (「…」 or "…") in the prompt causes the LoRA to hallucinate subtitle characters at the bottom of the frame. This is the single biggest gotcha.

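BAD:  低声警告 「此茶不可饮!」    (whispers a warning: "Do not drink this tea!")  ← produces fake on-screen subtitles
GOOD: 低声急切警告她茶水有毒        (urgently whispers a warning that the tea is poisoned)  ← clean output, indirect narration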

If your application needs actual subtitles, burn them post-hoc via ffmpeg drawtext, not via the prompt.
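Because quoted dialogue is the biggest source of bad output, a lint step before submitting prompts can catch it. A minimal sketch using Python's `re`; the set of quote pairs is an assumption covering the 「…」 and "…" forms mentioned above plus 『…』 and curly quotes:

```python
import re

# Quote pairs that can trigger subtitle hallucination: CJK corner brackets,
# white corner brackets, curly double quotes, and ASCII double quotes.
_QUOTED = re.compile(r'「[^」]*」|『[^』]*』|“[^”]*”|"[^"]*"')


def find_quoted_dialogue(prompt):
    """Return every quoted span in `prompt`; an empty list means the prompt is clean."""
    return _QUOTED.findall(prompt)


print(find_quoted_dialogue("低声警告 「此茶不可饮!」"))  # ['「此茶不可饮!」']
print(find_quoted_dialogue("低声急切警告她茶水有毒"))     # []
```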

4. Avoid "object detaches" prompts during action

At high motion intensity (cas/ris ≤ 0.40), the model loses object tracking. A directive like "fedora flies off mid-spin and tumbles to the floor" produces broken output — the hat dematerialises. Either:

  • Keep the object attached and say so explicitly ("the fedora STAYS ON his head throughout the spin")
  • Or render the attached and detached states as two separate clips and concatenate them
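For the two-clip option, ffmpeg's concat demuxer can join the renders. A sketch assuming ffmpeg is installed and both clips share codec, resolution, and frame rate; the helper and file names are hypothetical:

```python
def concat_list(clips):
    """Contents of an ffmpeg concat-demuxer list file, one `file '...'` line per clip."""
    return "".join(f"file '{clip}'\n" for clip in clips)


def concat_cmd(list_path, out_path):
    """ffmpeg command that joins the listed clips without re-encoding.

    Stream copy (-c copy) assumes every clip has the same codecs, resolution
    and frame rate, which should hold when both come from the same render settings.
    """
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", out_path]


# Hypothetical usage (hat on, then hat falling off):
#   Path("concat.txt").write_text(concat_list(["hat_on.mp4", "hat_off.mp4"]))
#   subprocess.run(concat_cmd("concat.txt", "spin_full.mp4"), check=True)
print(concat_list(["hat_on.mp4", "hat_off.mp4"]), end="")
```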

5. Cross-shot identity drift

For multi-shot dialogue scenes, character identity drifts across cuts. Workaround: chain shots by passing a 12-frame tail clip of shot N as ref_videos=[("tail.mp4", 0.7)] for shot N+1. This significantly improves continuity.
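The card does not say how the tail clip is cut. One possible way, assuming ffmpeg is available: seek 12 / 24 = 0.5 s before the end of shot N with `-sseof` and keep 12 frames. The helper and file names are hypothetical:

```python
def tail_clip_cmd(src, dst, num_frames=12, fps=24):
    """ffmpeg command that re-encodes the last `num_frames` frames of `src` into `dst`."""
    seconds = num_frames / fps  # 12 frames @ 24 fps = 0.5 s
    return [
        "ffmpeg", "-y",
        "-sseof", f"-{seconds:g}",     # start this many seconds before the end of the input
        "-i", src,
        "-frames:v", str(num_frames),
        "-an",                         # video-only reference clip
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        dst,
    ]


cmd = tail_clip_cmd("shot_01.mp4", "tail.mp4")
print(" ".join(cmd))
# The resulting tail.mp4 is what shot N+1 receives as ref_videos=[("tail.mp4", 0.7)].
```

If you also trim smeared tail frames (see Limitations), cutting the tail from the trimmed file presumably avoids feeding those frames into the next shot.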

Render performance

  • Resolution: 1280×704, 121 frames @ 24 fps (~5 s output)
  • Hardware: NVIDIA A800 80 GB
  • Time: ~70 s per shot (8-step distilled + 3-step spatial upscaler + audio decode)
  • Output: mp4 with ambient audio track (no TTS)

On consumer hardware (RTX 4090 24 GB), expect ~3–4 minutes per shot due to memory pressure from the 22B model.


Limitations

From the original author + our field testing:

  1. Subtitle hallucination with quoted dialogue (see tip #3 above)
  2. Complex physical interactions (wrestling, hugging, intricate hand-on-hand) can deform
  3. Tail-frame artifact of LTX-2.3 — last 6–8 frames may smear; trim post-hoc if needed
  4. Action complexity ceiling — the 8-step distilled budget caps motion complexity at action peaks
  5. Portrait orientation degrades identity (LoRA trained on landscape only)
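Trimming for the tail-frame artifact can be scripted. A sketch assuming ffmpeg and a 121-frame, 24 fps clip: `-frames:v` keeps an exact video frame count and `-t` cuts the audio at roughly the same point. Trimming 8 frames is the upper end of the 6–8 frame range above; the helper name is hypothetical:

```python
def trim_tail_cmd(src, dst, num_frames=121, trim=8, fps=24):
    """ffmpeg command that drops the last `trim` frames of a `num_frames`-frame clip."""
    keep = num_frames - trim
    return [
        "ffmpeg", "-y", "-i", src,
        "-frames:v", str(keep),        # exact number of video frames to keep
        "-t", f"{keep / fps:.4f}",     # cut the audio track at about the same time
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        dst,
    ]


cmd = trim_tail_cmd("shot_01.mp4", "shot_01_trimmed.mp4")
print(" ".join(cmd))
```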

Original Chinese README (preserved)

The original Chinese model card from ModelScope is reproduced below for users who want the unmodified original documentation.


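LTX-Video (2.3) IC-LoRA: 双人分镜头对话增强模型
(LTX-Video (2.3) IC-LoRA: two-person multi-shot dialogue enhancement model)

本模型是基于 Lightricks LTX-2.3 底模训练的 IC-LoRA,专为双人同框对话、角色互动及分镜头视频生成场景深度优化。
(This model is an IC-LoRA trained on the Lightricks LTX-2.3 base model, optimized specifically for two-person dialogue in a shared frame, character interaction, and multi-shot video generation.)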

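一、模型核心提升 (1. Core improvements)

  1. 角色参考稳定性:显著提升双人同框时的人物特征一致性,减少角色漂移。
     (Character reference stability: significantly improves consistency of character features when two people share the frame, reducing character drift.)
  2. 分镜构图稳定性:针对影视化对话构图进行了加固,支持更精准的镜头控制。
     (Shot composition stability: reinforced for cinematic dialogue compositions, enabling more precise camera control.)
  3. 叙事连贯性:增强了对多段描述的理解力,使分镜间的过渡衔接更自然。
     (Narrative continuity: improved understanding of multi-part descriptions, making transitions between shots more natural.)
  4. 风格兼容性:完美支持古风仙侠、现代都市、3D 动漫等主流视觉风格。
     (Style compatibility: fully supports mainstream visual styles such as ancient Chinese fantasy, modern urban, and 3D anime.)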

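二、模型基本信息 (2. Basic model information)

  1. 基础模型:Lightricks/LTX-2.3 (Base model: Lightricks/LTX-2.3)
  2. 许可证:Apache-2.0 (License: Apache-2.0)
  3. 管道标签:image-to-video, text-to-video (Pipeline tags: image-to-video, text-to-video)
  4. 模型用途:仅供学习交流使用 (Intended use: for study and exchange only)
  5. 开发者:麻雀 AI (Developer: 麻雀 AI, Maque AI)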

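三、运行指南 (3. Running guide)

  1. 推荐平台:ComfyUI (Recommended platform: ComfyUI)
  2. 支持工作流:ComfyUI 官方 LTX 工作流、KJ-LTX 插件工作流 (Supported workflows: the official ComfyUI LTX workflow and the KJ-LTX plugin workflow)
  3. 生成模式:文生视频 (T2V) 与 图生视频 (I2V) 均支持 (Generation modes: both text-to-video (T2V) and image-to-video (I2V) are supported)
  4. 硬件参考:RTX 5090 显卡在 720P 分辨率下,单条视频生成耗时约 2 分钟 (Hardware reference: on an RTX 5090 at 720p, generating one video takes about 2 minutes)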

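四、推荐参数配置 (4. Recommended parameters)

  1. 分辨率:建议使用 16:9 (如 1280x720) (Resolution: 16:9 recommended, e.g. 1280x720)
  2. 时长与帧率:建议时长 ≥10 秒,帧率设定为 24 FPS (Duration and frame rate: ≥ 10 s recommended, at 24 FPS)
  3. LoRA 权重设定: (LoRA strength:)
    • 独立使用建议:0.6 - 1.0 (standalone: 0.6–1.0)
    • 叠加其他 LoRA 使用时建议:0.3 - 0.5 (when stacking with other LoRAs: 0.3–0.5)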

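五、Prompt 编写规范 (5. Prompt-writing guidelines)

  1. 编写逻辑:需包含完整的场景描述 + 角色设定 + 分镜设计 + 镜头语言,强化双人对话互动逻辑。
     (Structure: include a complete scene description + character settings + shot design + camera language, emphasizing the logic of the two-person dialogue interaction.)
  2. 触发词说明:无需特定触发词。 (Trigger word: none required.)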

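六、效果说明与局限性 (6. Results and limitations)

  1. 优势风格:在古风、现代、3D 动漫类双人对话场景中表现最佳。
     (Strongest styles: performs best in ancient-style, modern, and 3D-anime two-person dialogue scenes.)
  2. 已知限制:受限于 LTX-2.3 底模性能,极其复杂的双人肢体互动(如缠绕、打斗)可能出现形变。
     (Known limitation: constrained by the LTX-2.3 base model, extremely complex two-person physical interaction, such as entanglement or fighting, may deform.)
  3. 运动幅度:建议以对话和微动作为主,大动态动作的连贯性仍有提升空间。
     (Motion range: focus on dialogue and small movements; coherence of large, dynamic motion still has room for improvement.)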

How to use

With the upstream Lightricks pipeline

from ltx_pipelines.ic_lora import ICLoraPipeline
from ltx_core.loader import LoraPathStrengthAndSDOps
from ltx_core.loader import sd_ops as _sd_ops_mod
import torch

# Load the LoRA with the ComfyUI key-renaming map so its state-dict keys match the LTX model
lora = LoraPathStrengthAndSDOps(
    "LTX2.3-IC-LORA-Dual-Character.safetensors",
    0.8,                                      # strength (standalone)
    _sd_ops_mod.LTXV_LORA_COMFY_RENAMING_MAP,
)

pipe = ICLoraPipeline(
    distilled_checkpoint_path="ltx-2.3-22b-distilled-1.1.safetensors",
    spatial_upsampler_path="ltx-2.3-spatial-upscaler-x2-1.1.safetensors",
    gemma_root="google/gemma-3-12b-it-qat-q4_0-unquantized",
    loras=[lora],
    device=torch.device("cuda:0"),
)

video, audio = pipe(
    prompt="...",                # your structured 3-block prompt
    seed=42,
    height=704, width=1280,
    num_frames=121,              # 5 s @ 24 fps, satisfies 8k+1
    frame_rate=24,
    video_conditioning=[("char_ref.mp4", 0.85)],   # 8-frame static wrap of the character portrait
    enhance_prompt=False,
    conditioning_attention_strength=0.85,
)

Hardware requirements

| GPU | VRAM | Works? | Notes |
|---|---|---|---|
| A100 / A800 | 80 GB | ✅ | ~70 s per 5 s shot |
| RTX 4090 / 3090 | 24 GB | ✅ | ~3–4 min per 5 s shot |
| RTX 4080 / 4070 Ti Super | 16 GB | ❌ | won't fit 22B in bf16 |
| anything < 24 GB | < 24 GB | ❌ | no |
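Back-of-envelope arithmetic for the table: weights alone take roughly parameters × bytes per parameter. This counts only the 22B transformer, not the Gemma 3 12B text encoder, the VAE, the upsampler, or activations, so it is a lower bound. By this estimate bf16 weights (~44 GB) also exceed 24 GB, so the 24 GB results presumably rely on offloading or lower-precision weights (e.g. FP8, ~22 GB); the card does not say which. `weight_gb` is a hypothetical helper:

```python
def weight_gb(num_params, bytes_per_param):
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


for name, nbytes in (("bf16", 2), ("fp8", 1)):
    print(f"22B transformer in {name}: ~{weight_gb(22e9, nbytes):.0f} GB")
```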

Acknowledgements


Source attribution

⚠️ This is an English-language mirror of fxj1131's LTX2.3 IC-LoRA Dual-Character on ModelScope. All credit for the model weights belongs to the original author, 麻雀 AI (Maque AI). This mirror exists to make the model + documentation accessible to HuggingFace users who cannot easily access ModelScope, and to share field-tested usage notes from a production deployment. The .safetensors weights file is unmodified and byte-identical to the ModelScope upload.


License

Apache License 2.0 — same as the original. See LICENSE and NOTICE.


Repository: SyFeee/LTX2.3-Dual-Character-en (adapter of Lightricks/LTX-2.3)