LTX-2.3 Chinese Drama Character LoRA
A character LoRA for LTX-Video 2.3 (22B) trained on a 78-episode corpus of Chinese historical drama footage. It specialises the base model for live-action, photoreal cinematic generation in Han dynasty / wuxia settings, with stable identity for the most frequent on-screen characters.
Model details
| Field | Value |
|---|---|
| Base model | Lightricks/LTX-2.3-22B |
| Adapter type | Plain LoRA (PEFT-style) |
| Rank | 64 |
| Alpha | 64 |
| Target modules | to_k, to_q, to_v, to_out.0 |
| Training steps | 3000 |
| Optimizer | AdamW |
| Learning rate | 1e-4, linear schedule |
| Mixed precision | bf16 |
| Gradient checkpointing | enabled |
Training data
- Source: 78 episodes of a Chinese historical drama (Han dynasty setting), processed at extraction time into ~750 clean shots.
- Caption format: enriched inline-weave prompts produced by Qwen3-Omni VL, with character identity triggers (char_0_person, char_1_person, ...) substituted at the start of each prompt, followed by camera/scene prose and a style anchor (live-action photorealistic, cinematic Chinese drama).
- Captions are bilingual (English + Simplified Chinese). The deployed LoRA was trained on the Chinese variants.
Usage
The character triggers char_0_person, char_1_person, etc. correspond to the most-frequent identity clusters discovered by ArcFace + DBSCAN over the corpus. They must appear at the start of the prompt, comma-separated, terminated with a period.
Prompt format
char_0_person, char_1_person. Framed in a static eye level medium shot,
on a 35mm normal lens, with natural light. Set in a torch-lit Han dynasty
courtyard at dusk, the subjects face each other in tense silence.
Live-action photorealistic, cinematic Chinese drama.
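The layout above (triggers first, then camera/scene prose, then the style anchor) can be assembled programmatically. The following is a minimal sketch only: `build_prompt`, `STYLE_ANCHOR`, and the parameter names are hypothetical helpers introduced for illustration and are not part of this repository or of diffusers.

```python
STYLE_ANCHOR = "Live-action photorealistic, cinematic Chinese drama."


def build_prompt(character_ids, scene_prose, style_anchor=STYLE_ANCHOR):
    """Assemble a prompt: triggers first, then scene prose, then the style anchor.

    character_ids: iterable of integer cluster ids, e.g. [0, 1].
    scene_prose:   camera/scene description written in the caption style.
    """
    triggers = ", ".join(f"char_{i}_person" for i in character_ids)
    parts = []
    if triggers:
        # Triggers are comma-separated and terminated with a period.
        parts.append(triggers + ".")
    parts.append(scene_prose.strip())
    parts.append(style_anchor)
    return " ".join(parts)


prompt = build_prompt(
    [0, 1],
    "Framed in a static eye level medium shot, on a 35mm normal lens, with natural light.",
)
print(prompt)
```

Passing an empty list of ids yields a style-only prompt with no trigger prefix.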
Loading with diffusers
from diffusers import LTXPipeline
import torch

pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-2.3-22B",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Load the LoRA
pipe.load_lora_weights(
    "SyFeee/ltx2.3-chinese-drama-charlora",
    weight_name="lora_weights_step_03000.safetensors",
    adapter_name="cn_drama_char",
)
pipe.set_adapters(["cn_drama_char"], adapter_weights=[0.9])

video = pipe(
    prompt=(
        "char_0_person. Framed in a close-up on a 50mm normal lens, with shallow focus. "
        "Set in a candlelit Han dynasty study, the subject sits writing on bamboo scrolls. "
        "Live-action photorealistic, cinematic Chinese drama."
    ),
    negative_prompt="no CGI, no animation, no illustration, no painterly style, no anime",
    width=1280,
    height=544,
    num_frames=89,
    guidance_scale=4.0,
    num_inference_steps=20,
).frames[0]
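The call above uses width 1280, height 544, and 89 frames. The LTX-Video documentation states that the model works on resolutions divisible by 32 and frame counts of the form 8k + 1; whether the same constraint holds for this base model is an assumption here. Under that assumption, a small pre-flight check (the helper `check_ltx_shape` is hypothetical) could look like this:

```python
def check_ltx_shape(width, height, num_frames, spatial_multiple=32, temporal_multiple=8):
    """Return a list of problems; an empty list means the shape passes.

    Assumption: the constraints documented for LTX-Video (sides divisible by
    32, num_frames of the form 8 * k + 1) carry over to this base model.
    """
    problems = []
    if width % spatial_multiple != 0:
        problems.append(f"width {width} is not divisible by {spatial_multiple}")
    if height % spatial_multiple != 0:
        problems.append(f"height {height} is not divisible by {spatial_multiple}")
    if num_frames < 1 or (num_frames - 1) % temporal_multiple != 0:
        problems.append(
            f"num_frames {num_frames} is not of the form {temporal_multiple} * k + 1"
        )
    return problems


# The settings used in the example call pass the check.
print(check_ltx_shape(1280, 544, 89))
# A shape that violates the height and frame-count assumptions.
print(check_ltx_shape(1280, 540, 90))
```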
Recommended LoRA strength
- 0.8: subtle stylistic touch, identity present but soft
- 0.9: default, validated against training distribution
- 1.0+: risks overfitting on facial features at the expense of camera/scene freedom
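To relate these strength values to the rank and alpha listed under Model details: in the standard LoRA convention the low-rank update is scaled by alpha / rank, and an adapter weight multiplies that factor. With rank 64 and alpha 64 the base factor is 1.0, so the strength value is the effective multiplier. The sketch below assumes the pipeline follows that convention; `effective_lora_scale` is a hypothetical helper, not a diffusers or PEFT function.

```python
def effective_lora_scale(adapter_weight, alpha=64, rank=64):
    """Multiplier applied to the low-rank update B @ A.

    Assumes the standard LoRA convention: scale = adapter_weight * alpha / rank.
    Defaults mirror the rank and alpha reported for this adapter.
    """
    return adapter_weight * alpha / rank


for weight in (0.8, 0.9, 1.0):
    print(weight, effective_lora_scale(weight))
```

An adapter trained with alpha smaller than its rank would need a proportionally larger weight to reach the same effective multiplier, which is why strength values are not directly comparable across LoRAs.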
What this LoRA does well
- Live-action photoreal output in the Han dynasty visual register (palace interiors, courtyards, traditional costume).
- Stable character identity across cuts when the same trigger is used.
- Strong response to camera/lens prose conventions baked into the training captions (dolly in, 35mm normal lens, handheld).
What it does NOT do
- It cannot generate characters it never saw in training (no zero-shot identity transfer).
- It is not an audio model. For audio-conditioned generation see the matching AV LoRA in the same project.
- It does not include the IC-LoRA control adapters (pose / depth / canny). Those are shipped as separate repositories.
Triggers in the trained corpus
The trigger naming follows the fallback format char_{id}_person. Cluster frequencies in the training corpus:
| Trigger | Clips |
|---|---|
| char_0_person | 442 |
| char_0_person, char_1_person | 180 |
| char_0_person, char_1_person, char_2_person | 53 |
| 4–10 character combinations | 49 |
| No characters detected (style-only) | 20 |
So the model has the strongest signal for single-character and two-character compositions.
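The counts above can be turned into shares of the corpus to make that statement concrete. The figures are copied from the table; the variable names are illustrative only.

```python
clip_counts = {
    "char_0_person": 442,
    "char_0_person, char_1_person": 180,
    "char_0_person, char_1_person, char_2_person": 53,
    "4-10 character combinations": 49,
    "no characters detected (style-only)": 20,
}

total = sum(clip_counts.values())
shares = {label: count / total for label, count in clip_counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")

# Combined share of single-character and two-character clips.
one_or_two = shares["char_0_person"] + shares["char_0_person, char_1_person"]
print(f"total clips: {total}, one- or two-character share: {one_or_two:.1%}")
```

The rows sum to 744 clips, consistent with the roughly 750 shots stated under Training data, and single-character plus two-character clips make up about 84% of them.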
Related models
- SyFeee/ltx2.3-chinese-drama-iclora-pose: pose-controlled IC-LoRA on the same corpus.
- SyFeee/ltx2.3-chinese-drama-iclora-depth: depth-controlled IC-LoRA.
- SyFeee/ltx2.3-chinese-drama-iclora-canny: canny-edge-controlled IC-LoRA.
License
Apache 2.0. See LICENSE for terms.
Attribution: SyFe.