TSANTSALIZE - IC-LoRA for LTX-2.3
The Burgstall is back at it with another mighty useless release! This one is called TSANTSALIZE!
What it does
This IC-LoRA shrinks the head of the speaker. That's it. That's the effect.
I tried about seven different approaches (and seven training runs) to get the model to train on both the video and audio layers, but apparently that is not possible. It wasn't necessary anyway, as the audio effect is fully doable in the workflow alone. So the included workflow has a few simple audio processing nodes that change the voice to match the shrunken head (to my ear, at least; feel free to tweak).
NOTE! This was trained mostly on videos of a person "speaking to the camera", so your mileage with any other type of footage will vary.
Examples
BAN - before & after:
LADY - before & after:
Usage
Prompting
The trigger word is simply tsantsalize. You can also try adding "tiny head" or similar to the prompt; it might help.
Strength
In my tests, strength 1.2 has been the sweet spot. Anything above that introduces more identity drift.
Workflow
The ComfyUI workflow is included in this repo (TSANTSALIZE-release_1706.json). It uses:
- LTX-2.3-22B base model with the distilled fp8 transformer
- IC-LoRA guidance with the first frame as reference
- MelBand RoFormer for vocal/instrument separation
- A pitch shift (+8 semitones) and highpass filter (350 Hz) on the vocals to match the tiny-head aesthetic
Drop your video in, set the trigger word to "tsantsalize", and you're good to go.
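For the curious, here is a rough Python sketch of what those two audio numbers mean. It is not the code the workflow nodes run: the helper names (semitone_ratio, highpass, sine, rms) are made up for illustration, and the filter is a basic first-order RC high-pass standing in for whatever the actual node uses.

```python
import math


def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def highpass(samples, cutoff_hz: float, sample_rate: int):
    """First-order RC high-pass filter (illustrative stand-in only)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        # Passes fast changes, lets slow (low-frequency) content decay.
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def sine(freq_hz: float, sample_rate: int, n: int):
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(samples) -> float:
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    sr = 48000
    print("ratio for +8 semitones:", round(semitone_ratio(8), 4))
    low = rms(highpass(sine(100, sr, sr), 350, sr))
    high = rms(highpass(sine(2000, sr, sr), 350, sr))
    print("100 Hz level after filter:", round(low, 3))
    print("2000 Hz level after filter:", round(high, 3))
```

A shift of +8 semitones multiplies every frequency by roughly 1.59, and the 350 Hz high-pass then thins out the low end, which is what gives the voice its small, squeaky character. Both values are easy to change in the workflow if the result doesn't match your taste.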
Requirements
- ComfyUI
- ComfyUI-LTXVideo custom nodes
- ComfyUI-KJNodes
- ComfyUI-VideoHelperSuite
- Base model: ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32.safetensors
- Text encoder: gemma_3_12B_it.safetensors
Technical details
- Base model: LTX-2.3-22B
- Training mode: IC-LoRA (video-to-video, F=1 first-frame conditioning)
- LoRA rank: 64, alpha: 64
- Target modules: attention layers (q, k, v, out projections) + feed-forward layers
- Optimizer: Prodigy with constant schedule
- Mixed precision: bf16
- Training config: included in ltx2_tsantsalize_h100.yaml
The dataset was fully synthetic.
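As a rough illustration of how rank, alpha, and the strength setting interact, here is a minimal pure-Python sketch of the standard LoRA update W' = W + strength * (alpha / rank) * (B @ A). The function names are hypothetical and the matrices are toy values, not anything from the actual checkpoint. It also assumes the loader applies strength as a plain multiplier on the update, which is the usual convention but is not confirmed here.

```python
def lora_scale(rank: int, alpha: float, strength: float = 1.0) -> float:
    """Effective multiplier on the low-rank update."""
    return strength * (alpha / rank)


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(w, lora_a, lora_b, rank, alpha, strength=1.0):
    """Return W + scale * (B @ A), where B is (out x rank) and A is (rank x in)."""
    scale = lora_scale(rank, alpha, strength)
    delta = matmul(lora_b, lora_a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


if __name__ == "__main__":
    # With rank 64 and alpha 64, alpha / rank is 1.0,
    # so the strength setting is the whole multiplier.
    print(lora_scale(rank=64, alpha=64, strength=1.2))

    # Toy rank-1 example on a 2x2 identity weight.
    w = [[1.0, 0.0], [0.0, 1.0]]
    lora_b = [[1.0], [2.0]]   # out x rank
    lora_a = [[3.0, 4.0]]     # rank x in
    print(apply_lora(w, lora_a, lora_b, rank=1, alpha=1, strength=0.5))
```

Under that assumption, the recommended strength of 1.2 simply pushes the learned update 20% past what was trained, which fits the observation that going higher starts to cost identity.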
License
CC BY-NC 4.0