Please use the distilled LTX model
#3
by shivshankar - opened
Please use the distilled LTX model, which can generate audio in 2-4 steps.
We're already using the distilled model (8 steps). The bottleneck in our pipeline isn't denoising. It's Gemma 3 12B text encoding and audio VAE decoding. Even if we halved the diffusion steps, you wouldn't see a meaningful difference in wall-clock time.
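To make that arithmetic concrete, here is a rough sketch of the reasoning. The stage names and timings are placeholders chosen for illustration, not measurements from our pipeline, and the helper functions are hypothetical. The point is only that speeding up one stage shrinks the total by that stage's share and no more.

```python
def total_time(stages):
    """Sum per-stage wall-clock times (seconds)."""
    return sum(stages.values())


def time_after_speedup(stages, stage, factor):
    """Total time if a single stage becomes `factor` times faster."""
    sped_up = dict(stages)
    sped_up[stage] = stages[stage] / factor
    return total_time(sped_up)


# Hypothetical timings in seconds. These are NOT measured values;
# replace them with numbers from your own profiling run.
stages = {
    "text_encoding": 6.0,  # assumed: Gemma 3 12B prompt encoding
    "denoising": 2.0,      # assumed: 8 distilled diffusion steps
    "vae_decoding": 4.0,   # assumed: audio VAE decode
}

before = total_time(stages)
# Halving the step count is modeled as a 2x speedup of denoising only.
after = time_after_speedup(stages, "denoising", 2.0)

print(f"{before:.1f}s -> {after:.1f}s ({before / after:.2f}x overall)")
```

With these placeholder numbers, halving the denoising time takes the total from 12.0 s to 11.0 s, roughly a 1.09x overall speedup. The real ratio depends on the actual per-stage timings on your hardware.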