Please use the distilled LTX model

by shivshankar - opened 11 days ago

shivshankar

11 days ago

Please use the distilled LTX model, which can generate audio in 2-4 steps.

scenema-ai

Scenema AI org 9 days ago

We're already using the distilled model (8 steps). The bottleneck in our pipeline isn't denoising. It's Gemma 3 12B text encoding and audio VAE decoding. Even if we halved the diffusion steps, you wouldn't see a meaningful difference in wall-clock time.
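The wall-clock argument is essentially Amdahl's law. As a rough sketch, assume each request's time splits into text encoding, denoising, and audio VAE decoding, and that denoising cost scales linearly with the step count. That split and the symbols below are illustrative assumptions, not measurements from this pipeline:

$$
T = T_{\text{enc}} + T_{\text{denoise}} + T_{\text{dec}}, \qquad
T' = T_{\text{enc}} + \tfrac{1}{2}\,T_{\text{denoise}} + T_{\text{dec}}
$$

$$
p = \frac{T_{\text{denoise}}}{T}, \qquad
\frac{T}{T'} = \frac{1}{(1 - p) + p/2} \;<\; \frac{1}{1 - p}
$$

Here $T'$ is the time with half the diffusion steps and $p$ is the fraction of time spent denoising. When $p$ is small, the speedup stays close to 1. With a hypothetical $p = 0.2$, halving the steps gives $1/0.9 \approx 1.11$, and even eliminating denoising entirely could never beat $1/(1-p) = 1.25$.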
