LTX2.3 Audio Reactive LoRA

LoRA adapter for LTX-2.3 audio-reactive video generation.

LTX2.3 Audio Reactive LoRA is a LoRA adapter for LTX-2.3 designed to make video generation react more visibly to music and sound. It focuses on beat-locked visual motion: cubic forms, particles, light pulses, camera pushes, graphic texture, and material deformation moving in sync with kicks, bass, snares, hi-hats, and synth changes.

The LoRA is intended for audio-to-video and image-plus-audio-to-video workflows, especially with the fal.ai endpoint fal-ai/ltx-2.3-quality/audio-to-video/lora.

LoRA file:

https://huggingface.co/fal/ltx2.3-audio-reactive-lora/resolve/main/ltx2.3_audio_reactive_lora.safetensors

Try it on fal.ai:

https://fal.ai/models/fal-ai/ltx-2.3-quality/audio-to-video/lora

Direct fal.ai Example

Direct runnable fal.ai example:

https://fal.ai/models/fal-ai/ltx-2.3-quality/audio-to-video/lora?share=5884bbce-702a-4218-9683-a82a471a0b9b


Model Details

Base model: Lightricks/LTX-2.3
Base model relation: adapter / LoRA
Model type: LTX-2.3 LoRA adapter
Primary use: audio-reactive video generation
Best workflow: image first frame + audio + prompt
Recommended endpoint: fal-ai/ltx-2.3-quality/audio-to-video/lora
Recommended LoRA scale: 1.0 to 1.5
Current working scale: 1.2 to 1.5
Recommended FPS: 24
Recommended segment length: 5s to 15s
Recommended resolution: 1024x1024 for square visualizer clips
Recommended negative prompt: empty string unless the specific workflow needs constraints
Recommended first frame: structured visual material with clear shapes, depth, light sources, cubes, geometry, particles, layered graphic elements, or audio-visualizer forms
License: follows the LTX-2 Community License Agreement inherited from the LTX-2.3 base model
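A full track is often longer than one clip. The segment-length and frame-rate recommendations above then become simple arithmetic. The Python sketch below assumes you split the audio into equal windows of at most 15 s and generate each window as its own clip. `plan_segments` and `frame_count` are hypothetical helpers, not part of any fal or LTX API.

```python
from __future__ import annotations

import math

FPS = 24            # recommended frame rate from the model card
MIN_SEG_S = 5.0     # recommended minimum segment length (seconds)
MAX_SEG_S = 15.0    # recommended maximum segment length (seconds)


def plan_segments(duration_s: float) -> list[tuple[float, float]]:
    """Split a track into equal (start, end) windows no longer than MAX_SEG_S.

    Assumption: equal-length windows. For any track of at least MIN_SEG_S,
    every window stays inside the 5 s to 15 s range. Shorter tracks come back
    as a single short window, since the card does not cover that case.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    count = max(1, math.ceil(duration_s / MAX_SEG_S))
    length = duration_s / count
    return [(i * length, (i + 1) * length) for i in range(count)]


def frame_count(seconds: float, fps: int = FPS) -> int:
    """Approximate number of frames for a segment at the given frame rate."""
    return round(seconds * fps)
```

For example, a 40 s track becomes three windows of about 13.3 s each, roughly 320 frames per clip at 24 FPS. The 15 s upper bound corresponds to 360 frames.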

Prompt Language

Use language like this near the start of the prompt:

sound-driven video, audio-reactive motion, continuous visual flow

For stronger motion, state the audio-reactive instruction explicitly and repeat it:

The video must be driven by the audio. The cubes must visibly move to the sound. The cubes must hit the beat: BAM BAM BAM.
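A small helper can keep this ordering consistent across runs: lead phrase first, then the explicit instruction repeated. The sketch below is an assumption, not a fal utility. `build_prompt` is hypothetical, and its `repeats` argument only duplicates the motion sentence, following the repetition suggested above.

```python
LEAD = "sound-driven video, audio-reactive motion, continuous visual flow"


def build_prompt(subject: str, scene: str, repeats: int = 2) -> str:
    """Hypothetical helper: lead phrase first, then a repeated motion instruction."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    instruction = f"The {subject} must visibly move to the sound."
    parts = [LEAD + ".", "The video must be driven by the audio."]
    parts += [instruction] * repeats  # repetition is the "stronger motion" knob
    parts.append(f"The {subject} must hit the beat: BAM BAM BAM.")
    parts.append(scene)  # free-form scene/style description; may be empty
    return " ".join(p for p in parts if p)
```

For example, `build_prompt("cubes", "Dark chrome cubes in deep cobalt light.", repeats=3)` starts with the lead phrase and contains the "must visibly move" sentence three times.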

Prompt Template

sound-driven video, audio-reactive motion, continuous visual flow.
This must be an aggressively audio-reactive cubic video. The cubes must visibly move to the sound. The cubes must visibly move to the sound.
The cubes must hit the beat: BAM BAM BAM. On every kick, large cubes slam, squash, jump, or punch forward. On every bass pulse, the whole 3D structure expands and compresses like a pressure engine.
On snares, cube layers snap sideways, cut, and reassemble. On hi-hats, tiny cube fragments, sparks, fine grain, color edges, and signal lines flicker fast.
On synth changes, surfaces ripple, panels unfold, glass blocks breathe, light seams stretch, and the camera pushes through depth.
Keep a premium dark 3D first-frame style: black glass, graphite, chrome, deep cobalt, electric cyan, acid green, controlled red, warm amber, tactile grain, color separation, subtle bloom.
No text, no logo, no border, no blank padding.
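The template pairs each sound event with a visual reaction. Keeping that pairing as data makes it easy to swap subjects or reactions without rewriting the whole prompt. The Python sketch below is hypothetical: `REACTIONS` and `assemble_template` are illustrative names, and the reaction strings are copied from the template above.

```python
from __future__ import annotations

# Hypothetical event -> reaction mapping, taken from the prompt template above.
REACTIONS = {
    "kick": "large cubes slam, squash, jump, or punch forward",
    "bass pulse": "the whole 3D structure expands and compresses like a pressure engine",
    "snare": "cube layers snap sideways, cut, and reassemble",
    "hi-hat": "tiny cube fragments, sparks, fine grain, color edges, and signal lines flicker fast",
    "synth change": "surfaces ripple, panels unfold, glass blocks breathe, light seams stretch, and the camera pushes through depth",
}


def assemble_template(
    reactions: dict[str, str],
    style: str,
    exclusions: str = "No text, no logo, no border, no blank padding.",
) -> str:
    """Build a prompt: lead phrase, one sentence per sound event, style, exclusions."""
    lines = ["sound-driven video, audio-reactive motion, continuous visual flow."]
    for event, reaction in reactions.items():  # dicts keep insertion order
        lines.append(f"On every {event}, {reaction}.")
    lines.append(f"Keep a {style} first-frame style.")
    lines.append(exclusions)
    return " ".join(lines)
```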

Example fal Input

{
  "prompt": "sound-driven video, audio-reactive motion, continuous visual flow. This must be an aggressively audio-reactive cubic video. The cubes must visibly move to the sound. The cubes must hit the beat: BAM BAM BAM. On every kick, large cubes slam, squash, jump, or punch forward. On every bass pulse, the whole 3D structure expands and compresses like a pressure engine. On snares, cube layers snap sideways, cut, and reassemble. On hi-hats, tiny cube fragments, sparks, fine grain, color edges, and signal lines flicker fast.",
  "audio_url": "https://...",
  "image_url": "https://...",
  "match_audio_length": true,
  "resolution": {
    "width": 1024,
    "height": 1024
  },
  "frames_per_second": 24,
  "num_inference_steps": 15,
  "guidance_scale": 1,
  "generate_audio": true,
  "image_strength": 0.62,
  "negative_prompt": "",
  "enable_prompt_expansion": false,
  "video_quality": "high",
  "video_write_mode": "balanced",
  "loras": [
    {
      "path": "https://huggingface.co/fal/ltx2.3-audio-reactive-lora/resolve/main/ltx2.3_audio_reactive_lora.safetensors",
      "scale": 1.2,
      "transformer": "both"
    }
  ]
}
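The same request can be sent from Python. This minimal sketch assumes the `fal-client` package (`pip install fal-client`) and a `FAL_KEY` environment variable. `build_arguments` and `run` are hypothetical helpers, and the parameter values mirror the JSON above. Omitting `image_strength` when no first frame is given is an assumption, not documented fal behavior. Local audio or image files can first be uploaded with `fal_client.upload_file`, which returns a URL.

```python
from __future__ import annotations

import warnings

LORA_URL = "https://huggingface.co/fal/ltx2.3-audio-reactive-lora/resolve/main/ltx2.3_audio_reactive_lora.safetensors"
ENDPOINT = "fal-ai/ltx-2.3-quality/audio-to-video/lora"


def build_arguments(prompt: str, audio_url: str, image_url: str | None = None,
                    scale: float = 1.2, size: int = 1024) -> dict:
    """Hypothetical helper that mirrors the example fal input above."""
    if not 1.0 <= scale <= 1.5:
        warnings.warn("LoRA scale is outside the recommended 1.0 to 1.5 range")
    args = {
        "prompt": prompt,
        "audio_url": audio_url,
        "match_audio_length": True,
        "resolution": {"width": size, "height": size},
        "frames_per_second": 24,
        "num_inference_steps": 15,
        "guidance_scale": 1,
        "generate_audio": True,
        "negative_prompt": "",
        "enable_prompt_expansion": False,
        "video_quality": "high",
        "video_write_mode": "balanced",
        "loras": [{"path": LORA_URL, "scale": scale, "transformer": "both"}],
    }
    if image_url is not None:
        # Assumption: image_strength only matters when a first frame is supplied.
        args["image_url"] = image_url
        args["image_strength"] = 0.62
    return args


def run(arguments: dict) -> dict:
    """Submit to fal and wait for the result (needs network access and FAL_KEY)."""
    import fal_client  # imported lazily so build_arguments works without it
    return fal_client.subscribe(ENDPOINT, arguments=arguments, with_logs=True)
```

For example, `run(build_arguments(prompt, fal_client.upload_file("track.wav"), fal_client.upload_file("frame.png")))` submits an image-plus-audio job. The file names are placeholders.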

Notes

This LoRA is most useful when the input image already contains clear structures that can move with the music: cubes, layered architecture, particles, light seams, waveform-like forms, glass blocks, or abstract visualizer shapes. It can be used without an image, but image-first-frame generation gives stronger art direction and more consistent results.

It can also be tested in ComfyUI or other local LTX-2.3 workflows as a standard LoRA, as long as the workflow supports LTX-2.3 LoRA loading and audio-conditioned generation.
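For local workflows, the LoRA file can be fetched from the Hub instead of by URL. This minimal sketch assumes the `huggingface_hub` package. `resolve_url` and `download_lora` are hypothetical helpers. The target folder is up to you; `ComfyUI/models/loras` is the usual LoRA folder in a default ComfyUI install.

```python
from __future__ import annotations

REPO_ID = "fal/ltx2.3-audio-reactive-lora"
FILENAME = "ltx2.3_audio_reactive_lora.safetensors"


def resolve_url(repo_id: str = REPO_ID, filename: str = FILENAME,
                revision: str = "main") -> str:
    """Build the direct-download URL in the same form as the LoRA link above."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_lora(local_dir: str) -> str:
    """Download the safetensors file into local_dir and return its path."""
    from huggingface_hub import hf_hub_download  # imported lazily; needs network
    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
```

For example, `download_lora("ComfyUI/models/loras")` places the file where a standard LoRA loader node can find it, assuming a default ComfyUI layout.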

Limitations

The LoRA improves audio-reactive motion but does not guarantee perfect beat detection in every clip.
Stronger motion usually comes from lower image strength, stronger prompt language, and clearly structured first frames.
Text inside first frames can drift during video generation; keep important text simple, high-contrast, and explicitly described as fixed if it must remain readable.
Generated videos should be reviewed before publication, especially for text stability, logo fidelity, and sync.
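When reviewing sync, it helps to know which frames should show a hit. This minimal sketch assumes you already have beat timestamps in seconds from a beat tracker (librosa's `beat.beat_track` is one option) and that each clip covers a known window of the track. `beat_frames` is a hypothetical helper. At the recommended 24 FPS, a beat at 0.5 s into a clip lands on frame 12.

```python
from __future__ import annotations


def beat_frames(beat_times_s: list[float], fps: int = 24, clip_start_s: float = 0.0,
                clip_len_s: float | None = None) -> list[int]:
    """Map beat timestamps (seconds, full-track time) to frame indices in one clip.

    Beats before the clip start, or at/after clip_start_s + clip_len_s, are dropped.
    """
    frames = []
    for t in beat_times_s:
        local = t - clip_start_s  # time relative to the start of this clip
        if local < 0:
            continue
        if clip_len_s is not None and local >= clip_len_s:
            continue
        frames.append(round(local * fps))
    return frames
```

Stepping to these frame numbers in a video player is a quick way to check whether kicks and bass hits line up with visible motion.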

Credits

Created by Lovis Odin for fal.ai.