LTX2.3 Audio Reactive LoRA

LoRA adapter for LTX-2.3 audio-reactive video generation.

LTX2.3 Audio Reactive LoRA is a LoRA adapter for LTX-2.3 designed to make video generation react more visibly to music and sound. It focuses on beat-locked visual motion: cubic forms, particles, light pulses, camera pushes, graphic texture, and material deformation moving in sync with kicks, bass, snares, hi-hats, and synth changes.

The LoRA is intended for audio-to-video and image-plus-audio-to-video workflows, especially with the fal.ai endpoint fal-ai/ltx-2.3-quality/audio-to-video/lora.

LoRA file:

https://huggingface.co/fal/ltx2.3-audio-reactive-lora/resolve/main/ltx2.3_audio_reactive_lora.safetensors

Try it on fal.ai:

https://fal.ai/models/fal-ai/ltx-2.3-quality/audio-to-video/lora

Direct fal.ai Example

Direct runnable fal.ai example:

https://fal.ai/models/fal-ai/ltx-2.3-quality/audio-to-video/lora?share=5884bbce-702a-4218-9683-a82a471a0b9b

Preview

Preview clips: fal.ai Logo Audio-Reactive Ident; Extreme Exploded Cube Cutaway; Disco Landscape Burst.

Model Details

  • Base model: Lightricks/LTX-2.3
  • Base model relation: adapter / LoRA
  • Model type: LTX-2.3 LoRA adapter
  • Primary use: audio-reactive video generation
  • Best workflow: image first frame + audio + prompt
  • Recommended endpoint: fal-ai/ltx-2.3-quality/audio-to-video/lora
  • Recommended LoRA scale: 1.0 to 1.5
  • Scale range used in current working setups: 1.2 to 1.5
  • Recommended FPS: 24
  • Recommended segment length: 5s to 15s
  • Recommended resolution: 1024x1024 for square visualizer clips
  • Recommended negative prompt: empty string unless the specific workflow needs constraints
  • Recommended first frame: structured visual material with clear shapes, depth, light sources, cubes, geometry, particles, layered graphic elements, or audio-visualizer forms
  • License: follows the LTX-2 Community License Agreement inherited from the LTX-2.3 base model
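The recommended 5s to 15s segment length means a longer track has to be split before generation. Below is a minimal Python sketch of one way to plan that, assuming equal-length segments are acceptable. `plan_segments` is a hypothetical helper, not part of fal or LTX-2.3, and its frame counts are simply duration × 24 fps. The endpoint may round or adjust the actual frame count, especially with `match_audio_length` enabled.

```python
import math

FPS = 24          # recommended frame rate from this card
MIN_SEG_S = 5.0   # recommended minimum segment length in seconds
MAX_SEG_S = 15.0  # recommended maximum segment length in seconds


def plan_segments(total_s: float, max_s: float = MAX_SEG_S, min_s: float = MIN_SEG_S):
    """Split a track into the fewest equal-length segments no longer than max_s.

    Returns a list of (start_s, end_s, approx_frames) tuples. Equal splitting
    avoids a short leftover tail: whenever max_s >= 2 * min_s (true for the
    default 5s/15s bounds), every segment is at least min_s long.
    """
    if total_s < min_s:
        raise ValueError(f"track is shorter than the {min_s}s minimum segment")
    count = math.ceil(total_s / max_s)
    length = total_s / count
    segments = []
    for i in range(count):
        start = i * length
        # Pin the last segment to the exact track end to avoid float drift.
        end = total_s if i == count - 1 else (i + 1) * length
        segments.append((round(start, 3), round(end, 3), round((end - start) * FPS)))
    return segments
```

For a 40-second track, `plan_segments(40.0)` returns three segments of about 13.3s (320 frames each), all inside the recommended range.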

Prompt Language

Use language like this near the start of the prompt:

sound-driven video, audio-reactive motion, continuous visual flow

For stronger motion, state the audio-reactive instruction explicitly and repeat it:

The video must be driven by the audio. The cubes must visibly move to the sound. The cubes must hit the beat: BAM BAM BAM.
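A small Python sketch of assembling a prompt in this order: the audio-reactive phrasing first, then the explicit instruction, repeated for stronger motion. `build_prompt` and its `subject` and `repeats` parameters are hypothetical conveniences. The wording itself comes from the examples above.

```python
PREFIX = "sound-driven video, audio-reactive motion, continuous visual flow"


def build_prompt(scene: str, subject: str = "cubes", repeats: int = 1) -> str:
    """Put the audio-reactive prefix first, then the sync instruction `repeats` times."""
    instruction = f"The {subject} must visibly move to the sound."
    parts = [PREFIX + ".", "The video must be driven by the audio."]
    parts += [instruction] * max(1, repeats)  # always state it at least once
    parts.append(f"The {subject} must hit the beat: BAM BAM BAM.")
    parts.append(scene)  # scene-specific description goes last
    return " ".join(p.strip() for p in parts if p.strip())
```

For example, `build_prompt("Dark glass cubes pulse under cyan light.", repeats=2)` produces a prompt that opens with the recommended phrase and states the movement instruction twice.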

Prompt Template

sound-driven video, audio-reactive motion, continuous visual flow.
This must be an aggressively audio-reactive cubic video. The cubes must visibly move to the sound. The cubes must visibly move to the sound.
The cubes must hit the beat: BAM BAM BAM. On every kick, large cubes slam, squash, jump, or punch forward. On every bass pulse, the whole 3D structure expands and compresses like a pressure engine.
On snares, cube layers snap sideways, cut, and reassemble. On hi-hats, tiny cube fragments, sparks, fine grain, color edges, and signal lines flicker fast.
On synth changes, surfaces ripple, panels unfold, glass blocks breathe, light seams stretch, and the camera pushes through depth.
Keep a premium dark 3D first-frame style: black glass, graphite, chrome, deep cobalt, electric cyan, acid green, controlled red, warm amber, tactile grain, color separation, subtle bloom.
No text, no logo, no border, no blank padding.
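The template pairs each audio event (kick, bass, snare, hi-hat, synth change) with a visual response. Keeping that pairing as data makes it easy to swap the subject or the responses while preserving the structure. Below is a Python sketch. `HEADER`, `EVENT_RESPONSES`, and `build_template` are hypothetical names, and the strings are copied from the template above.

```python
HEADER = (
    "sound-driven video, audio-reactive motion, continuous visual flow. "
    "This must be an aggressively audio-reactive cubic video. "
    "The cubes must visibly move to the sound. The cubes must visibly move to the sound. "
    "The cubes must hit the beat: BAM BAM BAM."
)
STYLE = (
    "Keep a premium dark 3D first-frame style: black glass, graphite, chrome, deep cobalt, "
    "electric cyan, acid green, controlled red, warm amber, tactile grain, color separation, "
    "subtle bloom."
)
CONSTRAINTS = "No text, no logo, no border, no blank padding."

# Audio event -> visual response, in the order the template lists them.
EVENT_RESPONSES = {
    "every kick": "large cubes slam, squash, jump, or punch forward",
    "every bass pulse": "the whole 3D structure expands and compresses like a pressure engine",
    "snares": "cube layers snap sideways, cut, and reassemble",
    "hi-hats": "tiny cube fragments, sparks, fine grain, color edges, and signal lines flicker fast",
    "synth changes": "surfaces ripple, panels unfold, glass blocks breathe, "
                     "light seams stretch, and the camera pushes through depth",
}


def build_template(events=None, header=HEADER, style=STYLE, constraints=CONSTRAINTS):
    """Join header, one 'On <event>, <response>.' clause per event, style, and constraints."""
    events = EVENT_RESPONSES if events is None else events
    clauses = [f"On {event}, {response}." for event, response in events.items()]
    return " ".join([header, *clauses, style, constraints])
```

Passing a different mapping, such as `{"every kick": "the logo pulses"}`, reuses the same header and style with new event clauses.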

Example fal Input

{
  "prompt": "sound-driven video, audio-reactive motion, continuous visual flow. This must be an aggressively audio-reactive cubic video. The cubes must visibly move to the sound. The cubes must hit the beat: BAM BAM BAM. On every kick, large cubes slam, squash, jump, or punch forward. On every bass pulse, the whole 3D structure expands and compresses like a pressure engine. On snares, cube layers snap sideways, cut, and reassemble. On hi-hats, tiny cube fragments, sparks, fine grain, color edges, and signal lines flicker fast.",
  "audio_url": "https://...",
  "image_url": "https://...",
  "match_audio_length": true,
  "resolution": {
    "width": 1024,
    "height": 1024
  },
  "frames_per_second": 24,
  "num_inference_steps": 15,
  "guidance_scale": 1,
  "generate_audio": true,
  "image_strength": 0.62,
  "negative_prompt": "",
  "enable_prompt_expansion": false,
  "video_quality": "high",
  "video_write_mode": "balanced",
  "loras": [
    {
      "path": "https://huggingface.co/fal/ltx2.3-audio-reactive-lora/resolve/main/ltx2.3_audio_reactive_lora.safetensors",
      "scale": 1.2,
      "transformer": "both"
    }
  ]
}
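The same request can be sent from Python. This sketch assumes the `fal-client` package (`pip install fal-client`) with an API key in the `FAL_KEY` environment variable. `build_arguments` and `run` are hypothetical helpers, and the defaults mirror the JSON above. Leaving out `image_url` and `image_strength` for audio-only runs is an assumption about the endpoint, not documented behavior. The scale check is a deliberately conservative choice that rejects values outside the recommended 1.0 to 1.5 range.

```python
LORA_URL = ("https://huggingface.co/fal/ltx2.3-audio-reactive-lora/resolve/main/"
            "ltx2.3_audio_reactive_lora.safetensors")
ENDPOINT = "fal-ai/ltx-2.3-quality/audio-to-video/lora"


def build_arguments(prompt, audio_url, image_url=None, lora_scale=1.2,
                    image_strength=0.62, size=1024):
    """Build the request body, defaulting to the values in the example input."""
    if not 1.0 <= lora_scale <= 1.5:
        raise ValueError("LoRA scale outside the recommended 1.0-1.5 range")
    args = {
        "prompt": prompt,
        "audio_url": audio_url,
        "match_audio_length": True,
        "resolution": {"width": size, "height": size},
        "frames_per_second": 24,
        "num_inference_steps": 15,
        "guidance_scale": 1,
        "generate_audio": True,
        "negative_prompt": "",
        "enable_prompt_expansion": False,
        "video_quality": "high",
        "video_write_mode": "balanced",
        "loras": [{"path": LORA_URL, "scale": lora_scale, "transformer": "both"}],
    }
    if image_url is not None:
        # Assumption: image_strength only matters when a first frame is supplied.
        args["image_url"] = image_url
        args["image_strength"] = image_strength
    return args


def run(arguments):
    """Submit the job and block until it finishes (requires network and FAL_KEY)."""
    import fal_client  # imported lazily so build_arguments works without the package
    return fal_client.subscribe(ENDPOINT, arguments=arguments, with_logs=True)
```

`run(build_arguments(prompt, audio_url, image_url))` waits for the job to complete. Inspect the returned dictionary for the video URL, since this card does not describe the exact response shape.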

Notes

This LoRA is most useful when the input image already contains clear structures that can move with the music: cubes, layered architecture, particles, light seams, waveform-like forms, glass blocks, or abstract visualizer shapes. It can be used without an image, but image-first-frame generation gives stronger art direction and more consistent results.

It can also be tested in ComfyUI or other local LTX-2.3 workflows as a standard LoRA, as long as the workflow supports LTX-2.3 LoRA loading and audio-conditioned generation.
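Before wiring the file into a local workflow, it can help to check which tensors the LoRA contains. In this Python sketch, `download_lora` calls `huggingface_hub.hf_hub_download` with the repository and filename from the LoRA URL above. `read_safetensors_header` parses the standard safetensors header (an 8-byte little-endian length followed by a JSON table) using only the standard library, so no tensors are loaded. Both helper names are hypothetical.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data.

    The file starts with an unsigned little-endian 64-bit header length,
    followed by that many bytes of UTF-8 JSON mapping tensor names to
    dtype/shape/data_offsets (plus an optional "__metadata__" entry).
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def download_lora():
    """Fetch the LoRA into the local Hugging Face cache and return its path."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub
    return hf_hub_download(
        repo_id="fal/ltx2.3-audio-reactive-lora",
        filename="ltx2.3_audio_reactive_lora.safetensors",
    )
```

For example, `sorted(k for k in read_safetensors_header(download_lora()) if k != "__metadata__")[:5]` lists the first few tensor names, which shows the key naming a local LoRA loader must recognize.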

Limitations

  • The LoRA improves audio-reactive motion but does not guarantee that motion lands on the beat in every clip.
  • Stronger motion usually comes from lower image strength, stronger prompt language, and clearly structured first frames.
  • Text inside first frames can drift during video generation; keep important text simple, high-contrast, and explicitly described as fixed if it must remain readable.
  • Generated videos should be reviewed before publication, especially for text stability, logo fidelity, and audio-visual sync.
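Motion strength depends mostly on image strength, LoRA scale, and prompt wording, so a small grid of variants is a practical way to find a setting for a given track before reviewing the results. Below is a Python sketch. `sweep` is a hypothetical helper. The grid values are illustrative assumptions, apart from 0.62 (the example image strength) and the 1.2 to 1.5 scale range stated in this card.

```python
import copy
from itertools import product

LORA_URL = ("https://huggingface.co/fal/ltx2.3-audio-reactive-lora/resolve/main/"
            "ltx2.3_audio_reactive_lora.safetensors")


def sweep(base_arguments, image_strengths=(0.5, 0.62, 0.75), scales=(1.2, 1.35, 1.5)):
    """Return (label, arguments) pairs covering every strength/scale combination.

    Lower image strength usually gives stronger motion, so the grid brackets
    the example value of 0.62 on both sides. The base dict is never mutated.
    """
    variants = []
    for strength, scale in product(image_strengths, scales):
        args = copy.deepcopy(base_arguments)
        args["image_strength"] = strength
        args["loras"] = [{"path": LORA_URL, "scale": scale, "transformer": "both"}]
        variants.append((f"strength{strength}_scale{scale}", args))
    return variants
```

Each label doubles as a file name for the rendered clip, which makes side-by-side review of sync and text stability easier.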

Credits

Created by Lovis Odin for fal.ai.
