LFM2.5-Audio Tool-Aware v4 — MLX bf16

Apple-MLX port of matbee/lfm2.5-audio-tool-aware-v4, converted to the weight layout used by mlx-community/LFM2.5-Audio-1.5B-bf16. Weights are bfloat16 — no quantization. Drop-in replacement for any code that loads the upstream MLX base model.

The original model card is preserved below.

Conversion notes

Differences from the PyTorch checkpoint:

  • conformer.* → audio_encoder.* (NeMo → MLX subkey aliases: feed_forward1→ff1, norm_X→X_norm, batch_norm→norm).
  • depthformer.* → audio_head.depthformer.blocks.*; the fused operator.qkv_proj (1536×1024) is split into separate attn.{q,k,v}_proj weights with 1024/256/256 output rows respectively.
  • audio_adapter.model.N → audio_adapter.layers.N.
  • Conv kernels reshaped for MLX channels-last: depthwise/LFM2 conv1d (O, 1, k) → (O, k, 1); pointwise (O, I, 1) → (O, I) Linear; 4D pre-encode conv (O, I, H, W) → (O, H, W, I).
  • Tied depth-embedding to_logits cloned from embedding (safetensors cannot store aliased tensors).
  • Dropped: num_batches_tracked (BatchNorm counters), audio_loss_weights, codebook_offsets.
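
The tensor-layout changes listed above can be sketched with NumPy. This is a minimal illustration, not the actual conversion script: the function names are hypothetical, and the q, k, v row ordering inside the fused projection is an assumption that the notes above do not state.

```python
import numpy as np


def split_fused_qkv(qkv, q_dim=1024, kv_dim=256):
    """Split a fused (q_dim + 2*kv_dim, hidden) weight into q, k, v.

    ASSUMPTION: rows are stored in q, k, v order.
    """
    assert qkv.shape[0] == q_dim + 2 * kv_dim
    q, k, v = np.split(qkv, [q_dim, q_dim + kv_dim], axis=0)
    return q, k, v


def depthwise_conv1d_to_mlx(w):
    """Depthwise conv1d kernel: (O, 1, k) -> (O, k, 1)."""
    return np.transpose(w, (0, 2, 1))


def pointwise_conv_to_linear(w):
    """Pointwise conv kernel: (O, I, 1) -> (O, I) Linear weight."""
    assert w.shape[-1] == 1
    return w[:, :, 0]


def conv2d_to_mlx(w):
    """4D conv kernel: (O, I, H, W) -> (O, H, W, I), channels-last."""
    return np.transpose(w, (0, 2, 3, 1))


# Shape check on dummy tensors (sizes other than 1536x1024 are illustrative).
q, k, v = split_fused_qkv(np.zeros((1536, 1024), dtype=np.float32))
shapes = {
    "q": q.shape,
    "k": k.shape,
    "v": v.shape,
    "depthwise": depthwise_conv1d_to_mlx(np.zeros((8, 1, 3))).shape,
    "pointwise": pointwise_conv_to_linear(np.zeros((8, 4, 1))).shape,
    "conv2d": conv2d_to_mlx(np.zeros((8, 1, 3, 3))).shape,
}
```

Only shapes are exercised here; a real conversion would also rename keys and cast to bfloat16 before saving.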

Verified: 924/924 tensor keys and shapes match the upstream MLX layout exactly; the LFM2 backbone loads into mlx_lm.models.lfm2 and runs a forward pass.
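
A key-and-shape comparison like the one described above could look roughly as follows. This is a sketch under stated assumptions: the helper name and the toy key names are hypothetical, and each layout is represented as a plain dict mapping tensor name to shape rather than as loaded weights.

```python
def diff_layouts(converted, reference):
    """Compare two {tensor_name: shape} mappings.

    Returns (missing, unexpected, mismatched) as sorted lists of key names.
    """
    conv_keys, ref_keys = set(converted), set(reference)
    missing = sorted(ref_keys - conv_keys)
    unexpected = sorted(conv_keys - ref_keys)
    mismatched = sorted(
        k for k in conv_keys & ref_keys
        if tuple(converted[k]) != tuple(reference[k])
    )
    return missing, unexpected, mismatched


# Toy layouts with hypothetical key names, for illustration only.
reference = {"a.weight": (1024, 1024), "b.weight": (256, 1024), "c.weight": (8, 3, 1)}
converted = {"a.weight": (1024, 1024), "b.weight": (1024, 256), "d.weight": (4,)}

missing, unexpected, mismatched = diff_layouts(converted, reference)
matched = len(reference) - len(missing) - len(mismatched)
```

A clean conversion corresponds to all three lists being empty, with `matched` equal to the number of reference keys.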


LFM2.5-Audio-1.5B — Tool-Aware Fine-Tune (v4)

Full fine-tune of LiquidAI/LFM2.5-Audio-1.5B that handles both turns of a tool-augmented voice flow plus chitchat and refusals.

| Class | Trigger | Behavior |
| --- | --- | --- |
| tool_match | user audio + Tools available: block, requested tool listed | Short ack ("setting your alarm now.") then stop |
| tool_result_speak | same audio + Known facts you must use… block injected via set_context() | Speak the result naturally ("your alarm is set for 7am.") |
| tool_miss | requested tool not in the listed set | Polite refusal ("i don't have a maps tool right now, sorry.") |
| non_tool | conversational query, no tool implied | Base-model-style natural reply (targets self-distilled from base) |

Results vs v3

Held-out eval, 120 rows (30 per class across 4 classes):

| Class | v3 | v4 | Δ (points) |
| --- | --- | --- | --- |
| tool_match | 96.7% | 86.7% | −10.0 |
| tool_result_speak | 100.0% | 100.0% | 0 |
| tool_miss | 80.0% | 100.0% | +20.0 |
| non_tool | 60.0% | 86.7% | +26.7 |
| Overall | 84.0% | 93.3% | +9.3 |
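
As a worked check of the v4 column, the per-class percentages can be turned back into counts out of 30 and summed. The counts below are back-computed from the reported percentages and are not taken from the eval logs, so treat them as an assumption.

```python
ROWS_PER_CLASS = 30

# ASSUMPTION: correct counts inferred from the reported v4 percentages.
v4_correct = {
    "tool_match": 26,         # 26/30 = 86.7%
    "tool_result_speak": 30,  # 30/30 = 100.0%
    "tool_miss": 30,          # 30/30 = 100.0%
    "non_tool": 26,           # 26/30 = 86.7%
}

per_class_pct = {
    name: round(100 * n / ROWS_PER_CLASS, 1) for name, n in v4_correct.items()
}
total_rows = ROWS_PER_CLASS * len(v4_correct)
overall_pct = round(100 * sum(v4_correct.values()) / total_rows, 1)
```

With equal class sizes, the overall figure is simply the mean of the four per-class accuracies.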

Novel-facts narration (60 OOD tool results never in training): 95% faithful / 0% memorized.

What changed in v4

  1. tool_miss ratio bumped 14% → 28%.
  2. Hard-negative tools_listed: 60% of tool_miss rows include a semantically adjacent tool (e.g. scenario=traffic with maps listed but not traffic).
  3. 19 diversified refusal templates (v3 had 5; v3 memorized phrasings).
  4. Explicit "if not listed, decline" clause in the instruction line.
  5. Tighter non_tool filter — drops DailyDialog context-fragments ("Spring .", "About 6:00 .").

Two-turn flow

# turn 1 — model emits "let me check the weather." and stops
# coordinator runs the weather tool, gets "Weather in Tokyo: 72°F, sunny."
await ctrl.<audio_node>.set_context("Weather in Tokyo: 72°F, sunny.")
# turn 2 — re-feed same user audio; model narrates ("it's 72 and sunny in tokyo.")
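
The same flow can be exercised end to end with a stub in place of the real audio node. Everything here except the set_context() call is hypothetical: the StubAudioNode class, its respond() method, and the tool function are stand-ins so the control flow can run without a model.

```python
import asyncio


class StubAudioNode:
    """Hypothetical stand-in for the audio node; only set_context() is from the flow above."""

    def __init__(self):
        self.context = None

    async def set_context(self, text):
        self.context = text

    async def respond(self, user_audio):
        # Hypothetical method: a real node would run the model on the audio.
        if self.context is None:
            return "let me check the weather."
        return f"narrating: {self.context}"


async def run_weather_tool():
    # Stand-in for the coordinator's real tool call.
    return "Weather in Tokyo: 72°F, sunny."


async def two_turn(node, user_audio):
    ack = await node.respond(user_audio)        # turn 1: short ack, then stop
    result = await run_weather_tool()           # coordinator runs the tool
    await node.set_context(result)              # inject the tool result
    narration = await node.respond(user_audio)  # turn 2: same audio, narrate
    return ack, narration


ack, narration = asyncio.run(two_turn(StubAudioNode(), b""))
```

The point of the sketch is the ordering: the tool result is injected between two passes over the same user audio.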

Training recipe

  • Base: LiquidAI/LFM2.5-Audio-1.5B, full bf16 finetune
  • Hardware: 2× RTX 4090
  • 3000 train + 400 eval, mix 22/24/28/26 (tool_match / tool_result_speak / tool_miss / non_tool)
  • bs=2/GPU × 2 GPUs × 1120 steps (~1.5 epochs)
  • lr 5e-5, cosine + 100 warmup, ctx=512
  • PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
  • Final val_loss = 0.89
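
The arithmetic behind the recipe can be checked with a short sketch. It assumes no gradient accumulation, that the class mix applies to the 3000 training rows, and that the cosine schedule decays to zero at the final step; none of these details are stated above.

```python
import math

TRAIN_ROWS = 3000
PER_GPU_BATCH, NUM_GPUS, STEPS = 2, 2, 1120
PEAK_LR, WARMUP_STEPS = 5e-5, 100
MIX_PCT = {"tool_match": 22, "tool_result_speak": 24, "tool_miss": 28, "non_tool": 26}

# Samples seen and epoch count (ASSUMPTION: no gradient accumulation).
samples_seen = PER_GPU_BATCH * NUM_GPUS * STEPS  # 4480
epochs = samples_seen / TRAIN_ROWS               # about 1.49

# Rows per class if the mix applies to the training split.
rows_per_class = {name: TRAIN_ROWS * pct // 100 for name, pct in MIX_PCT.items()}


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay (ASSUMPTION: floor of 0)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * progress))
```

For example, `lr_at(100)` returns the peak rate and `lr_at(1120)` returns zero under these assumptions.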

Recipe + scripts: matbee/lfm2-tool-aware-dataset-v4.

Known limitations

  • 4 of 30 tool_match rows fail by using a refusal template even though the tool is listed; the refusal signal is slightly over-corrected relative to v3. v4.1 will rebalance.
  • 3 of 60 novel-facts narrations received mixed verdicts, on iot_lights and weather.

Usage

import torch
from liquid_audio import LFM2AudioModel, LFM2AudioProcessor

processor = LFM2AudioProcessor.from_pretrained("matbee/lfm2.5-audio-tool-aware-v4", device="cuda")
model = LFM2AudioModel.from_pretrained(
    "matbee/lfm2.5-audio-tool-aware-v4", device="cuda", dtype=torch.bfloat16
).eval()

Predecessors

  • matbee/lfm2.5-audio-tool-aware-v1 — initial; mastered turn 1, regressed on turn 2 narration.
  • matbee/lfm2.5-audio-tool-aware-v2 — added tool_result_speak; 100% ack + 20/20 narration on its eval.
  • v3 (not published) — added distilled non_tool class; fixed narration but classifier-boundary regressed.
  • v4 (this release) — fixes tool_miss/non_tool boundary via hard negatives + diversified refusals.

License

Inherited from base: LFM Open License v1.0.
