LFM2.5-Audio Tool-Aware v4 — MLX bf16

Apple-MLX port of matbee/lfm2.5-audio-tool-aware-v4, converted to the weight layout used by mlx-community/LFM2.5-Audio-1.5B-bf16. Weights are bfloat16 — no quantization. Drop-in replacement for any code that loads the upstream MLX base model.

The original model card is preserved below, after the conversion notes.

Conversion notes

Differences from the PyTorch checkpoint:

conformer.* → audio_encoder.* (NeMo → MLX subkey aliases: feed_forward1→ff1, norm_X→X_norm, batch_norm→norm).
depthformer.* → audio_head.depthformer.blocks.*; the fused operator.qkv_proj (1536×1024) is split along the output dimension into separate attn.{q,k,v}_proj with 1024/256/256 rows (1024 + 256 + 256 = 1536).
audio_adapter.model.N → audio_adapter.layers.N.
Conv kernels reshaped for MLX channels-last: depthwise/LFM2 conv1d (O, 1, k) → (O, k, 1); pointwise (O, I, 1) → (O, I) Linear; 4D pre-encode conv (O, I, H, W) → (O, H, W, I).
Tied depth-embedding to_logits cloned from embedding (safetensors cannot store aliased tensors).
Dropped: num_batches_tracked (BatchNorm counters), audio_loss_weights, codebook_offsets.
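
The remapping above can be sketched in a few lines of NumPy. This is an illustration, not the conversion script itself: the example key names, the q/k/v row order inside the fused projection, and the assumption that dropped entries appear as the final key segment are all hypothetical, and only a subset of the aliases is covered.

```python
import numpy as np

# Entries the notes list as dropped (assumed to be the last key segment).
DROPPED = {"num_batches_tracked", "audio_loss_weights", "codebook_offsets"}


def should_drop(key: str) -> bool:
    return key.split(".")[-1] in DROPPED


def rename_key(key: str) -> str:
    """Apply the prefix and subkey renames from the list above to one key."""
    if key.startswith("conformer."):
        parts = key.split(".")
        parts[0] = "audio_encoder"
        renamed = []
        for part in parts:
            part = part.replace("feed_forward1", "ff1")
            if part == "batch_norm":
                part = "norm"
            elif part.startswith("norm_"):
                # norm_X -> X_norm
                part = part[len("norm_"):] + "_norm"
            renamed.append(part)
        return ".".join(renamed)
    if key.startswith("depthformer."):
        return "audio_head.depthformer.blocks." + key[len("depthformer."):]
    if key.startswith("audio_adapter.model."):
        return "audio_adapter.layers." + key[len("audio_adapter.model."):]
    return key


def split_qkv(fused: np.ndarray, q_rows: int = 1024, kv_rows: int = 256):
    """Split a fused (1536, 1024) projection into q, k, v (row order assumed)."""
    return np.split(fused, [q_rows, q_rows + kv_rows], axis=0)


def depthwise_to_mlx(w: np.ndarray) -> np.ndarray:
    """(O, 1, k) -> (O, k, 1): move the kernel axis ahead of the channel axis."""
    return np.transpose(w, (0, 2, 1))


def pointwise_to_linear(w: np.ndarray) -> np.ndarray:
    """(O, I, 1) -> (O, I): a kernel-size-1 conv is a Linear weight."""
    return w[:, :, 0]


def conv2d_to_mlx(w: np.ndarray) -> np.ndarray:
    """(O, I, H, W) -> (O, H, W, I): channels-last layout."""
    return np.transpose(w, (0, 2, 3, 1))
```

The tied to_logits weight would simply be a copy of the embedding array (for example `embedding.copy()`), so that the two entries no longer alias the same storage.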

Verified: 924/924 tensor keys and shapes match the upstream MLX layout exactly; the LFM2 backbone loads into mlx_lm.models.lfm2 and runs a forward pass.
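
A key-and-shape check of this kind could look like the sketch below. It assumes both checkpoints have already been reduced to plain `{key: shape}` dictionaries (for example from the safetensors headers); the function name and the toy entries are hypothetical, not taken from the conversion tooling.

```python
def diff_layouts(converted: dict, reference: dict):
    """Compare two {tensor_key: shape} maps.

    Returns (missing, unexpected, shape_mismatch), where shape_mismatch maps
    a key to (converted_shape, reference_shape).
    """
    missing = sorted(k for k in reference if k not in converted)
    unexpected = sorted(k for k in converted if k not in reference)
    shape_mismatch = {
        k: (tuple(converted[k]), tuple(reference[k]))
        for k in reference
        if k in converted and tuple(converted[k]) != tuple(reference[k])
    }
    return missing, unexpected, shape_mismatch


# Toy example with made-up keys: one shape differs, one key is missing.
reference = {"a.weight": (4, 4), "b.weight": (8, 3, 1), "c.bias": (8,)}
converted = {"a.weight": (4, 4), "b.weight": (8, 1, 3)}
missing, unexpected, mismatched = diff_layouts(converted, reference)
```

A clean conversion corresponds to all three results being empty.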

LFM2.5-Audio-1.5B — Tool-Aware Fine-Tune (v4)

Full fine-tune of LiquidAI/LFM2.5-Audio-1.5B that handles both turns of a tool-augmented voice flow plus chitchat and refusals.

Class	Trigger	Behavior
`tool_match`	user audio + `Tools available:` block, requested tool listed	Short ack (`"setting your alarm now."`) then stop
`tool_result_speak`	same audio + `Known facts you must use…` block injected via `set_context()`	Speak the result naturally (`"your alarm is set for 7am."`)
`tool_miss`	requested tool not in the listed set	Polite refusal (`"i don't have a maps tool right now, sorry."`)
`non_tool`	conversational query, no tool implied	Base-model-style natural reply (targets self-distilled from base)
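
Read as a decision rule, the table maps each request to one expected class. The sketch below only illustrates that labeling; the function and its arguments are hypothetical, and it does not describe how the model itself decides.

```python
from typing import Optional, Sequence


def expected_class(
    requested_tool: Optional[str],
    tools_listed: Sequence[str],
    has_known_facts: bool,
) -> str:
    """Return the class from the table above that a given request falls into."""
    if requested_tool is None:
        # Conversational query, no tool implied.
        return "non_tool"
    if has_known_facts:
        # Second turn: the tool result was injected as a "Known facts" block.
        return "tool_result_speak"
    if requested_tool in tools_listed:
        return "tool_match"
    return "tool_miss"
```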

Results vs v3

Held-out eval, 120 rows (30 per class):

Class	v3	v4	Δ
`tool_match`	96.7%	86.7%	−10.0
`tool_result_speak`	100.0%	100.0%	0
`tool_miss`	80.0%	100.0%	+20.0
`non_tool`	60.0%	86.7%	+26.7
Overall	84.2%	93.3%	+9.1
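
The v4 column can be reproduced from per-class pass counts. The counts below are inferred from the percentages under the assumption of exactly 30 rows per class; they are not taken from the eval logs.

```python
ROWS_PER_CLASS = 30

# Pass counts inferred from the v4 percentages (e.g. 86.7% of 30 rows = 26).
V4_PASSES = {
    "tool_match": 26,
    "tool_result_speak": 30,
    "tool_miss": 30,
    "non_tool": 26,
}


def accuracy(passes: int, rows: int) -> float:
    """Accuracy in percent, rounded to one decimal place."""
    return round(100.0 * passes / rows, 1)


per_class = {name: accuracy(n, ROWS_PER_CLASS) for name, n in V4_PASSES.items()}
overall = accuracy(sum(V4_PASSES.values()), ROWS_PER_CLASS * len(V4_PASSES))
```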

Novel-facts narration (60 OOD tool results never in training): 95% faithful / 0% memorized.

What changed in v4

tool_miss ratio bumped 14% → 28%.
Hard-negative tools_listed: 60% of tool_miss rows include a semantically adjacent tool (e.g. scenario=traffic with maps listed but not traffic).
19 diversified refusal templates (v3 had 5; v3 memorized phrasings).
Explicit "if not listed, decline" clause in the instruction line.
Tighter non_tool filter — drops DailyDialog context-fragments ("Spring .", "About 6:00 .").
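
The hard-negative construction can be sketched as follows. This is a guess at the shape of the sampling, not the dataset script: the adjacency map, the tool names other than maps and traffic, the list size `k`, and the function name are all hypothetical.

```python
import random


def build_tools_listed(scenario, all_tools, adjacent, rng,
                       hard_negative_rate=0.6, k=3):
    """Build tools_listed for a tool_miss row.

    The requested tool (`scenario`) is never listed. With probability
    `hard_negative_rate`, one semantically adjacent tool is included;
    otherwise adjacent tools are kept out so the row is an easy negative.
    """
    neighbours = [t for t in adjacent.get(scenario, []) if t != scenario]
    listed = []
    if neighbours and rng.random() < hard_negative_rate:
        listed.append(rng.choice(neighbours))
    # Filler tools: anything that is neither the requested tool nor adjacent.
    pool = [t for t in all_tools if t != scenario and t not in neighbours]
    rng.shuffle(pool)
    listed.extend(pool[: max(0, k - len(listed))])
    rng.shuffle(listed)
    return listed


ALL_TOOLS = ["alarm", "weather", "maps", "traffic", "iot_lights", "timer"]
ADJACENT = {"traffic": ["maps"]}  # hypothetical adjacency map

rng = random.Random(0)
rows = [build_tools_listed("traffic", ALL_TOOLS, ADJACENT, rng) for _ in range(2000)]
hard_fraction = sum("maps" in row for row in rows) / len(rows)
```

Over many rows, `hard_fraction` should land near the 60% target, while `traffic` itself never appears in any list.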

Two-turn flow

# turn 1 — model emits "let me check the weather." and stops
# coordinator runs the weather tool, gets "Weather in Tokyo: 72°F, sunny."
await ctrl.<audio_node>.set_context("Weather in Tokyo: 72°F, sunny.")
# turn 2 — re-feed same user audio; model narrates ("it's 72 and sunny in tokyo.")
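
The same flow, written as a runnable sketch against a stub. Only `set_context()` is named in this card; `StubAudioNode`, its `respond()` method, and `run_weather_tool()` are hypothetical stand-ins, and the stub returns canned strings rather than running the model.

```python
import asyncio


class StubAudioNode:
    """Hypothetical stand-in for the audio node (no real inference)."""

    def __init__(self):
        self.context = None

    async def set_context(self, text: str) -> None:
        self.context = text

    async def respond(self, user_audio: bytes) -> str:
        # Without injected facts the stub acks; with facts it narrates them.
        if self.context is None:
            return "let me check the weather."
        return f"narrating: {self.context}"


async def run_weather_tool() -> str:
    """Hypothetical tool call made by the coordinator."""
    return "Weather in Tokyo: 72°F, sunny."


async def two_turn(node: StubAudioNode, user_audio: bytes):
    ack = await node.respond(user_audio)    # turn 1: short ack, then stop
    result = await run_weather_tool()       # coordinator runs the tool
    await node.set_context(result)          # inject the result as context
    reply = await node.respond(user_audio)  # turn 2: same audio, now narrated
    return ack, reply


ack, reply = asyncio.run(two_turn(StubAudioNode(), b""))
```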

Training recipe

Base: LiquidAI/LFM2.5-Audio-1.5B, full bf16 finetune
Hardware: 2× RTX 4090
3000 train + 400 eval, mix 22/24/28/26 (tool_match / tool_result_speak / tool_miss / non_tool)
bs=2/GPU × 2 GPUs × 1120 steps (~1.5 epochs)
lr 5e-5, cosine + 100 warmup, ctx=512
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
Final val_loss = 0.89
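
The step count and schedule above work out as in the sketch below. It assumes no gradient accumulation, a linear warmup, and cosine decay to zero at the final step; the recipe states none of these explicitly.

```python
import math

PEAK_LR = 5e-5
WARMUP_STEPS = 100
TOTAL_STEPS = 1120
TRAIN_ROWS = 3000
BATCH_PER_GPU = 2
NUM_GPUS = 2


def epochs_seen() -> float:
    """Samples processed divided by dataset size: 2 * 2 * 1120 / 3000."""
    return BATCH_PER_GPU * NUM_GPUS * TOTAL_STEPS / TRAIN_ROWS


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed floor)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
```

`epochs_seen()` comes to roughly 1.49, which matches the "~1.5 epochs" figure.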

Recipe + scripts: matbee/lfm2-tool-aware-dataset-v4.

Known limitations

4/30 tool_match failures use a refusal template when the tool IS listed — refusal signal slightly over-corrected vs v3. v4.1 will rebalance.
3/60 novel-facts mixed verdicts on iot_lights and weather.

Usage

import torch
from liquid_audio import LFM2AudioModel, LFM2AudioProcessor

processor = LFM2AudioProcessor.from_pretrained("matbee/lfm2.5-audio-tool-aware-v4", device="cuda")
model = LFM2AudioModel.from_pretrained(
    "matbee/lfm2.5-audio-tool-aware-v4", device="cuda", dtype=torch.bfloat16
).eval()

Predecessors

matbee/lfm2.5-audio-tool-aware-v1 — initial; mastered turn 1, regressed on turn 2 narration.
matbee/lfm2.5-audio-tool-aware-v2 — added tool_result_speak; 100% ack + 20/20 narration on its eval.
v3 (not published) — added distilled non_tool class; fixed narration but classifier-boundary regressed.
v4 (this release) — fixes tool_miss/non_tool boundary via hard negatives + diversified refusals.

License

Inherited from base: LFM Open License v1.0.
