LFM2.5-Audio Tool-Aware v4 — MLX bf16
Apple-MLX port of matbee/lfm2.5-audio-tool-aware-v4, converted to the weight layout used by mlx-community/LFM2.5-Audio-1.5B-bf16. Weights are bfloat16 — no quantization. Drop-in replacement for any code that loads the upstream MLX base model.
The original model card is preserved below.
Conversion notes
Differences from the PyTorch checkpoint:
- `conformer.*` → `audio_encoder.*` (NeMo → MLX subkey aliases: `feed_forward1` → `ff1`, `norm_X` → `X_norm`, `batch_norm` → `norm`).
- `depthformer.*` → `audio_head.depthformer.blocks.*`; the fused `operator.qkv_proj` (1536×1024) split into separate `attn.{q,k,v}_proj` of shape 1024/256/256.
- `audio_adapter.model.N` → `audio_adapter.layers.N`.
- Conv kernels reshaped for MLX channels-last: depthwise/LFM2 conv1d `(O, 1, k) → (O, k, 1)`; pointwise `(O, I, 1) → (O, I)` Linear; 4D pre-encode conv `(O, I, H, W) → (O, H, W, I)`.
- Tied depth-embedding `to_logits` cloned from `embedding` (safetensors cannot store aliased tensors).
- Dropped: `num_batches_tracked` (BatchNorm counters), `audio_loss_weights`, `codebook_offsets`.
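The encoder and adapter renames and the kernel layout changes listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the conversion script itself: `rename_key`, `permute_shape`, and the example key names are hypothetical, the `depthformer` rename is not covered, and the q, k, v row order inside the fused projection is an assumption.

```python
import re

# Axis permutations implied by the shape changes listed above.
DEPTHWISE_CONV1D_PERM = (0, 2, 1)   # (O, 1, k)    -> (O, k, 1)
CONV2D_PERM = (0, 2, 3, 1)          # (O, I, H, W) -> (O, H, W, I)

# Row ranges for splitting the fused 1536-row qkv projection.
# Assumption: rows are stored in q, k, v order.
QKV_ROWS = {"q": (0, 1024), "k": (1024, 1280), "v": (1280, 1536)}


def rename_key(key: str) -> str:
    """Map a PyTorch-style key to an MLX-style name (illustrative rules only)."""
    if key.startswith("conformer."):
        parts = key.split(".")
        parts[0] = "audio_encoder"
        renamed = []
        for part in parts:
            if part == "batch_norm":
                part = "norm"
            part = part.replace("feed_forward1", "ff1")
            if part.startswith("norm_"):
                part = part[len("norm_"):] + "_norm"  # norm_X -> X_norm
            renamed.append(part)
        return ".".join(renamed)
    # audio_adapter.model.N -> audio_adapter.layers.N
    return re.sub(r"^audio_adapter\.model\.(\d+)", r"audio_adapter.layers.\1", key)


def permute_shape(shape, perm):
    """Return the shape a tensor would have after transposing its axes by `perm`."""
    return tuple(shape[axis] for axis in perm)
```

For example, `permute_shape((512, 1, 9), DEPTHWISE_CONV1D_PERM)` gives `(512, 9, 1)`, which is the depthwise conv1d layout change described above. The pointwise case needs no permutation, only dropping the trailing size-1 axis.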
Verified: 924/924 tensor keys and shapes match the upstream MLX layout exactly; the LFM2 backbone loads into mlx_lm.models.lfm2 and runs a forward pass.
LFM2.5-Audio-1.5B — Tool-Aware Fine-Tune (v4)
Full fine-tune of LiquidAI/LFM2.5-Audio-1.5B that handles both turns of a tool-augmented voice flow plus chitchat and refusals.
| Class | Trigger | Behavior |
|---|---|---|
| `tool_match` | user audio + `Tools available:` block, requested tool listed | Short ack ("setting your alarm now.") then stop |
| `tool_result_speak` | same audio + `Known facts you must use…` block injected via `set_context()` | Speak the result naturally ("your alarm is set for 7am.") |
| `tool_miss` | requested tool not in the listed set | Polite refusal ("i don't have a maps tool right now, sorry.") |
| `non_tool` | conversational query, no tool implied | Base-model-style natural reply (targets self-distilled from base) |
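The table can be read as a small decision rule, which is handy when labeling eval rows. The sketch below is an assumption-laden illustration: `expected_class` and its parameters are hypothetical names, and the model itself infers intent from audio rather than from a `requested_tool` string.

```python
from typing import Optional, Sequence


def expected_class(
    requested_tool: Optional[str],
    tools_listed: Sequence[str],
    known_facts: Optional[str] = None,
) -> str:
    """Return the behavior class the table assigns to one turn."""
    if requested_tool is None:
        return "non_tool"            # conversational query, no tool implied
    if requested_tool not in tools_listed:
        return "tool_miss"           # polite refusal
    if known_facts:
        return "tool_result_speak"   # narrate the injected tool result
    return "tool_match"              # short acknowledgement, then stop
```

With `tools_listed=["alarm", "weather"]`, a request for `"maps"` maps to `tool_miss`, while a request for `"alarm"` maps to `tool_match` on the first turn and to `tool_result_speak` once a tool result has been injected.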
Results vs v3
Held-out eval, 120 rows (30 per class):
| Class | v3 | v4 | Δ |
|---|---|---|---|
| `tool_match` | 96.7% | 86.7% | −10.0 |
| `tool_result_speak` | 100.0% | 100.0% | 0 |
| `tool_miss` | 80.0% | 100.0% | +20.0 |
| `non_tool` | 60.0% | 86.7% | +26.7 |
| Overall | 84.0% | 93.3% | +9.3 |
Novel-facts narration (60 OOD tool results never in training): 95% faithful / 0% memorized.
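As a quick sanity check, the v4 overall figure follows from the per-class rates. The correct-row counts below are not stated in the card; they are derived from the reported percentages and the 30-rows-per-class split (for example, 86.7% of 30 rows is 26).

```python
ROWS_PER_CLASS = 30

# Correct counts implied by the v4 column (derived, not reported directly).
v4_correct = {
    "tool_match": 26,
    "tool_result_speak": 30,
    "tool_miss": 30,
    "non_tool": 26,
}


def accuracy(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


v4_per_class = {name: accuracy(n, ROWS_PER_CLASS) for name, n in v4_correct.items()}
v4_overall = accuracy(sum(v4_correct.values()), ROWS_PER_CLASS * len(v4_correct))
```

Here `v4_overall` is 112 correct rows out of 120.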
What changed in v4
- `tool_miss` ratio bumped 14% → 28%.
- Hard-negative `tools_listed`: 60% of `tool_miss` rows include a semantically adjacent tool (e.g. scenario=traffic with `maps` listed but not `traffic`).
- 19 diversified refusal templates (v3 had 5; v3 memorized phrasings).
- Explicit "if not listed, decline" clause in the instruction line.
- Tighter `non_tool` filter: drops DailyDialog context-fragments ("Spring .", "About 6:00 .").
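The hard-negative idea can be sketched as a sampler for the `tools_listed` field of a `tool_miss` row. This is an illustration under stated assumptions, not the dataset script: the tool names, the adjacency map, and `sample_tools_listed` are hypothetical, and it assumes the remaining 40% of rows list no adjacent tool at all.

```python
import random

# Hypothetical adjacency map: requested tool -> related tools that may be listed instead.
ADJACENT_TOOLS = {"traffic": ["maps"], "alarm": ["timer"], "music": ["radio"]}
ALL_TOOLS = ["maps", "traffic", "weather", "alarm", "timer", "music", "radio"]


def sample_tools_listed(requested, rng, hard_negative_rate=0.6, n_listed=3):
    """Build the tools_listed field for one tool_miss row.

    The requested tool is never listed. With probability `hard_negative_rate`,
    one semantically adjacent tool is forced into the list.
    """
    adjacent = ADJACENT_TOOLS.get(requested, [])
    candidates = [t for t in ALL_TOOLS if t != requested]
    if adjacent and rng.random() < hard_negative_rate:
        forced = rng.choice(adjacent)
        others = [t for t in candidates if t != forced]
        listed = [forced] + rng.sample(others, n_listed - 1)
    else:
        # Easy negative (assumption): no adjacent tool appears in the list.
        others = [t for t in candidates if t not in adjacent]
        listed = rng.sample(others, n_listed)
    rng.shuffle(listed)
    return listed
```

Calling `sample_tools_listed("traffic", random.Random(0))` returns three tools that never include `traffic`; over many rows, roughly 60% of the lists contain `maps`.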
Two-turn flow
```python
# turn 1: model emits "let me check the weather." and stops
# coordinator runs the weather tool, gets "Weather in Tokyo: 72°F, sunny."
await ctrl.<audio_node>.set_context("Weather in Tokyo: 72°F, sunny.")
# turn 2: re-feed same user audio; model narrates ("it's 72 and sunny in tokyo.")
```
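To make the control flow concrete, here is a runnable sketch that replaces the real audio node with a stub. Only `set_context()` comes from the snippet above; `StubAudioNode`, `respond`, `run_weather_tool`, and `two_turn_flow` are hypothetical names, and the stub returns canned strings instead of running the model.

```python
import asyncio


class StubAudioNode:
    """Stand-in for the audio node; a real node would run the model on audio."""

    def __init__(self):
        self.context = None

    async def set_context(self, text: str) -> None:
        self.context = text

    async def respond(self, user_audio: bytes) -> str:
        # Hypothetical behavior: acknowledge without context, narrate with it.
        if self.context is None:
            return "let me check the weather."
        return f"(narration grounded in: {self.context})"


async def run_weather_tool() -> str:
    # Hypothetical tool call; returns the result string used in the example above.
    return "Weather in Tokyo: 72°F, sunny."


async def two_turn_flow(node, user_audio: bytes):
    ack = await node.respond(user_audio)        # turn 1: short ack, then stop
    result = await run_weather_tool()           # coordinator runs the tool
    await node.set_context(result)              # inject the tool result
    narration = await node.respond(user_audio)  # turn 2: same audio, re-fed
    return ack, narration
```

Running `asyncio.run(two_turn_flow(StubAudioNode(), b""))` returns the acknowledgement from turn 1 and a narration that depends on the injected context. The point is that the same user audio is fed twice and only the context differs.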
Training recipe
- Base: `LiquidAI/LFM2.5-Audio-1.5B`, full bf16 finetune
- Hardware: 2× RTX 4090
- 3000 train + 400 eval, mix 22/24/28/26 (tool_match / tool_result_speak / tool_miss / non_tool)
- bs=2/GPU × 2 GPUs × 1120 steps (~1.5 epochs)
- lr 5e-5, cosine + 100 warmup, ctx=512
- `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`
- Final val_loss = 0.89
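The epoch count and per-class row counts follow from the numbers above. The arithmetic below assumes no gradient accumulation and that the 22/24/28/26 mix applies to the 3000 training rows; neither is stated explicitly in the recipe.

```python
TRAIN_ROWS = 3000
BATCH_PER_GPU = 2
NUM_GPUS = 2
STEPS = 1120
MIX_PERCENT = {
    "tool_match": 22,
    "tool_result_speak": 24,
    "tool_miss": 28,
    "non_tool": 26,
}

# Samples processed over the whole run (assumes no gradient accumulation).
samples_seen = BATCH_PER_GPU * NUM_GPUS * STEPS
epochs = samples_seen / TRAIN_ROWS

# Rows per class, assuming the mix applies to the training split.
rows_per_class = {name: TRAIN_ROWS * pct // 100 for name, pct in MIX_PERCENT.items()}
```

This gives 4480 samples seen, or about 1.49 passes over the training set, and 660 / 720 / 840 / 780 rows for the four classes.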
Recipe + scripts: matbee/lfm2-tool-aware-dataset-v4.
Known limitations
- 4/30 `tool_match` failures use a refusal template when the tool IS listed; the refusal signal is slightly over-corrected vs v3. v4.1 will rebalance.
- 3/60 novel-facts mixed verdicts on `iot_lights` and `weather`.
Usage
```python
import torch
from liquid_audio import LFM2AudioModel, LFM2AudioProcessor

processor = LFM2AudioProcessor.from_pretrained("matbee/lfm2.5-audio-tool-aware-v4", device="cuda")
model = LFM2AudioModel.from_pretrained(
    "matbee/lfm2.5-audio-tool-aware-v4", device="cuda", dtype=torch.bfloat16
).eval()
```
Predecessors
- `matbee/lfm2.5-audio-tool-aware-v1`: initial; mastered turn 1, regressed on turn 2 narration.
- `matbee/lfm2.5-audio-tool-aware-v2`: added `tool_result_speak`; 100% ack + 20/20 narration on its eval.
- v3 (not published): added distilled `non_tool` class; fixed narration but classifier-boundary regressed.
- v4 (this release): fixes tool_miss/non_tool boundary via hard negatives + diversified refusals.
License
Inherited from base: LFM Open License v1.0.