LFM2.5-Audio-1.5B GGUF
GGUF quantizations of LiquidAI/LFM2.5-Audio-1.5B for CrispASR.
LFM2.5-Audio is Liquid AI's end-to-end multimodal speech model supporting ASR (speech-to-text), TTS (text-to-speech), and speech-to-speech in a single 1.5B-parameter model. This is the English base variant. It achieves a 7.53 average WER across standard English ASR benchmarks, competitive with models 3x its size.
Architecture
| Component | Details |
|---|---|
| Encoder | 17-layer FastConformer (512-dim, 8 heads, rel-pos attention, dw-striding 8x subsampling) |
| Adapter | LayerNorm + Linear(512->2048) + GELU + Linear(2048->2048) |
| Backbone | 16-layer LFM2 hybrid conv+attention (2048-dim, 32 heads / 8 KV heads, RoPE theta=1M) |
| Depthformer | 6-layer transformer (1024-dim) with 8-codebook Mimi audio token generation |
| Audio codec | Mimi (8 codebooks, 24 kHz) |
| Parameters | 1.5B total |
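A few derived figures follow directly from the table. The sketch below works them out for the backbone attention and the adapter. It assumes the conventional layout (head size equals model width divided by head count, and `Linear` layers carry a bias), which the table does not state, so treat the results as estimates rather than values read from the checkpoint. The helper `linear_params` is a hypothetical name introduced for illustration.

```python
# Values copied from the architecture table.
MODEL_DIM = 2048   # backbone width
N_HEADS = 32       # query heads
N_KV_HEADS = 8     # key/value heads (grouped-query attention)

# ASSUMPTION: head_dim = model_dim / n_heads, the usual convention.
head_dim = MODEL_DIM // N_HEADS
queries_per_kv = N_HEADS // N_KV_HEADS   # query heads sharing one KV head
kv_width = N_KV_HEADS * head_dim         # width of the K (or V) projection


def linear_params(n_in, n_out, bias=True):
    """Parameter count of a dense layer (ASSUMPTION: bias present by default)."""
    return n_in * n_out + (n_out if bias else 0)


# Adapter: LayerNorm(512) + Linear(512->2048) + GELU + Linear(2048->2048).
adapter_params = (
    2 * 512                      # LayerNorm scale + shift
    + linear_params(512, 2048)
    + linear_params(2048, 2048)  # GELU has no parameters
)

print(f"head_dim={head_dim}, queries per KV head={queries_per_kv}, kv_width={kv_width}")
print(f"adapter parameters ~ {adapter_params:,}")
```

Under these assumptions the K and V projections are a quarter of the width of the query projection, which is where grouped-query attention saves KV-cache memory.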
Available quantizations
| File | Quant | Size | Notes |
|---|---|---|---|
| lfm2-audio-1.5b-f16.gguf | F16 | ~3.1 GB | Full precision reference |
| lfm2-audio-1.5b-q8_0.gguf | Q8_0 | ~1.7 GB | High quality |
| lfm2-audio-1.5b-q5_k.gguf | Q5_K | ~1.6 GB | Recommended (verified identical output) |
Note: Q4_K is too aggressive for the English variant and causes early EOS. Use Q5_K or Q8_0.
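To compare the files on a common scale, the listed sizes can be turned into an effective bits-per-weight figure. This is a rough sketch: it assumes "GB" means 10^9 bytes and uses the rounded 1.5B parameter count, so the results are approximate and include metadata and any tensors stored at a different precision.

```python
# Sizes (GB) and parameter count are the rounded figures from this page.
PARAMS = 1.5e9
SIZES_GB = {"F16": 3.1, "Q8_0": 1.7, "Q5_K": 1.6}


def effective_bpw(size_gb, params=PARAMS):
    """Average bits stored per parameter (ASSUMPTION: 1 GB = 1e9 bytes)."""
    return size_gb * 1e9 * 8 / params


for quant, size in SIZES_GB.items():
    print(f"{quant}: ~{effective_bpw(size):.1f} bits/weight")
```

The Q5_K file comes out only slightly smaller than Q8_0 on this measure, well above the nominal 5.5 bits per weight of the Q5_K block format. One possible explanation is that part of the model is kept at higher precision, but that is not stated here.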
Usage with CrispASR
# Transcribe English audio
./crispasr -m lfm2-audio-1.5b-q5_k.gguf -f audio.wav -l en
# Or with auto-download
./crispasr --backend lfm2-audio -m auto -f audio.wav
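For batch use, the same commands can be driven from Python. The sketch below only uses the flags shown above (`-m`, `-f`, `-l`, `--backend`). The function names are hypothetical, and it assumes the transcript is written to standard output, which should be checked against the CrispASR documentation.

```python
import subprocess


def build_crispasr_cmd(model, audio, lang=None, backend=None, binary="./crispasr"):
    """Assemble the argument list for the CLI invocations shown above."""
    cmd = [binary]
    if backend:
        cmd += ["--backend", backend]
    cmd += ["-m", model, "-f", audio]
    if lang:
        cmd += ["-l", lang]
    return cmd


def transcribe(model, audio, **kwargs):
    """Run CrispASR and return its stdout.

    ASSUMPTION: the transcript is printed to stdout; adjust if it is not.
    """
    result = subprocess.run(
        build_crispasr_cmd(model, audio, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `transcribe("lfm2-audio-1.5b-q5_k.gguf", "audio.wav", lang="en")` mirrors the first command, and `transcribe("auto", "audio.wav", backend="lfm2-audio")` mirrors the auto-download variant.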
Conversion
Converted from the original safetensors using:
python models/convert-lfm2-audio-to-gguf.py \
--input LiquidAI/LFM2.5-Audio-1.5B \
--output lfm2-audio-1.5b-f16.gguf
# Quantize
./crispasr-quantize lfm2-audio-1.5b-f16.gguf lfm2-audio-1.5b-q5_k.gguf q5_k
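After converting or quantizing, a quick sanity check is to read the fixed GGUF header and confirm the file is not truncated or mislabeled. The sketch below assumes GGUF version 2 or later, where the header is the 4-byte magic `GGUF`, a little-endian `uint32` version, and two `uint64` counts. `read_gguf_header` is a hypothetical helper, not part of CrispASR.

```python
import struct


def read_gguf_header(path):
    """Return version, tensor count and metadata count from a GGUF file.

    ASSUMPTION: GGUF v2+ layout (uint64 counts); v1 files used uint32 counts.
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not start with a valid GGUF header")
    # "<IQQ": little-endian uint32 followed by two uint64 values (20 bytes).
    version, n_tensors, n_kv = struct.unpack("<IQQ", header[4:24])
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}
```

Calling `read_gguf_header("lfm2-audio-1.5b-q5_k.gguf")` on the quantized output and on the F16 reference should report the same tensor count if quantization preserved every tensor.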
License
LFM Open License v1.0 - Commercial use permitted for entities with annual revenue under $10M USD. See the upstream license for full terms.
Components include: Apache-2.0 (NVIDIA NeMo), MIT (Kyutai Moshi), CC-BY-4.0 (Canary checkpoint).
Credits