Liquid AI

LFM2.5-230M-ONNX

ONNX export of LFM2.5-230M for cross-platform inference.

LFM2.5 is a hybrid architecture combining multiplicative gates and short convolutions, optimized for edge deployment with fast inference on CPU, GPU, and NPU hardware.
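To make "multiplicative gates and short convolutions" concrete, here is a toy sketch of a gated, causal, depthwise short convolution over a sequence of vectors. It is an illustration under stated assumptions, not LFM2.5's actual code. The input gate `b`, output gate `g`, and kernel are passed in directly, although in a real model they would come from learned projections. The function names and the kernel size in the usage example are hypothetical.

```python
# Toy gated short convolution (illustrative only; not the LFM2.5 implementation).
from typing import List

Vector = List[float]


def causal_depthwise_conv(xs: List[Vector], kernel: List[Vector]) -> List[Vector]:
    """Per-channel causal 1-D convolution: output t mixes inputs t-k+1..t."""
    if not xs:
        return []
    k = len(kernel)
    dim = len(xs[0])
    out = []
    for t in range(len(xs)):
        y = [0.0] * dim
        for j in range(k):
            src = t - (k - 1) + j  # kernel[k-1] multiplies the current step
            if src < 0:
                continue  # implicit left zero-padding keeps the conv causal
            for c in range(dim):
                y[c] += kernel[j][c] * xs[src][c]  # depthwise: no channel mixing
        out.append(y)
    return out


def gated_short_conv(
    b: List[Vector], x: List[Vector], g: List[Vector], kernel: List[Vector]
) -> List[Vector]:
    """Elementwise input gate, short causal conv, then elementwise output gate."""
    bx = [[bi * xi for bi, xi in zip(bt, xt)] for bt, xt in zip(b, x)]
    conv = causal_depthwise_conv(bx, kernel)
    return [[gi * ci for gi, ci in zip(gt, ct)] for gt, ct in zip(g, conv)]
```

Because each output depends only on the last `k` inputs, the state carried between decoding steps is a fixed-size window rather than a cache that grows with sequence length, which helps explain why such blocks suit edge inference.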

Recommended Variants

Precision  Size     Platform          Use Case
Q4         ~200 MB  WebGPU, Server    Recommended for most uses (quantized embedding)
Q4F32      ~390 MB  Server (CPU/GPU)  Q4 weights with FP32 embedding; higher quality
FP16       ~455 MB  WebGPU, Server    Higher quality
Q8         ~470 MB  Server only       Balance of quality and size
  • WebGPU: Use Q4 or FP16 (Q4F32 and Q8 are not supported on WebGPU).
  • Server (CPU/GPU): All variants supported. Q4F32 keeps the embedding in FP32 for higher fidelity.

The tied embedding / LM head is kept in FP32 in the Q4F32 and Q8 builds; only the Q4 build quantizes it, which is why Q4 is the smallest download.

Model Files

onnx/
├── model.onnx              # FP32
├── model_fp16.onnx         # FP16
├── model_q4.onnx           # Q4, quantized embedding (WebGPU)
├── model_q4f32.onnx        # Q4 weights, FP32 embedding (server)
└── model_q8.onnx           # Q8
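The variant table and the file list above can be combined into a small lookup. The sketch below uses a hypothetical helper, `onnx_file_for`, that maps a precision name to its repository path and rejects combinations the table marks as unsupported on WebGPU. Treating FP32 as server-only is an assumption, since the table recommends only Q4 or FP16 for WebGPU and does not list FP32.

```python
# Hypothetical helper: pick the ONNX file for a precision/platform pair.
ONNX_FILES = {
    "fp32": "onnx/model.onnx",
    "fp16": "onnx/model_fp16.onnx",
    "q4": "onnx/model_q4.onnx",
    "q4f32": "onnx/model_q4f32.onnx",
    "q8": "onnx/model_q8.onnx",
}

# Assumption: only the variants recommended for WebGPU (Q4, FP16) are allowed there.
WEBGPU_SUPPORTED = {"q4", "fp16"}


def onnx_file_for(precision: str, platform: str) -> str:
    """Return the repo path for a precision, validating WebGPU support."""
    key = precision.lower()
    if key not in ONNX_FILES:
        raise ValueError(
            f"unknown precision {precision!r}; expected one of {sorted(ONNX_FILES)}"
        )
    if platform.lower() == "webgpu" and key not in WEBGPU_SUPPORTED:
        raise ValueError(f"{precision} is not supported on WebGPU; use q4 or fp16")
    return ONNX_FILES[key]
```

The returned path can be passed straight to `hf_hub_download("LiquidAI/LFM2.5-230M-ONNX", ...)`. Some variants also need a companion `.onnx_data` file, as the Q4F32 download does.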

Python (onnxruntime)

pip install onnxruntime transformers numpy huggingface_hub
# or, for GPU:
pip install onnxruntime-gpu transformers numpy huggingface_hub
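import onnxruntime as ort
from huggingface_hub import hf_hub_download

model_id = "LiquidAI/LFM2.5-230M-ONNX"
# Q4F32 is recommended for server CPU/GPU; use model_q4.onnx for WebGPU.
model_path = hf_hub_download(model_id, "onnx/model_q4f32.onnx")
hf_hub_download(model_id, "onnx/model_q4f32.onnx_data")  # weights; must sit next to the .onnx

# With onnxruntime-gpu, use ["CUDAExecutionProvider", "CPUExecutionProvider"].
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
for inp in session.get_inputs():  # expected input names, shapes, and dtypes
    print(inp.name, inp.shape, inp.type)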
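Once the session loads, text generation is a loop that repeatedly picks the next token. The sketch below shows greedy decoding with the model call abstracted behind a hypothetical `step` callable that returns next-token logits for the current IDs. You would implement `step` around `session.run`, using the input names reported by `session.get_inputs()`. This simple version recomputes from scratch every step. If the exported graph exposes cache inputs and outputs, a faster loop would feed those back instead. That is an assumption to verify against the actual graph.

```python
# Greedy decoding sketch; `step` is a hypothetical wrapper around the model call.
from typing import Callable, List, Optional, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    best = 0
    for i in range(1, len(values)):
        if values[i] > values[best]:
            best = i
    return best


def greedy_decode(
    step: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Append the highest-scoring token until EOS or the token budget is hit."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = argmax(step(ids))  # logits for the position after ids[-1]
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids[len(prompt_ids):]  # only the newly generated tokens
```

Prompt IDs and the EOS ID can come from `transformers.AutoTokenizer.from_pretrained(model_id)`, assuming the repository ships tokenizer files.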

WebGPU (Transformers.js)

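import { pipeline } from "@huggingface/transformers";

// Text-generation pipeline running on WebGPU in the browser.
const generator = await pipeline("text-generation", "LiquidAI/LFM2.5-230M-ONNX", {
  device: "webgpu",
  dtype: "q4", // or "fp16"; Q4F32 and Q8 are not supported on WebGPU
});

const output = await generator("The capital of France is", { max_new_tokens: 32 });
console.log(output[0].generated_text);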