LFM2.5-230M — Hexagon v81 NPU (QHexRT)

LiquidAI/LFM2.5-230M running fully on the Qualcomm Hexagon v81 NPU (Snapdragon 8 Elite Gen 5 / SM8850) via QHexRT, RunAnywhere's inference engine for Qualcomm NPUs. 100% on the HTP. No Python in the hot path. W8 weight-only quantization, GQA-native decode, batched prefill, on-NPU lm-head.

QHexRT is the first engine built to run LLM, VLM, STT, TTS, and embeddings fully on Qualcomm Hexagon NPUs. LFM2.5-230M is the first model in the catalog.

Why the NPU — measured on SM8850 (vs llama.cpp CPU, same device)

| Metric | Hexagon v81 NPU | CPU (llama.cpp Q8_0) | NPU advantage |
|---|---|---|---|
| Prefill | 12,540 tok/s | 871 tok/s | ~14× faster |
| Time-to-first-token (512-token prompt) | ~36 ms (flat) | 588 ms | ~16× lower |
| End-to-end (512-token prompt + 128 new) | 0.77 s | 1.13 s | ~1.5× faster |

Prefill is batched, so its cost is constant (O(1)) in prompt length and TTFT stays flat at ~36 ms. The NPU therefore pulls further ahead of the CPU as the context grows, at far lower power than driving 8 CPU cores at max clock.
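The "NPU advantage" column can be recomputed from the raw figures in the table. The following is a minimal sketch of that arithmetic; the `ratio` helper is a hypothetical name introduced here, and the numbers are copied from the table rather than re-measured.

```sh
#!/bin/sh
# Recompute the "NPU advantage" column from the table's raw figures.
# ratio A B prints A / B rounded to one decimal place (hypothetical helper).
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f", a / b }'
}

prefill_speedup=$(ratio 12540 871)   # NPU tok/s over CPU tok/s
ttft_speedup=$(ratio 588 36)         # CPU ms over NPU ms
e2e_speedup=$(ratio 1.13 0.77)       # CPU s over NPU s

echo "prefill: ${prefill_speedup}x  ttft: ${ttft_speedup}x  end-to-end: ${e2e_speedup}x"
```

The results (14.4, 16.3, and 1.5) match the rounded values in the table. The end-to-end ratio is much smaller than the prefill ratio because that measurement also includes generating 128 new tokens.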

Full launch write-up: runanywhere.ai/blog · article draft in this repo

Run

hf download runanywhere/lfm2_5_230m_HNPU --local-dir lfm2_5_230m_HNPU
adb push lfm2_5_230m_HNPU/v81 /data/local/tmp/lfm230     # on Windows, use PowerShell with native paths
adb shell "cd /data/local/tmp/lfm230 && LD_LIBRARY_PATH=. \
  ./qhx_generate lfm2-5-230m.json libQnnHtp.so libQnnSystem.so . 64 'The capital of France is'"

Stage the QAIRT v81 runtime libraries (libQnnHtp.so, libQnnSystem.so, libQnnHtpV81Skel.so, libQnnHtpV81Stub.so) from the QAIRT SDK, together with the qhx_generate tool, in the same directory. The context binaries are pinned to the v81 architecture. Contact san@runanywhere.ai for QHexRT deployment access.

`v81/`

lfm2-5-230m.json (manifest) · lfm230_dec_512_w8.bin (decode) · lfm230_pf_512_w8.bin (prefill) · lfm230_lmh_w8.bin (lm-head) · lfm_embed_f16.bin (embeddings) · tokenizer.json
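Because the run command expects the model artifacts, the runtime libraries, and the tool to sit in one directory, a quick local check before `adb push` can save a round trip to the device. The sketch below assumes the file names listed in this README, with `Skel.so/Stub.so` read as `libQnnHtpV81Skel.so` and `libQnnHtpV81Stub.so`. The `missing_files` helper and the script name `preflight.sh` are hypothetical and not part of QHexRT or QAIRT.

```sh
#!/bin/sh
# Preflight sketch: report files missing from a local staging directory.
# File names come from this README; adjust the list if your bundle differs.
REQUIRED_FILES="libQnnHtp.so libQnnSystem.so libQnnHtpV81Skel.so libQnnHtpV81Stub.so
qhx_generate
lfm2-5-230m.json lfm230_dec_512_w8.bin lfm230_pf_512_w8.bin
lfm230_lmh_w8.bin lfm_embed_f16.bin tokenizer.json"

# missing_files DIR: print each required file absent from DIR, one per line.
# Returns 0 when nothing is missing and 1 otherwise.
missing_files() {
  mf_status=0
  for mf_name in $REQUIRED_FILES; do
    if [ ! -f "$1/$mf_name" ]; then
      echo "$mf_name"
      mf_status=1
    fi
  done
  return "$mf_status"
}

# When called with a directory argument, run the check and report the result.
if [ "$#" -ge 1 ]; then
  if missing_files "$1"; then
    echo "all files staged in $1"
  else
    echo "stage the files listed above, then retry" >&2
    exit 1
  fi
fi
```

Running `sh preflight.sh lfm2_5_230m_HNPU/v81` after staging prints any file that is still missing. The check covers file presence only; it does not verify that the libraries match the v81 architecture.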


