LFM2.5-230M — Hexagon v81 NPU (QHexRT)

LiquidAI/LFM2.5-230M running fully on the Qualcomm Hexagon v81 NPU (Snapdragon 8 Elite Gen-2 / SM8850) via QHexRT, RunAnywhere's inference engine for Qualcomm NPUs. The whole model runs on the HTP (Hexagon Tensor Processor), with no Python in the hot path. Highlights: W8 weight-only quantization, GQA-native decode, batched prefill, and an on-NPU lm-head.
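In "W8 weight-only", the weights are stored as 8-bit integers with floating-point scales, while activations stay in higher precision. This card does not spell out QHexRT's exact scheme (granularity, rounding, scale format). The following is therefore only a minimal sketch that assumes symmetric per-output-channel int8 quantization. `quantize_w8` and `matvec_w8` are hypothetical helpers, not QHexRT APIs.

```python
def quantize_w8(weight):
    """Symmetric per-row (per-output-channel) int8 quantization (illustrative only).

    weight: list of rows (lists of floats). Returns (q, scales) where
    q[i][j] is an int in [-127, 127] and weight[i][j] ~= q[i][j] * scales[i].
    """
    q, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Round to the nearest code and clip to the symmetric int8 range.
        q.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q, scales


def matvec_w8(q, scales, x):
    """y = W @ x with int8 weights and float activations (weight-only)."""
    # Accumulate each row's dot product, then apply that row's scale once.
    return [s * sum(qi * xi for qi, xi in zip(row, x)) for row, s in zip(q, scales)]


W = [[0.3, -1.0, 0.25],
     [2.0, 0.0, -2.0]]
x = [1.0, 2.0, 3.0]
q, scales = quantize_w8(W)
print(q)                        # int8 codes, one row per output channel
print(matvec_w8(q, scales, x))  # close to the float result [-0.95, -4.0]
```

Storing one int8 code per weight plus one scale per row takes roughly half the memory of f16 weights. That saving is the usual motivation for weight-only quantization when decode is memory-bound.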

QHexRT is the first engine built to run LLM, VLM, STT, TTS, and embedding models fully on Qualcomm Hexagon NPUs. LFM2.5-230M is the first model in its catalog.

Why the NPU — measured on SM8850 (vs llama.cpp CPU, same device)

| Metric | Hexagon v81 NPU | CPU (llama.cpp Q8_0) | NPU advantage |
|---|---|---|---|
| Prefill throughput | 12,540 tok/s | 871 tok/s | ~14× faster |
| Time to first token (512-token prompt) | ~36 ms (flat) | 588 ms | ~16× lower |
| End-to-end latency (512-token prompt + 128 new tokens) | 0.77 s | 1.13 s | ~1.5× faster |
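The "NPU advantage" column follows directly from the measured numbers. The sketch below recomputes those ratios. It also estimates the decode rate for the 128 new tokens, under a simplifying assumption of ours (not the benchmark's): that end-to-end time is just TTFT plus decode time. Treat the decode figures as rough estimates, not measurements.

```python
# Measured numbers from the table (SM8850, 512-token prompt, 128 new tokens).
NPU = {"prefill_tok_s": 12_540, "ttft_s": 0.036, "e2e_s": 0.77}
CPU = {"prefill_tok_s": 871, "ttft_s": 0.588, "e2e_s": 1.13}
NEW_TOKENS = 128


def advantage(npu, cpu):
    """Reproduce the 'NPU advantage' column as plain ratios."""
    return {
        "prefill": npu["prefill_tok_s"] / cpu["prefill_tok_s"],  # throughput: higher is better
        "ttft": cpu["ttft_s"] / npu["ttft_s"],                   # latency: lower is better
        "e2e": cpu["e2e_s"] / npu["e2e_s"],                      # latency: lower is better
    }


def implied_decode_tok_s(run, new_tokens=NEW_TOKENS):
    """Rough decode rate, ASSUMING end-to-end = TTFT + decode (ignores other overheads)."""
    return new_tokens / (run["e2e_s"] - run["ttft_s"])


for name, ratio in advantage(NPU, CPU).items():
    print(f"{name}: ~{ratio:.1f}x")
print(f"implied decode, NPU: ~{implied_decode_tok_s(NPU):.0f} tok/s")
print(f"implied decode, CPU: ~{implied_decode_tok_s(CPU):.0f} tok/s")
```

The three ratios round to the table's ~14×, ~16× and ~1.5×.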

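Batched prefill pushes the whole prompt through the NPU in a single fixed-shape pass, so TTFT stays flat at ~36 ms regardless of prompt length (O(1) in prompt length, up to the 512-token prefill graph). CPU prefill time, by contrast, grows with the prompt. The NPU therefore pulls further ahead the longer the prompt, and it does so at far lower power than driving 8 CPU cores at maximum clock.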
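A fixed-shape prefill graph takes the whole prompt at once; judging by its file name, the prefill binary here is built for 512 tokens. Shorter prompts must therefore be padded, and the position of the last real token recorded, because its output feeds the lm-head for the first generated token. The sketch below shows that bookkeeping under our own assumptions. `PREFILL_LEN`, `PAD_ID` and `pack_prompt` are hypothetical names; QHexRT's actual input layout is not documented here.

```python
PREFILL_LEN = 512  # assumption: matches the "512" in lfm230_pf_512_w8.bin
PAD_ID = 0         # hypothetical pad token id


def pack_prompt(token_ids, prefill_len=PREFILL_LEN, pad_id=PAD_ID):
    """Pad a prompt to the fixed prefill shape so it runs as one graph call.

    Returns (input_ids, valid_mask, last_index):
      input_ids  - exactly prefill_len ids: the prompt, then padding
      valid_mask - 1 for real tokens, 0 for padding (padding must be ignored)
      last_index - position whose output feeds the lm-head for the first new token
    """
    n = len(token_ids)
    if n == 0:
        raise ValueError("empty prompt")
    if n > prefill_len:
        # A real engine might chunk the prompt; this sketch simply refuses.
        raise ValueError(f"prompt has {n} tokens, prefill graph holds {prefill_len}")
    pad = prefill_len - n
    input_ids = list(token_ids) + [pad_id] * pad
    valid_mask = [1] * n + [0] * pad
    return input_ids, valid_mask, n - 1


ids, mask, last = pack_prompt([101, 7592, 2088])
print(len(ids), sum(mask), last)  # 512 3 2
```

Every prompt, whether 3 or 500 tokens, becomes one call with identical tensor shapes, which is why latency stays flat. The trade-off is that short prompts still pay for the full 512-position pass.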

Full launch write-up: runanywhere.ai/blog · article draft in this repo

Run

hf download runanywhere/lfm2_5_230m_HNPU --local-dir lfm2_5_230m_HNPU
adb push lfm2_5_230m_HNPU/v81 /data/local/tmp/lfm230     # on Windows: run from PowerShell with a native local path (lfm2_5_230m_HNPU\v81)
adb shell "cd /data/local/tmp/lfm230 && LD_LIBRARY_PATH=. ./qhx_generate lfm2-5-230m.json libQnnHtp.so libQnnSystem.so . 64 'The capital of France is'"

Stage the QAIRT v81 runtime libraries from the QAIRT SDK (libQnnHtp.so, libQnnSystem.so, libQnnHtpV81Skel.so, libQnnHtpV81Stub.so), together with the qhx_generate tool, in the same directory on the device. The context binaries are compiled for v81 (arch-pinned) and will not load on other Hexagon versions. Contact san@runanywhere.ai for QHexRT deployment access.
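Before pushing, it can help to confirm that the staging directory contains all the runtime pieces and every model file in v81/. The sketch below is a hypothetical host-side helper of our own, not part of QHexRT or the QAIRT SDK. It checks for the required files and rebuilds the adb invocation from the Run commands as an argument list. The meaning it assumes for the `64` argument of qhx_generate (maximum new tokens) is inferred from the example command, not taken from QHexRT documentation.

```python
import shlex
from pathlib import Path

# Runtime pieces named in the staging instructions (QAIRT SDK libs + QHexRT tool).
RUNTIME_FILES = [
    "libQnnHtp.so", "libQnnSystem.so",
    "libQnnHtpV81Skel.so", "libQnnHtpV81Stub.so",
    "qhx_generate",
]
# Model files shipped in v81/.
MODEL_FILES = [
    "lfm2-5-230m.json", "lfm230_dec_512_w8.bin", "lfm230_pf_512_w8.bin",
    "lfm230_lmh_w8.bin", "lfm_embed_f16.bin", "tokenizer.json",
]


def missing_files(stage_dir):
    """Return required files absent from stage_dir (an empty list means ready to push)."""
    root = Path(stage_dir)
    return [name for name in RUNTIME_FILES + MODEL_FILES if not (root / name).is_file()]


def build_run_command(prompt, max_new_tokens=64, device_dir="/data/local/tmp/lfm230"):
    """Assemble the adb argv for the example run.

    ASSUMPTION: the numeric argument (64 in the example) is the max number of new tokens.
    adb hands the remote string to the device shell, so the prompt is shell-quoted.
    """
    remote = (
        f"cd {shlex.quote(device_dir)} && LD_LIBRARY_PATH=. ./qhx_generate "
        f"lfm2-5-230m.json libQnnHtp.so libQnnSystem.so . {int(max_new_tokens)} "
        f"{shlex.quote(prompt)}"
    )
    return ["adb", "shell", remote]


if __name__ == "__main__":
    print(missing_files("lfm2_5_230m_HNPU/v81"))
    print(build_run_command("The capital of France is"))
```

To actually run it, pass the list to `subprocess.run(build_run_command("The capital of France is"), check=True)`. Using an argument list avoids a second round of quoting by the host shell.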

v81/

- lfm2-5-230m.json (manifest)
- lfm230_dec_512_w8.bin (decode)
- lfm230_pf_512_w8.bin (prefill)
- lfm230_lmh_w8.bin (lm-head)
- lfm_embed_f16.bin (embeddings)
- tokenizer.json
