whisper-small: QHexRT NPU bundle (Hexagon v79 + v81)

Precompiled Whisper-small ASR (12 layers, d_model 768) for the QHexRT runtime on the Qualcomm Hexagon NPU. Two arch-pinned bundles ship as sibling dirs:

dir  | Hexagon arch             | device                   | encoder path                             | device-validated
v79/ | v79 (SM8750, Galaxy S25) | Snapdragon 8 Elite       | AI Hub, in-graph conv                    | ✅ transcript == HF
v81/ | v81 (SM8850, soc 87)     | Snapdragon 8 Elite Gen 5 | forge, host-side conv (encoder_features) | ✅ WER 0

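A QNN context binary will not load on a different Hexagon arch, because soc_model and dsp_arch are baked into it. The same qhx_asr binary runs both bundles; it branches on the encoder's input (log-mel for v79, post-conv encoder_features for v81). The asr_transcribe orchestrator is size-agnostic: whisper-base and whisper-small differ only in these .bin files and the manifest dimensions.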
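The two rules above (arch-pinned binaries, and a branch on the encoder's input) can be sketched as follows. This is a hypothetical illustration, not qhx_asr's code: the function names are invented, and the only input name taken from the card is encoder_features.

```python
# Hypothetical sketch: qhx_asr's real dispatch logic is not published in this card.
# It illustrates (1) context binaries are arch-pinned, so pick the matching dir,
# and (2) a post-conv encoder input means the host conv stem must run first.

BUNDLE_DIRS = {"v79": "v79/", "v81": "v81/"}  # sibling dirs shipped in this repo


def bundle_dir_for(dsp_arch: str) -> str:
    """Return the bundle compiled for this Hexagon arch; there is no cross-arch fallback."""
    if dsp_arch not in BUNDLE_DIRS:
        raise ValueError(f"no whisper-small bundle for Hexagon {dsp_arch}")
    return BUNDLE_DIRS[dsp_arch]


def needs_host_conv_stem(encoder_input_name: str) -> bool:
    """True when the encoder graph starts after the conv stem (the v81 layout).

    Assumption: only the post-conv input is named "encoder_features"; any other
    input (log-mel on v79) is fed to the encoder graph directly.
    """
    return encoder_input_name == "encoder_features"
```

Failing loudly on an unknown arch mirrors the fact that a mismatched context binary will not load anyway.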


v81/: Hexagon v81 (SM8850), compiled with forge (QAIRT 2.47)

This bundle uses the same host-conv decomposition as whisper-base: the conv stem, which maps poorly onto the HTP, runs host-side using the weights in whisper_conv_stem.bin, and the encoder graph starts at the post-conv encoder_features [1500,768]. Encoder and decoder are both fp16. This bundle is a new forge port: its spec (recipes/whisper-small) is generated by the same exporter that produced the validated whisper-base v81 bundle, and was then gated on device hardware.
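A minimal pure-Python reference of what the host-side conv stem computes, assuming it mirrors the HF Whisper encoder stem: Conv1d(80 → d_model, kernel 3, padding 1) + GELU, then Conv1d(d_model → d_model, kernel 3, stride 2, padding 1) + GELU, a transpose to time-major, and the learned positional embedding added. QHexRT's actual implementation is not shown in this card, and these function names are illustrative. With 3000 mel frames (30 s) the stride-2 conv yields the 1500 rows of encoder_features.

```python
import math


def conv_out_len(t_in, kernel=3, stride=1, padding=1):
    """Output length of a 1-D convolution."""
    return (t_in + 2 * padding - kernel) // stride + 1


def gelu(x):
    """Exact (erf-based) GELU, as used by HF Whisper."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def conv1d(x, w, b, stride=1, padding=1):
    """x: [C_in][T], w: [C_out][C_in][K], b: [C_out] -> [C_out][T_out]."""
    c_in, t_in, k = len(x), len(x[0]), len(w[0][0])
    t_out = conv_out_len(t_in, k, stride, padding)
    out = []
    for co in range(len(w)):
        row = []
        for t in range(t_out):
            acc = b[co]
            start = t * stride - padding
            for ci in range(c_in):
                for j in range(k):
                    src = start + j
                    if 0 <= src < t_in:  # zero padding outside the signal
                        acc += w[co][ci][j] * x[ci][src]
            row.append(acc)
        out.append(row)
    return out


def conv_stem(mel, c1w, c1b, c2w, c2b, pos):
    """mel: [n_mels][T] -> encoder_features: [ceil(T/2)][d_model] (time-major)."""
    h = [[gelu(v) for v in row] for row in conv1d(mel, c1w, c1b, stride=1)]
    h = [[gelu(v) for v in row] for row in conv1d(h, c2w, c2b, stride=2)]
    d_model, t_out = len(h), len(h[0])
    # transpose [d_model][T'] -> [T'][d_model], then add the positional embedding
    return [[h[d][t] + pos[t][d] for d in range(d_model)] for t in range(t_out)]
```

conv_out_len(conv_out_len(3000), stride=2) gives 1500, matching the [1500,768] encoder input. The loops are for clarity only; a real host implementation would use a vectorized or SIMD kernel.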

Files (v81/)

file                          | role
whisper-small.json            | QHexRT manifest (asr_transcribe plan; 12 layers / d768 / 12 heads)
whispersmall_enc_f16.bin      | encoder (post-conv encoder_features → per-layer cross-attn K/V)
whispersmall_dec_f16.bin      | autoregressive decoder (self + cross attn, in-graph int32 token embedding + tied lm-head)
whisper_conv_stem.bin         | host-side conv-stem weights [c1w,c1b,c2w,c2b,pos]
whisper_small_mel_filters.bin | HF 80-mel filter bank (host log-mel)
tokenizer.json                | Whisper tokenizer (vocab 51865)
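As a sanity check on whisper_conv_stem.bin, the expected tensor sizes for whisper-small can be computed from the HF Whisper parameter shapes. The card does not document the file layout, so this sketch assumes the five tensors are stored back to back in the listed order with a single dtype; the names are hypothetical.

```python
# Hypothetical check, not part of QHexRT. Shapes follow HF Whisper-small;
# the back-to-back, single-dtype file layout is an assumption.
N_MELS, D_MODEL, KERNEL, N_CTX = 80, 768, 3, 1500

STEM_SHAPES = {
    "c1w": (D_MODEL, N_MELS, KERNEL),   # conv1.weight
    "c1b": (D_MODEL,),                  # conv1.bias
    "c2w": (D_MODEL, D_MODEL, KERNEL),  # conv2.weight
    "c2b": (D_MODEL,),                  # conv2.bias
    "pos": (N_CTX, D_MODEL),            # encoder positional embedding
}


def numel(shape):
    """Number of elements in a tensor of this shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


def expected_stem_bytes(bytes_per_elem):
    """Total file size if all five tensors share one dtype and have no header."""
    return sum(numel(s) for s in STEM_SHAPES.values()) * bytes_per_elem
```

The five tensors hold 3,107,328 elements in total; comparing expected_stem_bytes(4) (fp32) or expected_stem_bytes(2) (fp16) with the file's size on disk indicates which dtype was used, if the layout assumption holds.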

Run (v81/)

huggingface-cli download runanywhere/whisper_small_HNPU --local-dir whisper_small_HNPU
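# Also push qhx_asr, the QNN libs (libQnnHtp.so, libQnnSystem.so) and the v81 HTP skel
# from the QAIRT SDK into /data/local/tmp/wq, and the input WAV into whisper_small/.
adb push whisper_small_HNPU/v81 /data/local/tmp/wq/whisper_small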
adb shell "cd /data/local/tmp/wq && LD_LIBRARY_PATH=. ADSP_LIBRARY_PATH=. \
  ./qhx_asr whisper_small/whisper-small.json libQnnHtp.so libQnnSystem.so whisper_small whisper_small/<audio16k>.wav"
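When scripting runs from a host, the same invocation can be built as an argv list for subprocess. This is a hypothetical helper: the positional argument order is copied from the command above, and the meaning of each position (manifest, HTP backend lib, QNN system lib, bundle dir, WAV) is inferred from the file names, not documented.

```python
import shlex

DEVICE_DIR = "/data/local/tmp/wq"  # working dir used in the commands above


def qhx_asr_adb_command(wav_name: str, bundle: str = "whisper_small") -> list[str]:
    """Build the card's `adb shell` invocation as an argv list (hypothetical helper).

    Paths that can vary are quoted for the device-side shell, which re-parses
    the single command string that adb forwards.
    """
    remote = " ".join([
        "cd", DEVICE_DIR, "&&",
        "LD_LIBRARY_PATH=.", "ADSP_LIBRARY_PATH=.",
        "./qhx_asr",
        shlex.quote(f"{bundle}/whisper-small.json"),  # manifest
        "libQnnHtp.so", "libQnnSystem.so",            # QNN backend + system libs
        shlex.quote(bundle),                          # bundle dir
        shlex.quote(f"{bundle}/{wav_name}"),          # input audio
    ])
    return ["adb", "shell", remote]
```

Usage: subprocess.run(qhx_asr_adb_command("sample.wav"), check=True). Quoting matters because a WAV name containing spaces would otherwise be split by the device shell.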

Measured (SM8850 / v81, soc 87, QAIRT 2.47)

  • WER = 0 vs HF openai/whisper-small on a clean LibriSpeech clip and on a 23 s clip (decoding runs the full length). Export gate: encoder cross-KV cosine similarity 1.00000; decoder 12/12 teacher-forced and greedy chain both MATCH.
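The two metrics quoted above can be sketched as follows: word error rate as a word-level Levenshtein distance divided by the reference length, and cosine similarity between flattened tensors. The card does not say how the gate flattens the K/V tensors or whether transcripts are normalized before scoring; this sketch assumes flat float lists and whitespace-split words with no normalization.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

A cosine of 1.00000 means the NPU and reference K/V tensors point in the same direction to five decimal places; it is insensitive to a uniform scale factor, which is why it is usually paired with an end-to-end check such as WER.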

v79/: Hexagon v79 (SM8750, Galaxy S25), Qualcomm AI Hub graphs

Encoder and decoder are Qualcomm AI Hub qnn_context_binary graphs (float/fp16); the host pipeline is QHexRT's own. Device-validated: the transcription matches the HF openai/whisper-small reference exactly. Measured on S25/v79: 718 ms for 2.67 s of audio (≈3.7× real-time).
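The ≈3.7× figure follows from the measured numbers. The card does not say whether the 718 ms includes host-side preprocessing, so this arithmetic just restates the ratio.

```python
def real_time_factor(compute_s: float, audio_s: float) -> float:
    """Compute time per second of audio; below 1.0 means faster than real time."""
    return compute_s / audio_s


def speed_vs_real_time(compute_s: float, audio_s: float) -> float:
    """Seconds of audio processed per second of compute (the "x real-time" figure)."""
    return audio_s / compute_s


print(f"RTF {real_time_factor(0.718, 2.67):.3f}, "
      f"{speed_vs_real_time(0.718, 2.67):.1f}x real-time")
# prints: RTF 0.269, 3.7x real-time
```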

Files (v79/)

file                    | role
whisper-small.json      | QHexRT manifest (ASR family, asr_transcribe plan; 12 layers, d_model 768)
encoder.bin             | AI Hub Whisper-small encoder (mel → 24 cross-attn K/V tensors, i.e. K and V for each of the 12 layers)
decoder.bin             | AI Hub Whisper-small decoder (greedy step → logits + self-KV)
whisper_mel_filters.bin | HF 80-mel filter bank [201,80] f32 (shared across Whisper sizes)
tokenizer.json          | Whisper multilingual tokenizer (vocab 51865, shared)
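The [201,80] filter bank corresponds to Whisper's front end: n_fft = 400 gives 201 frequency bins, and a hop of 160 samples at 16 kHz gives 3000 frames for 30 s. A minimal sketch of the host log-mel step, assuming QHexRT follows the HF WhisperFeatureExtractor normalization (log10 with a 1e-10 floor, dynamic range clipped to 8 below the peak, then (x + 4) / 4). The loader assumes the .bin is raw little-endian float32, row-major, with no header; the card states only the shape and dtype.

```python
import math
import sys
from array import array

N_FREQ, N_MELS = 201, 80  # 400-point FFT -> 201 bins; 80 mel bands


def load_mel_filters(path):
    """Assumption: raw little-endian f32, row-major [201][80], no header."""
    data = array("f")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    if sys.byteorder != "little":
        data.byteswap()
    if len(data) != N_FREQ * N_MELS:
        raise ValueError(f"expected {N_FREQ * N_MELS} floats, got {len(data)}")
    return [list(data[r * N_MELS:(r + 1) * N_MELS]) for r in range(N_FREQ)]


def log_mel(power_frames, filters):
    """power_frames: [T][n_freq] |STFT|^2 per frame; filters: [n_freq][n_mels].

    Returns [n_mels][T] with Whisper-style log-mel normalization.
    """
    n_freq, n_mels = len(filters), len(filters[0])
    mel = [[sum(frame[f] * filters[f][m] for f in range(n_freq))
            for frame in power_frames] for m in range(n_mels)]
    log_spec = [[math.log10(max(v, 1e-10)) for v in row] for row in mel]
    floor = max(max(row) for row in log_spec) - 8.0  # keep an 8-decade range
    return [[(max(v, floor) + 4.0) / 4.0 for v in row] for row in log_spec]
```

The STFT itself (Hann window, power spectrum) is omitted; any FFT that produces [T][201] power frames can feed log_mel.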

Audio: mono WAV (PCM16 or float32), resampled to 16 kHz on the host; clips up to 30 s. No custom op package is needed. Source model: openai/whisper-small.
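Before pushing a clip, the stated input constraints can be checked on the host. This hypothetical helper uses the stdlib wave module, which reads integer PCM only, so it covers the PCM16 case; float32 WAVs (also accepted by qhx_asr according to the card) need a different reader. The sample rate is reported rather than enforced, since resampling happens host-side.

```python
import wave

MAX_SECONDS = 30.0  # clip limit stated in the card


def check_wav(path):
    """Return (sample_rate, duration_s) for a mono PCM16 WAV, else raise ValueError."""
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        width = w.getsampwidth()
        rate = w.getframerate()
        frames = w.getnframes()
    if channels != 1:
        raise ValueError(f"expected mono, got {channels} channels")
    if width != 2:
        raise ValueError(f"expected 16-bit PCM, got {8 * width}-bit samples")
    duration = frames / rate
    if duration > MAX_SECONDS:
        raise ValueError(f"clip is {duration:.1f} s, limit is {MAX_SECONDS:.0f} s")
    return rate, duration
```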

The v81 bundle was built and device-validated with QHexRT forge (recipe: recipes/whisper-small).
