Shenava — Rizeh-Pizeh v1.0 (6.9M) · cache-aware streaming · native-Rust (tract)

Cache-aware streaming CTC export of Shenava-Rizeh-Pizeh-v1.0 that runs in the pure-Rust tract engine — no C++, no ONNX Runtime. Part of VisualEars / Shenava: offline, on-device, streaming Persian ASR for the Deaf/Hard-of-Hearing.

Quality: intelligible, with 24.55% WER on golden-6669; runs in real time on a 2015 Cortex-A7. On an x86 CPU, RTF ≈ 0.018 (20.3 ms per chunk; one chunk = 1.12 s of audio).
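The real-time factor quoted above is compute time divided by audio duration. A quick check of that arithmetic, using only the x86 figures from this card (the variable names are illustrative):

```python
# RTF = processing time per chunk / audio duration per chunk.
ms_per_chunk = 20.3    # compute time per chunk on x86, in milliseconds (from this card)
chunk_audio_s = 1.12   # audio covered by one chunk, in seconds (from this card)

rtf = (ms_per_chunk / 1000.0) / chunk_audio_s
headroom = 1.0 / rtf   # how many times faster than real time this is

print(f"RTF = {rtf:.3f}, headroom = {headroom:.1f}x real time")
```

An RTF below 1.0 means a chunk is processed faster than it is spoken, which is the condition for live captioning to keep up.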

⚠️ Requires patched tract (only on tract builds that predate the upstream fix)

Stock tract builds from before sonos/tract#2441 reject NeMo cache-aware streaming graphs at two points in the inference layer. The fix is a 23-line, 2-file patch (shenava_tract_streaming.patch, included). On such a build, apply the patch, build tract, then load model.onnx normally. The graph itself is valid: it decodes identically under ONNX Runtime.

Streaming contract

Per-step inputs / outputs (fixed shapes, greedy CTC):

audio_signal [1,80,121]: un-normalized log-mel chunk (NeMo featurizer, normalize=NA)
length [1] i64: number of valid (unpadded) frames in the chunk
cache_last_channel [1,12,70,144], cache_last_time [1,12,144,8], cache_last_channel_len [1] i64: initialized to zeros / 0 on the first step
→ logprobs [1,T',1025] + next caches
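The contract above implies a small amount of state that lives between calls. A minimal sketch of that state in plain Python, assuming the caches are float32 (the card does not state their dtype) and that the output names follow the `*_next` pattern mentioned in this card; `initial_caches` and `advance` are hypothetical helper names, not part of tract or NeMo:

```python
from array import array
from math import prod

# Fixed cache shapes, copied from the streaming contract.
CACHE_SHAPES = {
    "cache_last_channel": (1, 12, 70, 144),
    "cache_last_time": (1, 12, 144, 8),
}

def initial_caches():
    """Zero-filled flat buffers for the very first step."""
    caches = {
        # Assumption: float32 storage, hence typecode "f".
        name: array("f", [0.0]) * prod(shape)
        for name, shape in CACHE_SHAPES.items()
    }
    caches["cache_last_channel_len"] = array("q", [0])  # i64, shape [1]
    return caches

def advance(outputs):
    """Turn one step's *_next outputs into the next step's cache inputs."""
    return {
        "cache_last_channel": outputs["cache_last_channel_next"],
        "cache_last_time": outputs["cache_last_time_next"],
        # The length may come back in another integer type; force i64.
        "cache_last_channel_len": array(
            "q", (int(v) for v in outputs["cache_last_channel_len_next"])
        ),
    }
```

The flat buffers would still need to be reshaped to the shapes in `CACHE_SHAPES` when they are handed to the inference engine.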

Chunking: feed 121-mel-frame chunks with a shift of 112 frames (9-frame pre-encode overlap). The first chunk has only 105 frames, so pad it to 121. Pad the final chunk too, and always pass the true length. Feed the *_next caches back in on every step (cast cache_last_channel_len_next to i64).

Greedy CTC: when collapsing repeats, carry the previous token across chunk boundaries. The blank id is 1024. Map ids to tokens via tokens.txt and replace ▁ with a space.
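The chunking and decoding rules above can be sketched in plain Python. This is one reading of the card, not the reference implementation: it assumes the first chunk starts at frame 0 with no left overlap, that padding is zeros appended on the right of the time axis, and that `pieces` is an id-to-token mapping already loaded from tokens.txt (whose file format is not described here). All function and class names are hypothetical.

```python
CHUNK = 121              # frames per model call (fixed input width)
SHIFT = 112              # new frames consumed per call after the first
FIRST = 105              # valid frames in the first call
OVERLAP = CHUNK - SHIFT  # 9-frame pre-encode overlap
BLANK = 1024             # CTC blank id

def plan_chunks(total_frames):
    """Return (start, end) frame ranges; end - start is the true length."""
    if total_frames <= 0:
        return []
    plans = [(0, min(FIRST, total_frames))]
    new_start = FIRST                    # first frame the model has not seen
    while new_start < total_frames:
        start = new_start - OVERLAP      # re-feed the previous 9 frames
        plans.append((start, min(start + CHUNK, total_frames)))
        new_start += SHIFT
    return plans

def pad_time(rows, width=CHUNK, fill=0.0):
    """Right-pad each mel row (a list of floats) to `width` frames."""
    return [row + [fill] * (width - len(row)) for row in rows]

def argmax_ids(logprobs):
    """Pick the best token id for each output frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in logprobs]

class GreedyCtc:
    """Collapse repeats and drop blanks, remembering the last frame's id."""
    def __init__(self, blank=BLANK):
        self.blank = blank
        self.prev = blank                # carried across chunk boundaries

    def step(self, frame_ids):
        out = []
        for tok in frame_ids:
            if tok != self.prev and tok != self.blank:
                out.append(tok)
            self.prev = tok
        return out

def detokenize(ids, pieces):
    """Join token pieces and turn the ▁ word marker into a space."""
    return "".join(pieces[i] for i in ids).replace("▁", " ")
```

Keeping `prev` on the decoder object is what prevents a token that straddles two chunks from being emitted twice.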

Numbers are spoken-form → ITN

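The model spells numbers out (هشت, not ۸). Apply persian_itn.py at display time to convert spoken-form numbers to Persian digits (inverse text normalization, ITN). It covers cardinals, the multipliers هزار/میلیون/میلیارد, the connector «و», and compounds.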
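To illustrate the kind of conversion involved, here is a minimal sketch that turns one spoken cardinal phrase into Persian digits. It is not the bundled persian_itn.py, whose interface is not shown in this card; `spoken_to_digits` is a hypothetical name, and the sketch handles only a standalone number phrase, not numbers embedded in a sentence.

```python
# Word values below 1000; multipliers are handled separately.
SMALL = {
    "صفر": 0, "یک": 1, "دو": 2, "سه": 3, "چهار": 4,
    "پنج": 5, "شش": 6, "هفت": 7, "هشت": 8, "نه": 9,
    "ده": 10, "یازده": 11, "دوازده": 12, "سیزده": 13, "چهارده": 14,
    "پانزده": 15, "شانزده": 16, "هفده": 17, "هجده": 18, "نوزده": 19,
    "بیست": 20, "سی": 30, "چهل": 40, "پنجاه": 50,
    "شصت": 60, "هفتاد": 70, "هشتاد": 80, "نود": 90,
    "صد": 100, "دویست": 200, "سیصد": 300, "چهارصد": 400, "پانصد": 500,
    "ششصد": 600, "هفتصد": 700, "هشتصد": 800, "نهصد": 900,
}
MULTIPLIERS = {"هزار": 10**3, "میلیون": 10**6, "میلیارد": 10**9}
TO_PERSIAN_DIGITS = str.maketrans("0123456789", "۰۱۲۳۴۵۶۷۸۹")

def spoken_to_digits(phrase):
    """Convert a spoken Persian cardinal phrase to a Persian-digit string."""
    total = 0   # value of completed thousand/million/billion groups
    group = 0   # value of the group currently being read (0..999)
    for word in phrase.split():
        if word == "و":          # connector between parts, carries no value
            continue
        if word in SMALL:
            group += SMALL[word]
        elif word in MULTIPLIERS:
            # A bare multiplier such as "هزار" means one thousand.
            total += max(group, 1) * MULTIPLIERS[word]
            group = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return str(total + group).translate(TO_PERSIAN_DIGITS)

print(spoken_to_digits("دو هزار و پانصد و بیست و سه"))
```

Running ITN only at display time keeps the raw transcript in spoken form, so the streaming decoder itself never has to revise earlier output.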

Shenava-1 family (all native-Rust streaming)

Koochik 114M — flagship
Rizeh 32M — mid
Rizeh-Pizeh 6.9M — tiniest

Quantized variants — int4 / int8 (NEW)

Our streaming support is merged into tract main (sonos/tract#2441), which also ships int4 (MatMulNBits -> Q4_0) and int8 GEMM kernels. So tract main runs quantized versions of this streaming model:

| file | precision | size | notes |
| --- | --- | --- | --- |
| `model.onnx` | fp32 | 33 MB | reference |
| `model.int4.onnx` | int4 (MatMulNBits / Q4_0, weight-only) | 14 MB | ⭐ recommended: 2.4x smaller, ~fp32 speed, byte-identical decode |
| `model.int8.onnx` | int8 (matmul-only, MatMulInteger) | 17 MB | byte-identical; slower on small-batch streaming (per-matmul `DynamicQuantizeLinear`); best for large-batch / offline, or CPUs where it wins |

Both quants decode byte-identically to fp32. For edge/on-device streaming, use model.int4.onnx (weight-only, no per-matmul activation quant). Needs tract main — the streaming fixes are upstream now, so the bundled .patch is no longer required.
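The size reductions follow directly from the file sizes listed for the three variants. A short check of that arithmetic (the sizes are the rounded MB figures from this card):

```python
# File sizes in MB, as listed in this card.
sizes_mb = {"model.onnx": 33, "model.int4.onnx": 14, "model.int8.onnx": 17}

reference = sizes_mb["model.onnx"]
# How many times smaller each file is than the fp32 reference.
ratios = {name: round(reference / mb, 1) for name, mb in sizes_mb.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x smaller than fp32")
```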

Model tree for Reza2kn/Shenava-Rizeh-Pizeh-v1.0-tract-streaming

Base model: nvidia/stt_fa_fastconformer_hybrid_large
Finetuned: Reza2kn/Shenava-Koochik-v1.0
Finetuned: Reza2kn/Shenava-Rizeh-v1.0
Finetuned: Reza2kn/Shenava-Rizeh-Pizeh-v1.0
Quantized (5): this model