Shenava — Rizeh Pizeh v1.0 (6.9M) · CoreML iOS15 NeuralNetwork fp16

CoreML NeuralNetwork (not ML Program) fp16 export of Reza2kn/Shenava-Rizeh-Pizeh-v1.0 — built so older Apple devices capped at iOS 15 (e.g. iPad Air 2 / iOS 15.8) can load and run it. ML Program packages require iOS 16+; this targets NeuralNetwork / CoreML spec v5 with iOS 14 availability, so it runs on iOS 15.

This model is a single cache-aware streaming step: each call takes a 170 ms audio window and returns one prediction. It is the same kind of artifact as shenava-fa-fastconformer-streaming-32m-coreml-ios15-fp16.

The Shenava-1 family (CoreML iOS15)

Benchmark — fair WER/CER (parent model, decoded @ [70,13])

| Member | golden-6669 WER | golden-6669 CER | FLEURS-fa WER | FLEURS-fa CER |
|---|---|---|---|---|
| Rizeh Pizeh v1.0 (6.9M) | 24.55% | 8.89% | 26.95% | 10.22% |
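For readers who want to compare their own transcripts against these figures, WER and CER are conventionally the edit distance between reference and hypothesis divided by the reference length, counted over words and characters respectively. The sketch below shows that plain definition only. The source does not say what text normalization makes the scoring "fair", so none is applied here, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(row[j] + 1,             # deletion
                      row[j - 1] + 1,         # insertion
                      prev_diag + (r != h))   # substitution or match
            prev_diag, row[j] = row[j], cur
    return row[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: char-level edits / number of reference chars."""
    return edit_distance(reference, hypothesis) / len(reference)
```

Both functions assume a non-empty reference. Whitespace counts as a character in `cer` as written, which is one of several common conventions.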

CoreML contract (cache-aware streaming CTC step, att_context [70,0])

Inputs:

  • processed_signal: Float32 [1, 80, 17]
  • cache_last_channel: Float32 [12, 1, 70, 144]
  • cache_last_time: Float32 [12, 1, 144, 8]

Outputs:

  • logits: Float32 [1, 1, 1025]
  • cache_last_channel_next: Float32 [12, 1, 70, 144]
  • cache_last_time_next: Float32 [12, 1, 144, 8]
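
The contract above implies a simple loop: each call consumes one feature window plus the two caches, and the two `*_next` outputs become the cache inputs of the following call. Below is a minimal sketch of that state threading. It assumes a hypothetical `step` callable that wraps the model's prediction call and maps tensor names to arrays. That wrapper is not part of this repository, and the source does not specify how the initial caches are filled, so the caller supplies them.

```python
from typing import Any, Callable, Dict, Iterable, List

# Cache tensor names taken from the contract above.
CACHE_NAMES = ("cache_last_channel", "cache_last_time")

def run_stream(
    step: Callable[[Dict[str, Any]], Dict[str, Any]],  # hypothetical model wrapper
    windows: Iterable[Any],          # each item: one [1, 80, 17] feature window
    initial_caches: Dict[str, Any],  # caller-supplied starting caches (assumption)
) -> List[Any]:
    """Feed windows one at a time, carrying the caches between calls."""
    caches = dict(initial_caches)
    all_logits = []
    for window in windows:
        inputs = {"processed_signal": window, **caches}
        outputs = step(inputs)
        all_logits.append(outputs["logits"])
        # Each "<name>_next" output replaces the matching "<name>" input.
        caches = {name: outputs[name + "_next"] for name in CACHE_NAMES}
    return all_logits
```

Keeping the model call behind a plain callable means the same loop can be exercised with a stub on any platform before it is wired to Core ML on device.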

Streaming geometry: feature_frames per prediction = 17 (pre_encode_cache 9 + chunk 8), audio window 170 ms, constant cache length 70, d_model=144, 12 conformer layers, ×8 subsampling (80 ms/frame).
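
The arithmetic behind these numbers can be written out as a short sketch. The 10 ms feature hop is an inference from "17 frames = 170 ms" rather than something the source states directly, and zero-padding before the start of the audio is also an assumption. `window_spans` is an illustrative helper, not part of the export script.

```python
HOP_MS = 10            # assumed feature hop, implied by 17 frames = 170 ms
PRE_ENCODE_CACHE = 9   # feature frames of left context re-fed on each step
CHUNK = 8              # new feature frames consumed per step
SUBSAMPLING = 8        # encoder subsampling factor
CACHE_LEN = 70         # encoder frames held in cache_last_channel

WINDOW = PRE_ENCODE_CACHE + CHUNK        # 17 feature frames per call
window_ms = WINDOW * HOP_MS              # 170 ms of audio seen per call
step_ms = SUBSAMPLING * HOP_MS           # 80 ms per encoder frame
left_context_ms = CACHE_LEN * step_ms    # 5600 ms of cached attention history

def window_spans(n_frames):
    """Return (start, end) feature-frame spans, one per prediction.

    A negative start means the window begins before the audio does; this
    sketch assumes those frames are zero-padded. Trailing frames that do
    not fill a whole chunk are left for the next call.
    """
    return [(s - PRE_ENCODE_CACHE, s + CHUNK)
            for s in range(0, n_frames - CHUNK + 1, CHUNK)]
```

Consecutive windows overlap by 9 frames, so each call advances the stream by one 8-frame chunk (80 ms) and yields a single logits frame, which matches the `[1, 1, 1025]` output shape.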

Compatibility (Xcode coremlc)

  • model type: MLModelType_neuralNetwork
  • storage precision: Float16
  • specification version: 5
  • availability: iOS 14.0, macOS 11.0

To compile for an iOS 15.0 deployment target:

  coremlc compile shenava_rizeh_pizeh_v1_0_ctc_streaming_att70_0_ios15_fp16.mlmodel /tmp/out --deployment-target 15.0 --platform ios

Files

  • shenava_rizeh_pizeh_v1_0_ctc_streaming_att70_0_ios15_fp16.mlmodel — fp16 NeuralNetwork model (~14 MB)
  • tokens.json, preprocessor.json, mel_filters_slaney_80x257.json — sidecars (ve_tok_v4, shared across the family)
  • shenava_rizeh_pizeh_v1_0_ctc_streaming_att70_0_ios15_fp16_manifest.json — export manifest
  • export_koochik10_streaming_coreml.py — reproducible export script

Tokenizer: ve_tok_v4 (SentencePiece BPE-1024 + blank, which gives the 1025 logit classes; digit/punct/«»-aware). Numbers are emitted in spoken form; apply Persian inverse text normalization (ITN) at display time to render them as digits. Part of VisualEars / Shenava.
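
Because the model is a CTC step, the per-frame argmax ids have to be collapsed before they are mapped to token strings. Below is a minimal greedy-decoding sketch that carries state across chunks. It assumes the blank is the last of the 1025 classes (id 1024), which the source does not state, so check it against tokens.json. The function name is illustrative.

```python
BLANK_ID = 1024  # assumption: blank is the last of the 1025 logit classes

def ctc_greedy_step(frame_ids, prev_id=None, blank_id=BLANK_ID):
    """Collapse repeats and drop blanks for one chunk of argmax ids.

    Returns (tokens, last_id). Pass last_id back in as prev_id for the next
    chunk so a repeat that straddles a chunk boundary is still collapsed.
    """
    tokens = []
    for idx in frame_ids:
        if idx != blank_id and idx != prev_id:
            tokens.append(idx)
        prev_id = idx
    return tokens, prev_id
```

Mapping the resulting ids to text through tokens.json, and the ITN pass for digits, are separate steps that this sketch does not cover.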

Export stack: coremltools 9.0, torch 2.7.0, NeMo 2.7.3. fp16 vs fp32 argmax agreement: 1.000.
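
The source does not say how the agreement figure was computed. One plausible reading is the fraction of frames on which the fp16 and fp32 models pick the same top class, sketched here with illustrative function names:

```python
def argmax(row):
    """Index of the largest value in one frame of logits."""
    return max(range(len(row)), key=row.__getitem__)

def argmax_agreement(logits_a, logits_b):
    """Fraction of frames whose top-scoring class matches in both runs."""
    if len(logits_a) != len(logits_b) or not logits_a:
        raise ValueError("need two non-empty sequences of equal length")
    same = sum(argmax(a) == argmax(b) for a, b in zip(logits_a, logits_b))
    return same / len(logits_a)
```

Under this definition, a value of 1.000 means greedy decoding yields identical token sequences from both precisions on the evaluated frames, even if the raw logit values differ slightly.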
