Canary-1B-v2 — CoreML (ANE)

CoreML conversion of nvidia/canary-1b-v2 for Apple Silicon / Neural Engine, packaged for FluidAudio.

Canary is an attention encoder-decoder (AED) ASR model that pairs a FastConformer encoder with a Transformer decoder (25 European languages, 16384-token SentencePiece BPE vocabulary). Decoding is autoregressive: at each step the Transformer decoder cross-attends to the encoder output, a 1024→16384 projection head turns the decoder hidden state into vocabulary logits, and the greedy (argmax) token is emitted until EOS (id 3).
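The greedy loop described above can be sketched in a few lines of Python. This is an illustration only, not the FluidAudio implementation: `step_logits` is a hypothetical stand-in for one decoder plus projection step, `toy_step` is a scripted fake model, and the single-token BOS prompt is an assumption (the real model may expect a longer prompt).

```python
from typing import Callable, List, Sequence

EOS_ID = 3       # from the model contract
BOS_ID = 4       # from the model contract
MAX_STEPS = 256  # decoder length from the model contract

def greedy_decode(step_logits: Callable[[Sequence[int]], Sequence[float]],
                  prompt: Sequence[int] = (BOS_ID,),
                  max_steps: int = MAX_STEPS) -> List[int]:
    """Append the argmax token each step until EOS or the step budget is hit."""
    tokens = list(prompt)
    while len(tokens) < max_steps:
        logits = step_logits(tokens)  # hypothetical: decoder + projection
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == EOS_ID:
            break
        tokens.append(next_id)
    return tokens[len(prompt):]       # generated ids without the prompt

def toy_step(tokens: Sequence[int]) -> List[float]:
    """Scripted fake model: emits ids 10, 11, 12 and then EOS."""
    script = [10, 11, 12, EOS_ID]
    logits = [0.0] * 16
    logits[script[len(tokens) - 1]] = 1.0
    return logits

print(greedy_decode(toy_step))  # [10, 11, 12]
```

The `max_steps` bound matters because the decoder is exported with a fixed length, so the loop has to stop even when EOS never appears.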

Files

| File | Role | Precision |
|---|---|---|
| Preprocessor.mlmodelc | waveform [1,240000] → mel [1,128,1501] | fp32 (CPU) |
| EncoderInt4.mlmodelc | mel → encoder [1,1024,188] | int4 (ANE, iOS 18) |
| DecoderInt4.mlmodelc | autoregressive transformer → hidden [1,256,1024] | int4 (ANE, iOS 18) |
| Projection.mlmodelc | hidden [1,1024] → logits [1,16384] | fp16 (ANE) |
| vocab.json | 16384 SentencePiece pieces (id → piece) | - |
| projection_weights.npz | raw projection weights (for Python reference pipelines) | fp32 |
| metadata.json | shapes, sample rate, special token ids | - |
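For Python reference pipelines, token ids can be turned back into text from the id → piece mapping. The sketch below assumes vocab.json is a JSON object keyed by id strings, which this card does not confirm, and uses a hypothetical three-entry excerpt instead of the real file. It relies on the SentencePiece convention that the "▁" (U+2581) marker denotes a word boundary.

```python
import json

# Hypothetical excerpt; the real vocab.json has 16384 entries and its exact
# JSON layout (an object keyed by id strings) is an assumption here.
VOCAB_JSON = '{"10": "\\u2581hello", "11": "\\u2581wor", "12": "ld"}'

SPECIAL_IDS = {2, 3, 4}  # pad, eos, bos from the model contract

def load_vocab(text):
    """Parse an id -> piece mapping, converting JSON string keys to ints."""
    return {int(k): v for k, v in json.loads(text).items()}

def detokenize(ids, vocab):
    """Join pieces, drop special tokens, and map the word marker to spaces."""
    pieces = "".join(vocab[i] for i in ids if i not in SPECIAL_IDS)
    return pieces.replace("\u2581", " ").strip()

vocab = load_vocab(VOCAB_JSON)
print(detokenize([4, 10, 11, 12, 3], vocab))  # hello world
```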

Contract: 15 s window (240000 samples @ 16 kHz), 256 decoder steps, eos=3, pad=2, bos=4. int4 weight payloads require iOS 18 / macOS 15.
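The fixed shapes in the contract can be checked with a little arithmetic. The sketch below is a sanity check under stated assumptions, not a description of the exported graphs: a 10 ms hop (160 samples) with a centered STFT, and 8x encoder subsampling modeled as three stride-2 stages that round up. Both are consistent with the listed shapes but are not stated in this card. The non-overlapping chunking in `split_into_windows` is likewise a hypothetical way to fit longer audio to the 15 s window.

```python
SAMPLE_RATE = 16_000
WINDOW_SECONDS = 15
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS  # 240000, per the contract

def mel_frames(num_samples, hop=160):
    """Frame count for a centered STFT (assumed 10 ms hop)."""
    return num_samples // hop + 1

def encoder_frames(num_mel_frames, stages=3):
    """Frame count after three stride-2 stages (assumed 8x subsampling)."""
    n = num_mel_frames
    for _ in range(stages):
        n = (n + 1) // 2  # each stage halves the length, rounding up
    return n

def split_into_windows(samples, window=WINDOW_SAMPLES):
    """Cut audio into fixed-size windows, zero-padding the last one."""
    windows = []
    for start in range(0, max(len(samples), 1), window):
        chunk = list(samples[start:start + window])
        chunk.extend([0.0] * (window - len(chunk)))
        windows.append(chunk)
    return windows

print(mel_frames(WINDOW_SAMPLES))                  # 1501
print(encoder_frames(mel_frames(WINDOW_SAMPLES)))  # 188
print(len(split_into_windows([0.1] * 320_000)))    # 2 windows for 20 s
```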

Variants

  • int4 (the default here): ANE-runnable, ~573 MB, fastest. Weights are quantized symmetrically per block of 32.
  • fp16: exact parity with PyTorch, runs on iOS 17, ~1.8 GB (not included here by default).
  • int8 (per-channel): decodes correctly only on CPU (it crashes the GPU/ANE MPSGraph backend), so it is not recommended. Use int4 for a small ANE-resident build.
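To make "per-block-32 symmetric" concrete, here is a minimal pure-Python sketch of symmetric block quantization. It is not the code in quantize_int4.py. The [-7, 7] code range and the toy weights are assumptions made for illustration.

```python
BLOCK_SIZE = 32  # "per-block-32" from the variant description
QMAX = 7         # assumption: symmetric int4 codes restricted to [-7, 7]

def quantize_block(block):
    """Return (int codes, scale) for one block sharing a symmetric scale."""
    peak = max(abs(w) for w in block)
    scale = peak / QMAX if peak > 0 else 1.0  # avoid dividing by zero
    codes = [max(-QMAX, min(QMAX, round(w / scale))) for w in block]
    return codes, scale

def quantize(weights, block_size=BLOCK_SIZE):
    """Quantize a flat weight list block by block."""
    return [quantize_block(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]

def dequantize(blocks):
    """Rebuild approximate weights as code * scale, with no zero point."""
    return [c * scale for codes, scale in blocks for c in codes]

weights = [(i - 16) / 10 for i in range(64)]  # toy data covering two blocks
blocks = quantize(weights)
restored = dequantize(blocks)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{len(blocks)} blocks, worst absolute error {worst:.4f}")
```

Each block keeps its own scale, so one outlier weight only coarsens the 32 values around it rather than a whole channel. If the scale were stored as a 16-bit value per block (an assumption), the overhead would be 16/32 = 0.5 bits per weight on top of the 4-bit codes.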

Accuracy / speed (LibriSpeech test-clean, ≤15 s, int4, M-series ANE)

| Metric | Value |
|---|---|
| WER | ~2.1% |
| RTFx | ~7x |
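For readers unfamiliar with the two metrics, the sketch below uses their usual definitions: WER is the word-level edit distance divided by the number of reference words, and RTFx is audio duration divided by processing time. Whether validate.py applies text normalization before scoring is not stated here, so this is an assumption-level illustration with made-up inputs, not the evaluation script.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # match or substitution
        prev = curr
    return prev[-1] / len(ref)

def rtfx(audio_seconds, processing_seconds):
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    return audio_seconds / processing_seconds

# Toy inputs: one substitution and one deletion against six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
print(rtfx(15.0, 2.0))  # 7.5
```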

fp16 CoreML output is byte-identical to the NeMo PyTorch greedy decode.

Usage (FluidAudio)

let manager = try await CanaryManager.load(precision: .int4)
let text = try await manager.transcribe(audioURL: url)

Conversion

See the mobius conversion pipeline (models/stt/canary-1b-v2/coreml/): convert-coreml.py (NeMo→CoreML), quantize_int4.py, build_projection.py, validate.py, stage_hf.py.

License

Inherits cc-by-4.0 from the base model nvidia/canary-1b-v2.
