Canary-1B-v2 — CoreML (ANE)

CoreML conversion of nvidia/canary-1b-v2 for Apple Silicon / Neural Engine, packaged for FluidAudio.

Canary is an attention encoder-decoder (AED) ASR model that pairs a FastConformer encoder with a Transformer decoder. It covers 25 European languages and uses a 16384-token SentencePiece BPE vocabulary. Decoding is autoregressive: the Transformer decoder cross-attends to the encoder output and emits tokens greedily until EOS (id 3), and a projection head maps each 1024-dimensional hidden state to 16384 logits.
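
The greedy loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the FluidAudio implementation: `step_logits` is a hypothetical callable standing in for the decoder plus projection head, and the single-token `bos` prompt is an assumption (the real model may expect a longer prompt). The special token ids and the 256-step budget come from the contract in this card.

```python
from typing import Callable, List, Sequence

EOS_ID = 3       # end-of-sequence id from the model contract
BOS_ID = 4       # beginning-of-sequence id from the model contract
MAX_STEPS = 256  # decoder step budget from the model contract


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (first one wins on ties)."""
    best = 0
    for i, v in enumerate(values):
        if v > values[best]:
            best = i
    return best


def greedy_decode(
    step_logits: Callable[[List[int]], Sequence[float]],
    prompt: Sequence[int] = (BOS_ID,),
    max_steps: int = MAX_STEPS,
) -> List[int]:
    """Append the argmax token until EOS appears or the step budget is used.

    `step_logits` is a hypothetical stand-in for decoder + projection: it
    receives the tokens so far and returns logits for the next position.
    """
    tokens = list(prompt)
    while len(tokens) < max_steps:
        next_id = argmax(step_logits(tokens))
        if next_id == EOS_ID:
            break
        tokens.append(next_id)
    return tokens[len(prompt):]  # generated tokens only, prompt stripped
```

Because each step takes the single most likely token, the result is deterministic for a given input, which is what makes a token-for-token comparison between backends meaningful.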

Files

File	Role	Precision (compute unit, min OS)
`Preprocessor.mlmodelc`	waveform `[1,240000]` → mel `[1,128,1501]`	fp32 (CPU)
`EncoderInt4.mlmodelc`	mel → encoder `[1,1024,188]`	int4 (ANE, iOS 18)
`DecoderInt4.mlmodelc`	autoregressive transformer → hidden `[1,256,1024]`	int4 (ANE, iOS 18)
`Projection.mlmodelc`	hidden `[1,1024]` → logits `[1,16384]`	fp16 (ANE)
`vocab.json`	16384 SentencePiece pieces (`id → piece`)	—
`projection_weights.npz`	raw projection weights (for Python reference pipelines)	fp32
`metadata.json`	shapes, sample rate, special token ids	—
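
As a rough sketch of how `vocab.json` might be used in a Python reference pipeline, the snippet below loads the `id → piece` table and turns decoded ids back into text. The on-disk layout is an assumption: it accepts either a JSON object keyed by string ids or a JSON array indexed by id. Only pad, eos and bos are filtered here; any other special tokens the model emits would need the same treatment.

```python
import json
from typing import Dict, Iterable

SPACE_MARK = "\u2581"               # SentencePiece word-boundary marker
SPECIAL_IDS = frozenset({2, 3, 4})  # pad, eos, bos from the model contract


def load_vocab(path: str) -> Dict[int, str]:
    """Load an id -> piece table (assumed: JSON object or JSON array)."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    if isinstance(raw, list):
        return dict(enumerate(raw))
    return {int(k): v for k, v in raw.items()}


def detokenize(ids: Iterable[int], vocab: Dict[int, str]) -> str:
    """Join pieces, then turn SentencePiece markers back into spaces."""
    pieces = [vocab[i] for i in ids if i not in SPECIAL_IDS]
    return "".join(pieces).replace(SPACE_MARK, " ").strip()
```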

Contract: 15 s window (240000 samples @ 16 kHz), 256 decoder steps, eos=3, pad=2, bos=4. int4 weight payloads require iOS 18 / macOS 15.
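
The fixed shapes in the contract follow from simple arithmetic, sketched below. Two values are assumptions not stated in this card: a 10 ms mel hop (160 samples) and 8x encoder subsampling; with them, the frame counts match the shapes listed under Files. Right-side zero padding for short clips is also an assumption about how the window is filled.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000
WINDOW_SAMPLES = 240_000  # 15 s at 16 kHz, from the contract


def fit_window(samples: Sequence[float], size: int = WINDOW_SAMPLES) -> List[float]:
    """Zero-pad on the right, or truncate, to exactly `size` samples."""
    clipped = list(samples[:size])
    return clipped + [0.0] * (size - len(clipped))


def mel_frames(num_samples: int, hop: int = 160) -> int:
    """Frame count for a centered STFT; hop = 160 (10 ms) is assumed."""
    return num_samples // hop + 1


def encoder_frames(frames: int, subsampling: int = 8) -> int:
    """Ceiling division by the subsampling factor; 8x is assumed."""
    return -(-frames // subsampling)
```

Under these assumptions, `mel_frames(240_000)` gives 1501 and `encoder_frames(1501)` gives 188, the time dimensions of the preprocessor and encoder outputs.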

Variants

int4 (default, included here): ANE-runnable, ~573 MB, and the fastest variant. Weights are quantized symmetrically per block of 32.
fp16: exact parity with PyTorch, runs on iOS 17, ~1.8 GB (not included here by default).
int8 per-channel: decodes correctly only on CPU and crashes the GPU/ANE MPSGraph backend, so it is not recommended. Use int4 for a small ANE-resident build.
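
To make "per-block-32 symmetric" concrete, here is a generic pure-Python sketch of the idea: every run of 32 weights shares one scale, and codes are mirrored around zero with no zero point. It illustrates the technique only and is not the code in `quantize_int4.py`. The `[-7, 7]` code range is an assumption; some schemes also use -8.

```python
from typing import List, Sequence, Tuple

BLOCK_SIZE = 32
QMAX = 7  # assumed symmetric int4 range [-7, 7]

Block = Tuple[List[int], float]  # (int4 codes, shared scale)


def quantize_block(block: Sequence[float]) -> Block:
    """Quantize one block to int4 codes plus a single scale."""
    peak = max((abs(w) for w in block), default=0.0)
    scale = peak / QMAX if peak > 0 else 1.0
    codes = [max(-QMAX, min(QMAX, round(w / scale))) for w in block]
    return codes, scale


def quantize_row(row: Sequence[float], block_size: int = BLOCK_SIZE) -> List[Block]:
    """Split a weight row into blocks and quantize each independently."""
    return [quantize_block(row[i:i + block_size])
            for i in range(0, len(row), block_size)]


def dequantize_row(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate weights as code * scale."""
    return [c * s for codes, s in blocks for c in codes]
```

Small blocks keep one outlier from inflating the scale for a whole channel, which is the usual reason block-wise schemes hold accuracy better than per-channel ones at 4 bits.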

Accuracy / speed (LibriSpeech test-clean, ≤15 s, int4, M-series ANE)

Metric	Value
WER	~2.1%
RTFx	~7x

fp16 CoreML output is byte-identical to the NeMo PyTorch greedy decode.
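
For readers reproducing the table, the two metrics can be computed as sketched below. WER is the word-level edit distance divided by the reference length, and RTFx is audio duration divided by processing time. This is a minimal illustration and not the code in `validate.py`; text normalization (casing, punctuation) is left out and would affect the reported number.

```python
from typing import List


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_errors(ref, hypothesis.split()) / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    return audio_seconds / processing_seconds
```

At an RTFx of about 7, a full 15 s window takes roughly 15 / 7 ≈ 2.1 s to transcribe. Corpus-level WER is normally the total error count divided by the total reference word count, not the mean of per-utterance values.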

Usage (FluidAudio)

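```swift
import FluidAudio

let manager = try await CanaryManager.load(precision: .int4)  // int4 is the default build
let text = try await manager.transcribe(audioURL: url)        // url: file URL of the audio to transcribe
```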

Conversion

See the mobius conversion pipeline (models/stt/canary-1b-v2/coreml/): convert-coreml.py (NeMo→CoreML), quantize_int4.py, build_projection.py, validate.py, stage_hf.py.

License

Inherits cc-by-4.0 from the base model nvidia/canary-1b-v2.
