kb-whisper-large: CoreML Encoder for whisper.cpp
Pre-compiled CoreML encoder for KBLab/kb-whisper-large, the Swedish Whisper model from the National Library of Sweden.
This lets you run kb-whisper-large in whisper.cpp with CoreML/ANE acceleration on Apple Silicon Macs.
Files
| File | Description |
|---|---|
| `ggml-model-encoder.mlmodelc/` | CoreML encoder with int8-quantized weights (float32 I/O). 609 MB, half the size of a float32 encoder, and faster on ANE. |
Usage
Download the GGML model weights from KBLab/kb-whisper-large:

    wget https://huggingface.co/KBLab/kb-whisper-large/resolve/main/ggml-model-q5_0.bin
    # or the full-precision version:
    # wget https://huggingface.co/KBLab/kb-whisper-large/resolve/main/ggml-model.bin

Download `ggml-model-encoder.mlmodelc/` from this repo and place it in the same directory as the GGML model file:

    your-model-dir/
    ├── ggml-model-q5_0.bin            ← GGML weights
    └── ggml-model-encoder.mlmodelc/   ← this CoreML encoder

Build whisper.cpp with CoreML support:

    git clone https://github.com/ggerganov/whisper.cpp
    cd whisper.cpp
    cmake -B build -DWHISPER_COREML=ON
    cmake --build build --config Release

Run:

    ./build/bin/whisper-cli -m your-model-dir/ggml-model-q5_0.bin -f audio.wav -l sv

whisper.cpp will automatically detect and load the CoreML encoder.
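Automatic detection depends on the encoder directory having the name whisper.cpp expects for a given GGML file. The following is a minimal sketch of a pre-flight check, not whisper.cpp's own code. It assumes the lookup drops the file extension, drops a trailing five-character quantization tag such as `-q5_0`, and appends `-encoder.mlmodelc`, which matches the file names used in this repo. The helper names `expected_coreml_encoder` and `check_layout` are hypothetical.

```python
from pathlib import Path
import re


def expected_coreml_encoder(model_path: str) -> Path:
    """Guess the encoder directory that should sit next to a GGML file.

    Assumption: extension removed, then a trailing tag like "-q5_0"
    removed, then "-encoder.mlmodelc" appended.
    """
    path = Path(model_path)
    stem = path.stem                       # e.g. "ggml-model-q5_0"
    stem = re.sub(r"-q._.$", "", stem)     # e.g. "ggml-model"
    return path.with_name(stem + "-encoder.mlmodelc")


def check_layout(model_path: str) -> bool:
    """Return True only if the GGML file and its encoder directory both exist."""
    model = Path(model_path)
    encoder = expected_coreml_encoder(model_path)
    return model.is_file() and encoder.is_dir()


if __name__ == "__main__":
    candidate = "your-model-dir/ggml-model-q5_0.bin"
    print("expected encoder:", expected_coreml_encoder(candidate))
    print("layout ok:", check_layout(candidate))
```

If the check fails, the usual cause is that the `.mlmodelc` directory was placed somewhere other than the directory containing the `.bin` file, or was renamed.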
Performance tips
For maximum speed, especially on M1/M2 where the decoder is the bottleneck:
# Greedy decoding (drop beam search): biggest single speedup
./build/bin/whisper-cli -m ggml-model-q5_0.bin -f audio.wav -l sv -bs 1 --best-of 1
# M1/M2: use 4 threads (performance cores only)
./build/bin/whisper-cli -m ggml-model-q5_0.bin -f audio.wav -l sv -t 4
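To compare these settings on your own hardware, it can help to time each variant. The sketch below is one possible way to do that, using only the flags shown above (`-m`, `-f`, `-l`, `-bs`, `--best-of`, `-t`). The function names `build_whisper_cmd` and `timed_run` are hypothetical, and the binary path assumes the build layout from the Usage section.

```python
import subprocess
import time


def build_whisper_cmd(model, audio, language="sv", greedy=True, threads=None,
                      binary="./build/bin/whisper-cli"):
    """Assemble the whisper-cli argument list for one benchmark variant."""
    cmd = [binary, "-m", model, "-f", audio, "-l", language]
    if greedy:
        # Greedy decoding: beam size 1 and a single candidate.
        cmd += ["-bs", "1", "--best-of", "1"]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


def timed_run(cmd):
    """Run the command and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Prints the variants; call timed_run(cmd) on a machine with whisper.cpp built.
    for threads in (None, 4):
        cmd = build_whisper_cmd("ggml-model-q5_0.bin", "audio.wav", threads=threads)
        print(" ".join(cmd))
```

Timing the same audio file with and without `greedy` and with different `threads` values shows which tip matters most on a particular machine.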
Notes
- Built on macOS with Apple M4 Max using `coremltools` and `compute_units=ALL`.
- Encoder weights are int8-quantized (per-channel, linear symmetric) for faster ANE inference and smaller file size. I/O tensors remain float32 to match whisper.cpp's internal data format.
- The key fix vs naive conversions: whisper.cpp's CoreML bridge requires the input tensor to be named `logmel_data` (not `mel`). An incorrectly named input causes silent garbage output from the encoder.
- The encoder was generated from the `main` branch of `KBLab/kb-whisper-large` as of May 2025 (Stage 2 checkpoint). If KBLab update their weights, regenerate using the script below.
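For readers unfamiliar with the quantization scheme named above, here is a small pure-Python illustration of per-channel, linear symmetric int8 quantization. It is a conceptual sketch, not the coremltools implementation: each output channel (row) gets its own scale, chosen so the largest absolute weight maps to 127, and there is no zero-point offset. The function names are hypothetical.

```python
def quantize_per_channel(weights):
    """Quantize a list of rows (one row per output channel) to int8 values.

    Returns (quantized_rows, scales). Symmetric: zero maps to zero and the
    integer range used is [-127, 127].
    """
    quantized, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_row = [max(-127, min(127, round(w / scale))) for w in row]
        quantized.append(q_row)
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Recover approximate float weights from int8 values and per-row scales."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


if __name__ == "__main__":
    weights = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.04]]
    q, scales = quantize_per_channel(weights)
    restored = dequantize(q, scales)
    for original, approx in zip(weights, restored):
        print(original, "->", [round(v, 4) for v in approx])
```

Using one scale per channel rather than one per tensor keeps the rounding error small for channels whose weights are much smaller than the tensor-wide maximum, as in the second row of the example.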
Regenerating the encoder
# make_coreml.py
import numpy as np
import torch
import coremltools as ct
import coremltools.optimize.coreml as cto
from transformers import WhisperForConditionalGeneration
class EncoderWrapper(torch.nn.Module):
    """Expose only the encoder's last hidden state so the module can be traced."""

    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, mel):
        return self.encoder(mel).last_hidden_state
MODEL_DIR = "path/to/kb_whisper_large" # local HuggingFace checkout
OUTPUT_MLPACKAGE = f"{MODEL_DIR}/ggml-model-encoder.mlpackage"
model = WhisperForConditionalGeneration.from_pretrained(MODEL_DIR)
encoder = EncoderWrapper(model.model.encoder).eval()
dummy = torch.randn(1, 128, 3000)
traced = torch.jit.trace(encoder, dummy, strict=False)
mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    # whisper.cpp's CoreML bridge expects this exact input name
    inputs=[ct.TensorType(name="logmel_data", shape=dummy.shape, dtype=np.float32)],
    outputs=[ct.TensorType(name="output", dtype=np.float32)],
    compute_units=ct.ComputeUnit.ALL,
    minimum_deployment_target=ct.target.macOS13,
)
# Quantize weights to int8: halves size, faster on ANE
op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric", dtype="int8", granularity="per_channel")
config = cto.OptimizationConfig(global_config=op_config)
mlmodel = cto.linear_quantize_weights(mlmodel, config=config)
mlmodel.save(OUTPUT_MLPACKAGE)
# Then compile:
# xcrun coremlcompiler compile ggml-model-encoder.mlpackage path/to/output/
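The compile step can also be driven from Python when regenerating the encoder in one go. The sketch below only wraps the `xcrun coremlcompiler compile` command shown above. It assumes the compiler writes a directory named after the package with the extension changed to `.mlmodelc` inside the output directory. The helper names are hypothetical.

```python
from pathlib import Path
import subprocess


def compile_command(mlpackage, output_dir):
    """Argument list for compiling an .mlpackage with Xcode's command-line tool."""
    return ["xcrun", "coremlcompiler", "compile", str(mlpackage), str(output_dir)]


def compiled_path(mlpackage, output_dir):
    """Where the compiled model is assumed to land: <output_dir>/<name>.mlmodelc."""
    return Path(output_dir) / (Path(mlpackage).stem + ".mlmodelc")


def compile_encoder(mlpackage, output_dir):
    """Run the compiler (macOS with Xcode tools only) and return the result path."""
    subprocess.run(compile_command(mlpackage, output_dir), check=True)
    result = compiled_path(mlpackage, output_dir)
    if not result.is_dir():
        raise FileNotFoundError(f"expected compiled model at {result}")
    return result


if __name__ == "__main__":
    print(compiled_path("ggml-model-encoder.mlpackage", "your-model-dir"))
```

Pointing `output_dir` at the directory that holds the GGML weights puts the compiled encoder directly where whisper.cpp looks for it.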