kb-whisper-large — CoreML Encoder for whisper.cpp

Pre-compiled CoreML encoder for KBLab/kb-whisper-large, the Swedish Whisper model from the National Library of Sweden.

This lets you run kb-whisper-large in whisper.cpp with CoreML/ANE acceleration on Apple Silicon Macs.

Files

File	Description
`ggml-model-encoder.mlmodelc/`	CoreML encoder with int8-quantized weights (float32 I/O). 609 MB, roughly half the size of a float16 encoder (a quarter of a float32 one), and faster on ANE.

Usage

Download the GGML model weights from KBLab/kb-whisper-large:

wget https://huggingface.co/KBLab/kb-whisper-large/resolve/main/ggml-model-q5_0.bin
# or the full-precision version:
# wget https://huggingface.co/KBLab/kb-whisper-large/resolve/main/ggml-model.bin

Download ggml-model-encoder.mlmodelc/ from this repo and place it in the same directory as the GGML model file:

your-model-dir/
├── ggml-model-q5_0.bin            ← GGML weights
└── ggml-model-encoder.mlmodelc/  ← this CoreML encoder

Build whisper.cpp with CoreML support:

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build -DWHISPER_COREML=ON
cmake --build build --config Release

Run:

./build/bin/whisper-cli -m your-model-dir/ggml-model-q5_0.bin -f audio.wav -l sv

whisper.cpp will automatically detect and load the CoreML encoder.
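
The automatic detection depends on file naming. Based on a reading of whisper.cpp's source at the time of writing, the CoreML path is derived from the model path by dropping the extension, dropping a trailing quantization suffix such as -q5_0, and appending -encoder.mlmodelc. The following Python sketch mirrors that rule for illustration only. It is not code from whisper.cpp, the function name is hypothetical, and the rule itself is an assumption that should be checked against the whisper.cpp version you build.

```python
def expected_coreml_encoder_path(model_path: str) -> str:
    """Approximate where whisper.cpp looks for the CoreML encoder (assumed rule)."""
    # Drop the file extension, e.g. ".bin"
    dot = model_path.rfind(".")
    stem = model_path[:dot] if dot != -1 else model_path

    # Drop a trailing quantization suffix of the form "-qX_Y", e.g. "-q5_0"
    dash = stem.rfind("-")
    if dash != -1:
        suffix = stem[dash:]
        if len(suffix) == 5 and suffix[1] == "q" and suffix[3] == "_":
            stem = stem[:dash]

    return stem + "-encoder.mlmodelc"


# Both GGML files from the download step resolve to the same encoder path
print(expected_coreml_encoder_path("your-model-dir/ggml-model-q5_0.bin"))
print(expected_coreml_encoder_path("your-model-dir/ggml-model.bin"))
```

If this rule holds, the quantized and full-precision GGML files can share one ggml-model-encoder.mlmodelc/ in the same directory.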

Performance tips

For maximum speed, especially on M1/M2 where the decoder is the bottleneck:

# Greedy decoding (drop beam search) — biggest single speedup
./build/bin/whisper-cli -m ggml-model-q5_0.bin -f audio.wav -l sv -bs 1 --best-of 1

# M1/M2: use 4 threads (performance cores only)
./build/bin/whisper-cli -m ggml-model-q5_0.bin -f audio.wav -l sv -t 4
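
The two tips can be combined in a single invocation. A small Python sketch that assembles the command line follows. The helper name and its defaults are illustrative, and it uses only the flags already shown above.

```python
import shlex


def whisper_cmd(model, audio, language="sv", greedy=True, threads=None,
                cli="./build/bin/whisper-cli"):
    """Assemble a whisper-cli argument list from the tips above (hypothetical helper)."""
    cmd = [cli, "-m", model, "-f", audio, "-l", language]
    if greedy:
        # Beam size 1 and best-of 1 give greedy decoding
        cmd += ["-bs", "1", "--best-of", "1"]
    if threads is not None:
        # Explicit thread count, e.g. 4 on M1/M2
        cmd += ["-t", str(threads)]
    return cmd


cmd = whisper_cmd("ggml-model-q5_0.bin", "audio.wav", threads=4)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```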

Notes

Built on macOS with Apple M4 Max using coremltools and compute_units=ALL.
Encoder weights are int8-quantized (per-channel, linear symmetric) for faster ANE inference and smaller file size. I/O tensors remain float32 to match whisper.cpp's internal data format.
The key difference from a naive conversion: whisper.cpp's CoreML bridge expects the encoder's input tensor to be named logmel_data (not mel). With an incorrectly named input, the encoder silently produces garbage output instead of failing with an error.
The encoder was generated from the main branch of KBLab/kb-whisper-large as of May 2025 (Stage 2 checkpoint). If KBLab update their weights, regenerate using the script below.
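
To make "per-channel, linear symmetric" concrete: each output channel gets its own scale, chosen so that the largest absolute weight in that channel maps to 127, and the zero point is fixed at 0. The sketch below is plain Python for illustration, not how coremltools implements it. The function names are made up, treating each row as one channel is an assumption, and the exact integer range and rounding used by coremltools may differ.

```python
def quantize_per_channel(weights):
    """Symmetric int8 quantization with one scale per row (one row = one channel)."""
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Largest magnitude maps to 127; an all-zero row gets scale 1.0
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Round to the nearest integer and clamp to the symmetric int8 range
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Recover approximate float weights: w is roughly q * scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


# Two channels with very different magnitudes each keep their own resolution
weights = [[0.50, -1.00, 0.25], [0.02, 0.01, -0.04]]
q, scales = quantize_per_channel(weights)
restored = dequantize(q, scales)
```

With a single scale for the whole tensor, the second row would collapse to a handful of integer levels. Per-channel scales avoid that at the cost of storing one extra float per channel.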

Regenerating the encoder

# make_coreml.py
import numpy as np
import torch
import coremltools as ct
import coremltools.optimize.coreml as cto
from transformers import WhisperForConditionalGeneration

class EncoderWrapper(torch.nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, mel):
        return self.encoder(mel).last_hidden_state

MODEL_DIR = "path/to/kb_whisper_large"   # local HuggingFace checkout
OUTPUT_MLPACKAGE = f"{MODEL_DIR}/ggml-model-encoder.mlpackage"

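# Load the fine-tuned model and wrap only its encoder for export
model = WhisperForConditionalGeneration.from_pretrained(MODEL_DIR)
encoder = EncoderWrapper(model.model.encoder).eval()
# Whisper large-v3 input: 128 mel bins x 3000 frames (30 s of audio)
dummy = torch.randn(1, 128, 3000)
# Trace without autograd bookkeeping; only the forward graph is needed
with torch.no_grad():
    traced = torch.jit.trace(encoder, dummy, strict=False)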

mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    inputs=[ct.TensorType(name="logmel_data", shape=dummy.shape, dtype=np.float32)],
    outputs=[ct.TensorType(name="output", dtype=np.float32)],
    compute_units=ct.ComputeUnit.ALL,
    minimum_deployment_target=ct.target.macOS13,
)

# Quantize weights to int8 — halves size, faster on ANE
op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric", dtype="int8", granularity="per_channel")
config = cto.OptimizationConfig(global_config=op_config)
mlmodel = cto.linear_quantize_weights(mlmodel, config=config)

mlmodel.save(OUTPUT_MLPACKAGE)

# Then compile:
# xcrun coremlcompiler compile ggml-model-encoder.mlpackage path/to/output/
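
Before compiling, it can be worth confirming that the saved package really exposes an input called logmel_data. The following is a sketch, assuming coremltools is installed. The helper names are hypothetical, and the expected output name "output" is simply what the script above requests.

```python
def io_names_ok(input_names, output_names):
    """True if the names match what the conversion script above requests."""
    return list(input_names) == ["logmel_data"] and list(output_names) == ["output"]


def check_mlpackage(path):
    """Read input/output names from a saved .mlpackage (requires coremltools)."""
    import coremltools as ct  # imported lazily so io_names_ok works without it

    # skip_model_load reads the spec without loading the model for prediction
    spec = ct.models.MLModel(path, skip_model_load=True).get_spec()
    inputs = [feature.name for feature in spec.description.input]
    outputs = [feature.name for feature in spec.description.output]
    return io_names_ok(inputs, outputs), inputs, outputs


# Example (the path is a placeholder):
# ok, ins, outs = check_mlpackage("path/to/kb_whisper_large/ggml-model-encoder.mlpackage")
```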
