kb-whisper-large: CoreML Encoder for whisper.cpp

Pre-compiled CoreML encoder for KBLab/kb-whisper-large, the Swedish Whisper model from the National Library of Sweden.

This lets you run kb-whisper-large in whisper.cpp with CoreML acceleration on the Apple Neural Engine (ANE) of Apple Silicon Macs.

Files

File                          Description
ggml-model-encoder.mlmodelc/  CoreML encoder with int8-quantized weights (float32 I/O). 609 MB, roughly half the size of a float16 encoder, and faster on the ANE.

Usage

  1. Download the GGML model weights from KBLab/kb-whisper-large:

    wget https://huggingface.co/KBLab/kb-whisper-large/resolve/main/ggml-model-q5_0.bin
    # or the full-precision version:
    # wget https://huggingface.co/KBLab/kb-whisper-large/resolve/main/ggml-model.bin
    
  2. Download ggml-model-encoder.mlmodelc/ from this repo and place it in the same directory as the GGML model file:

    your-model-dir/
    ├── ggml-model-q5_0.bin            ← GGML weights
    └── ggml-model-encoder.mlmodelc/   ← this CoreML encoder
    
  3. Build whisper.cpp with CoreML support:

    git clone https://github.com/ggerganov/whisper.cpp
    cd whisper.cpp
    cmake -B build -DWHISPER_COREML=ON
    cmake --build build --config Release
    
  4. Run:

    ./build/bin/whisper-cli -m your-model-dir/ggml-model-q5_0.bin -f audio.wav -l sv
    

    whisper.cpp automatically detects and loads the CoreML encoder when it sits next to the GGML model file.
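
The automatic detection depends on the encoder directory having the name whisper.cpp expects. The sketch below mirrors what the lookup appears to do, under the assumption that whisper.cpp drops the file extension, then drops a trailing quantization suffix such as `-q5_0`, then appends `-encoder.mlmodelc`. The function name `coreml_encoder_path` is hypothetical and is not part of whisper.cpp; check the whisper.cpp source for your version if the encoder is not picked up.

```python
import re
from pathlib import PurePosixPath


def coreml_encoder_path(ggml_path: str) -> str:
    """Guess where the CoreML encoder is looked up for a given GGML file.

    Assumption (not taken from this README): the extension is removed,
    a trailing "-qX_Y" quantization suffix is removed, and
    "-encoder.mlmodelc" is appended.
    """
    path = PurePosixPath(ggml_path)
    stem = path.stem                      # drop ".bin"
    stem = re.sub(r"-q._.$", "", stem)    # drop a suffix such as "-q5_0"
    return str(path.with_name(stem + "-encoder.mlmodelc"))


if __name__ == "__main__":
    for name in ("your-model-dir/ggml-model-q5_0.bin",
                 "your-model-dir/ggml-model.bin"):
        print(name, "->", coreml_encoder_path(name))
```

Under that assumption, both the quantized and the full-precision GGML files from step 1 resolve to the same `ggml-model-encoder.mlmodelc/` directory, which is why a single encoder serves either file.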

Performance tips

For maximum speed, especially on M1/M2 where the decoder is the bottleneck:

# Greedy decoding (drop beam search): biggest single speedup
./build/bin/whisper-cli -m ggml-model-q5_0.bin -f audio.wav -l sv -bs 1 --best-of 1

# M1/M2: use 4 threads (performance cores only)
./build/bin/whisper-cli -m ggml-model-q5_0.bin -f audio.wav -l sv -t 4
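
If you drive whisper-cli from a script, the two tips can be combined in one invocation. The following is a minimal sketch, not an official wrapper: the helper name `whisper_cli_command` and its defaults are illustrative assumptions, and it only uses the flags already shown above (`-m`, `-f`, `-l`, `-t`, `-bs`, `--best-of`).

```python
import shlex
import subprocess


def whisper_cli_command(model, audio, language="sv", threads=4, greedy=True,
                        binary="./build/bin/whisper-cli"):
    """Build the whisper-cli argument list for the tips above."""
    cmd = [binary, "-m", model, "-f", audio, "-l", language, "-t", str(threads)]
    if greedy:
        # Beam size 1 and best-of 1 turn off beam search and sampling candidates.
        cmd += ["-bs", "1", "--best-of", "1"]
    return cmd


def transcribe(model, audio, **options):
    """Run whisper-cli and return its standard output (requires the built binary)."""
    cmd = whisper_cli_command(model, audio, **options)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Print the command instead of running it, so this works without a build.
    print(shlex.join(whisper_cli_command("ggml-model-q5_0.bin", "audio.wav")))
```

Passing `greedy=False` keeps whisper-cli's default decoding settings, which makes it easy to time both variants on your own audio.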

Notes

  • Built on macOS on an Apple M4 Max using coremltools with compute_units=ALL.
  • Encoder weights are int8-quantized (per-channel, linear symmetric) for faster ANE inference and smaller file size. I/O tensors remain float32 to match whisper.cpp's internal data format.
  • The key fix compared with naive conversions: whisper.cpp's CoreML bridge requires the input tensor to be named logmel_data (not mel). With an incorrectly named input, the encoder silently produces garbage output.
  • The encoder was generated from the main branch of KBLab/kb-whisper-large as of May 2025 (Stage 2 checkpoint). If KBLab update their weights, regenerate using the script below.
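
To make "per-channel, linear symmetric" int8 quantization concrete, here is a small pure-Python illustration. It is a sketch of the general technique only, assuming one scale per output channel (row) and a symmetric range of -127 to 127; coremltools' actual implementation may differ in details such as the exact integer range. The function names are hypothetical.

```python
def quantize_per_channel(weights):
    """Quantize each row (output channel) to int8 using its own scale.

    Symmetric means zero maps to zero and the scale is chosen so that the
    largest absolute value in the row maps to +/-127.
    """
    quantized, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127 if max_abs > 0 else 1.0
        quantized.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Recover approximate float weights: w is roughly q * scale."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


if __name__ == "__main__":
    # Two channels with very different magnitudes.
    weights = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
    q, scales = quantize_per_channel(weights)
    print(q)
    print(dequantize(q, scales))
```

Because each row gets its own scale, a channel with small weights keeps most of its precision even when another channel has much larger weights; a single scale for the whole tensor would round the small channel far more coarsely.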

Regenerating the encoder

# make_coreml.py
import numpy as np
import torch
import coremltools as ct
import coremltools.optimize.coreml as cto
from transformers import WhisperForConditionalGeneration

class EncoderWrapper(torch.nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, mel):
        return self.encoder(mel).last_hidden_state

MODEL_DIR = "path/to/kb_whisper_large"   # local HuggingFace checkout
OUTPUT_MLPACKAGE = f"{MODEL_DIR}/ggml-model-encoder.mlpackage"

model = WhisperForConditionalGeneration.from_pretrained(MODEL_DIR)
encoder = EncoderWrapper(model.model.encoder).eval()
dummy = torch.randn(1, 128, 3000)
traced = torch.jit.trace(encoder, dummy, strict=False)

mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    inputs=[ct.TensorType(name="logmel_data", shape=dummy.shape, dtype=np.float32)],
    outputs=[ct.TensorType(name="output", dtype=np.float32)],
    compute_units=ct.ComputeUnit.ALL,
    minimum_deployment_target=ct.target.macOS13,
)

# Quantize weights to int8: halves size, faster on ANE
op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric", dtype="int8", granularity="per_channel")
config = cto.OptimizationConfig(global_config=op_config)
mlmodel = cto.linear_quantize_weights(mlmodel, config=config)

mlmodel.save(OUTPUT_MLPACKAGE)

# Then compile:
# xcrun coremlcompiler compile ggml-model-encoder.mlpackage path/to/output/