whisper-heb-ipa

Fine-tuned Whisper model that transcribes Hebrew speech into ASCII IPA phonemes.

Base model: aunikud/whisper-he-ipa
Training precision: fp32
Best checkpoint: epoch 4, global step 100
Selection metric: lowest eval WER on a 30-minute holdout from imaginary-jail-clean-v2

Training eval (imaginary-jail 30m holdout, 235 segments)

| Metric | Value |
| --- | --- |
| WER | 9.96% |
| CER | 3.31% |
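
This card does not say how WER and CER were computed or how the ASCII IPA was normalized. As a rough sketch only, assuming the usual Levenshtein-based definitions with edits and reference lengths pooled over the whole set, the two metrics could be computed as below. The function names `edit_distance` and `corpus_error_rate` and the sample strings are hypothetical and are not taken from this repo.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def corpus_error_rate(refs, hyps, unit="word"):
    """Pooled error rate: total edits / total reference length.

    unit="word" splits on whitespace (WER); unit="char" compares raw
    characters, spaces included (CER). Both choices are assumptions.
    """
    if unit not in ("word", "char"):
        raise ValueError("unit must be 'word' or 'char'")
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        total_edits += edit_distance(r, h)
        total_len += len(r)
    if total_len == 0:
        raise ValueError("references are empty")
    return total_edits / total_len


if __name__ == "__main__":
    # Made-up ASCII strings, for illustration only.
    refs = ["shalom olam"]
    hyps = ["shalom ulam"]
    print(f"WER: {corpus_error_rate(refs, hyps, 'word'):.2%}")
    print(f"CER: {corpus_error_rate(refs, hyps, 'char'):.2%}")
```

Averaging per-sample rates instead of pooling would give different numbers, so figures from this sketch should not be expected to match the table exactly.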

Benchmark results (normalized ASCII IPA)

ILSpeech test (`data/ilspeech-v2/test`, 150 samples)

| Metric | Base (`whisper-he-ipa`) | This model |
| --- | --- | --- |
| CER | 2.17% | 1.73% |
| WER | 9.55% | 7.99% |
| SER | 8.22% | 6.63% |
| VER | 1.97% | 1.69% |
| Exact match | 44.7% | 46.7% |

Michael Gold v1 (`data/michael-gold-v1`, 561 samples)

| Metric | Base (`whisper-he-ipa`) | This model |
| --- | --- | --- |
| CER | 4.20% | 4.05% |
| WER | 19.71% | 19.12% |
| SER | 14.55% | 14.98% |
| VER | 2.79% | 2.80% |
| Exact match | 6.6% | 7.7% |

Full per-sample reports are in benchmarks/.
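
To compare the two columns more directly, the error rates above can be turned into relative changes against the base model. The snippet below is a small illustrative helper, not part of this repo: `relative_reduction` is a hypothetical name, and the numbers are copied from the two benchmark tables. A positive value means this model has a lower error rate than the base; a negative value means it is higher.

```python
def relative_reduction(base, new):
    """Relative error reduction in percent: positive when `new` is lower than `base`."""
    return (base - new) / base * 100.0


# (base, this model) error rates in percent, copied from the benchmark tables.
results = {
    "ILSpeech test": {
        "CER": (2.17, 1.73),
        "WER": (9.55, 7.99),
        "SER": (8.22, 6.63),
        "VER": (1.97, 1.69),
    },
    "Michael Gold v1": {
        "CER": (4.20, 4.05),
        "WER": (19.71, 19.12),
        "SER": (14.55, 14.98),
        "VER": (2.79, 2.80),
    },
}

if __name__ == "__main__":
    for benchmark, metrics in results.items():
        print(benchmark)
        for name, (base, new) in metrics.items():
            print(f"  {name}: {relative_reduction(base, new):+.1f}% relative")
```

On these figures the gains are concentrated on the ILSpeech test set, while on Michael Gold v1 the SER and VER values are slightly higher than the base model's.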

Usage

from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="aunikud/whisper-heb-ipa",
    generate_kwargs={"language": "he", "task": "transcribe"},
)
print(pipe("audio.wav")["text"])

CLI from this repo:

uv run src/infer.py audio.wav --model aunikud/whisper-heb-ipa

Training data

data/ilspeech-v2/train
data/imaginary-jail-clean-v2 train split (metadata_train.csv, ~11h)

Evaluation during training used metadata_eval_30m.csv, a 30-minute holdout from imaginary-jail-clean-v2.

Model size: 0.8B params (Safetensors)
Tensor type: F32
