whisper-heb-ipa

Fine-tuned Whisper model that transcribes Hebrew speech into ASCII IPA phonemes.

Training eval (imaginary-jail 30m holdout, 235 segments)

Metric  Value
WER     9.96%
CER     3.31%
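For readers unfamiliar with the metrics, the following is a minimal sketch of how corpus-level WER, CER, and exact match are commonly computed from edit distance. It is not taken from this repo's evaluation code: the function names `edit_distance` and `error_rate` are hypothetical, the example strings are made up, and the normalization applied before scoring in the benchmarks is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate: total edits divided by total reference length."""
    # Assumption: CER counts every character, including spaces.
    split = (lambda s: s.split()) if unit == "word" else list
    edits = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return edits / total


# Made-up strings for illustration only; not real model output
# and not necessarily the model's ASCII IPA alphabet.
refs = ["shalom olam", "ma nishma"]
hyps = ["shalom olam", "ma nishna"]

wer = error_rate(refs, hyps, unit="word")
cer = error_rate(refs, hyps, unit="char")
exact = sum(r == h for r, h in zip(refs, hyps)) / len(refs)
print(f"WER {wer:.2%}  CER {cer:.2%}  Exact match {exact:.1%}")
```

Summing edits over the whole corpus before dividing weights long segments more heavily than averaging per-segment rates would; which convention the reported numbers use is not stated here.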

Benchmark results (normalized ASCII IPA)

ILSpeech test (data/ilspeech-v2/test, 150 samples)

Metric       Base (whisper-he-ipa)  This model
CER          2.17%                  1.73%
WER          9.55%                  7.99%
SER          8.22%                  6.63%
VER          1.97%                  1.69%
Exact match  44.7%                  46.7%

Michael Gold v1 (data/michael-gold-v1, 561 samples)

Metric       Base (whisper-he-ipa)  This model
CER          4.20%                  4.05%
WER          19.71%                 19.12%
SER          14.55%                 14.98%
VER          2.79%                  2.80%
Exact match  6.6%                   7.7%

Full per-sample reports are in benchmarks/.
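
To compare the two models at a glance, the error rates from the two benchmark tables can be turned into relative changes. This is a small sketch that only uses the numbers reported above; the helper name `relative_change` is hypothetical.

```python
def relative_change(base, new):
    """Relative change of `new` versus `base`, in percent (negative means lower error)."""
    return (new - base) / base * 100.0


# (base, this model) error rates in percent, copied from the benchmark tables.
ilspeech = {"CER": (2.17, 1.73), "WER": (9.55, 7.99),
            "SER": (8.22, 6.63), "VER": (1.97, 1.69)}
michael_gold = {"CER": (4.20, 4.05), "WER": (19.71, 19.12),
                "SER": (14.55, 14.98), "VER": (2.79, 2.80)}

for name, table in (("ILSpeech test", ilspeech), ("Michael Gold v1", michael_gold)):
    for metric, (base, new) in table.items():
        print(f"{name} {metric}: {relative_change(base, new):+.1f}%")
```

A negative value means this model has a lower error rate than the base. On Michael Gold v1 the SER and VER values come out positive, matching the slightly higher numbers in that table.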

Usage

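from transformers import pipeline

# Load the fine-tuned checkpoint and fix the language to Hebrew
# so Whisper does not have to auto-detect it.
pipe = pipeline(
    "automatic-speech-recognition",
    model="aunikud/whisper-heb-ipa",
    generate_kwargs={"language": "he", "task": "transcribe"},
)
# The pipeline returns a dict; "text" holds the ASCII IPA transcription.
print(pipe("audio.wav")["text"])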
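
When several recordings need transcribing, the same `pipe` object can be reused. The helper below is a hypothetical convenience wrapper, not part of this repo; it only assumes that `pipe` is a callable returning a dict with a `"text"` key, as in the snippet above.

```python
def transcribe_files(pipe, paths):
    """Run an ASR pipeline over several audio files and return {path: text}."""
    results = {}
    for path in paths:
        output = pipe(str(path))  # same call shape as pipe("audio.wav")
        results[str(path)] = output["text"].strip()
    return results
```

With the pipeline loaded as above, `transcribe_files(pipe, ["a.wav", "b.wav"])` would return one transcription per file (the file names here are placeholders).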

CLI from this repo:

uv run src/infer.py audio.wav --model aunikud/whisper-heb-ipa

Training data

  • data/ilspeech-v2/train
  • data/imaginary-jail-clean-v2 train split (metadata_train.csv, ~11h)

Evaluation during training used metadata_eval_30m.csv, a 30-minute holdout from data/imaginary-jail-clean-v2.
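
How the holdout was actually selected is not described here. As an illustration only, a fixed-duration holdout can be carved out by shuffling segments and moving them aside until the target duration is reached. The function `split_holdout` and the `(segment_id, duration_seconds)` input shape are assumptions for this sketch, not the repo's CSV schema or method.

```python
import random


def split_holdout(segments, holdout_seconds=30 * 60, seed=0):
    """Split (segment_id, duration_seconds) pairs into train and holdout lists.

    Segments are shuffled with a fixed seed, then moved into the holdout
    until its total duration reaches `holdout_seconds`.
    """
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    train, holdout, total = [], [], 0.0
    for seg_id, duration in shuffled:
        if total < holdout_seconds:
            holdout.append((seg_id, duration))
            total += duration
        else:
            train.append((seg_id, duration))
    return train, holdout


# Synthetic example: 400 segments of 10 seconds each.
segments = [(f"seg{i}", 10.0) for i in range(400)]
train, holdout = split_holdout(segments)
print(len(train), len(holdout), sum(d for _, d in holdout))
```

A fixed seed keeps the split reproducible, so the holdout never leaks into training across runs.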

Model size: 0.8B params (Safetensors, tensor type F32)