Model Card — Whisper Large Icelandic (Spjallrómur Fine-tune)
Model Details
Model Description
This model is a fine-tuned version of language-and-voice-lab/whisper-large-icelandic-62640-steps-967h for Automatic Speech Recognition (ASR) in Icelandic, with a focus on conversational and spontaneous speech. It was further fine-tuned on the Spjallrómur corpus, an Icelandic conversational speech dataset, to improve robustness on informal, dialogue-style audio.
- Developed by: Páll Rúnarsson
- Funded by: Almannarómur (Language Technology Programme for Icelandic)
- Shared by: Language and Voice Laboratory, Reykjavík University
- Model type: Automatic Speech Recognition (Transformer encoder-decoder)
- Language(s): Icelandic (is)
- License: CC BY-SA 4.0 (free to use, share, and adapt, provided you give appropriate credit and distribute any derivatives under the same license)
- Fine-tuned from: language-and-voice-lab/whisper-large-icelandic-62640-steps-967h, which is itself a fine-tune of openai/whisper-large
Model Sources
- Repository: (add link)
- Demo: (add link)
Uses
Direct Use
This model is intended for transcription of Icelandic speech, particularly
conversational and spontaneous speech. It can be used directly via the
🤗 Transformers pipeline API or integrated into larger ASR pipelines.
Downstream Use
The model may be fine-tuned further for domain-specific Icelandic ASR tasks (e.g., legal, broadcast, or medical transcription) where spontaneous speech patterns are common. Any derivative models must be released under the same CC BY-SA 4.0 license with appropriate attribution.
Out-of-Scope Use
- Languages other than Icelandic
- Highly technical or domain-specific jargon without additional fine-tuning
- Real-time streaming inference without appropriate latency optimisations
Bias, Risks, and Limitations
- Performance may degrade on heavily accented, dialectal, or code-switched speech (e.g., Icelandic/English or Icelandic/Danish mixing).
- The Spjallrómur training data reflects particular speaker demographics; underrepresented groups may see higher error rates.
- As with all ASR systems, proper nouns, rare words, and domain-specific terminology present challenges.
Recommendations
Users should evaluate the model on their target domain before deployment. For sensitive applications (legal, medical), human review of transcripts is strongly recommended.
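One way to check for uneven error rates before deployment is to break WER down by speaker group or domain. The following is a minimal sketch, not the evaluation code used for this model: the function name `wer_by_group`, the record layout, and the group labels are all hypothetical, and it assumes per-utterance word-error counts have already been computed by some WER tool.

```python
from collections import defaultdict


def wer_by_group(records):
    """Micro-averaged WER per group.

    Each record is (group, word_errors, reference_words), where word_errors
    is substitutions + deletions + insertions for a single utterance.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, word_errors, reference_words in records:
        errors[group] += word_errors
        words[group] += reference_words
    # Skip groups without reference words to avoid dividing by zero.
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Hypothetical per-utterance counts, for illustration only.
records = [
    ("group_a", 2, 10),
    ("group_a", 1, 10),
    ("group_b", 5, 10),
]
rates = wer_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.1%}")
```

Summing errors and reference words per group before dividing weights long utterances more heavily than short ones, which is usually what a corpus-level WER figure means. A large gap between groups is a signal to collect more data or add human review for the weaker group.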
How to Get Started with the Model
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="palli23/Whisper-Large-Spjallromur",
)
result = asr("audio.wav")
print(result["text"])
For more control:
import librosa
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

MODEL_NAME = "palli23/Whisper-Large-Spjallromur"
# Use a GPU when one is available; otherwise fall back to CPU.
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to(DEVICE)

def transcribe(audio_path):
    # Whisper expects 16 kHz mono audio; librosa resamples on load.
    audio, sr = librosa.load(audio_path, sr=16_000)
    input_features = processor(
        audio, sampling_rate=sr, return_tensors="pt"
    ).input_features.to(DEVICE)
    with torch.no_grad():
        predicted_ids = model.generate(input_features)[0]
    return processor.decode(predicted_ids, skip_special_tokens=True)

print(transcribe("audio.wav"))
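Whisper's feature extractor pads or truncates its input to a 30-second window, so the `transcribe` function above only covers the first 30 seconds of a longer recording. A minimal sketch of one workaround is to split the samples into consecutive windows and transcribe each in turn. The helper name `split_into_windows` is hypothetical, and fixed, non-overlapping windows are an assumption made for brevity; a cut can land in the middle of a word.

```python
def split_into_windows(samples, sample_rate=16_000, window_s=30.0):
    """Split a 1-D sample sequence into consecutive windows of at most window_s seconds."""
    window = int(sample_rate * window_s)
    if window <= 0:
        raise ValueError("sample_rate and window_s must be positive")
    return [samples[start:start + window] for start in range(0, len(samples), window)]


# 70 seconds of silence at 16 kHz, used here only to show the window sizes.
audio = [0.0] * (70 * 16_000)
windows = split_into_windows(audio)
print([len(w) / 16_000 for w in windows])  # [30.0, 30.0, 10.0]
```

For real recordings, the Transformers ASR pipeline accepts a `chunk_length_s` argument that performs overlapping chunking and merges the results, which is likely a better choice than hard cuts.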
Training Details
Training Data
Fine-tuned on Spjallrómur, an Icelandic conversational speech corpus published via CLARIN-IS. The corpus contains spontaneous, dialogue-style speech and is distinct from the read-speech corpora (Samrómur Milljón, Málrómur) used in the base model.
The base model checkpoint used as the starting point was trained for 62,640 steps on 967 hours of Icelandic read speech from Samrómur Milljón.
Training Procedure
Starting checkpoint:
language-and-voice-lab/whisper-large-icelandic-62640-steps-967h
Training Hyperparameters
- Training regime: (add: fp16/bf16, batch size, learning rate, steps, warmup, etc.)
Speeds, Sizes, Times
- (add: GPU type, training duration, model size)
Evaluation
Testing Data
The model was evaluated on the Spjallrómur test set — a held-out partition of conversational Icelandic speech not seen during fine-tuning.
Metrics
Word Error Rate (WER) is used as the primary evaluation metric, computed after normalising both reference and hypothesis transcripts.
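As an illustration of the metric, WER is the word-level edit distance between the normalised reference and hypothesis, divided by the number of reference words. The sketch below is not the evaluation script used for this model: the normalisation scheme (lowercasing and stripping punctuation) is an assumption, since the card does not specify one, and the example sentence is made up.

```python
import re


def normalise(text):
    """Lowercase, replace punctuation with spaces, and split into words (assumed scheme)."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = normalise(reference), normalise(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalisation")
    # Dynamic programming over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(wer("Hvað segir þú gott?", "hvað segir þú"))  # 0.25
```

Because the normalisation step changes which differences count as errors, WER figures are only comparable when both systems are scored with the same normaliser.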
Results
Word error rates on the Spjallrómur test set:
| Model | Test WER |
|---|---|
| Base (Icelandic read-speech fine-tune) | 46.2% |
| This model (Spjallrómur conversational) | 28.6% |
This represents a 17.6 percentage point absolute reduction — a 38% relative improvement in WER on conversational Icelandic speech.
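The two figures follow directly from the table; the short calculation below reproduces them. The variable names are illustrative only.

```python
base_wer = 46.2   # base model, test WER in percent
tuned_wer = 28.6  # this model, test WER in percent

# Absolute reduction is a difference in percentage points.
absolute_reduction = base_wer - tuned_wer
# Relative reduction is that difference as a fraction of the base WER.
relative_reduction = absolute_reduction / base_wer

print(f"{absolute_reduction:.1f} points, {relative_reduction:.1%} relative")
# 17.6 points, 38.1% relative
```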
Technical Specifications
Model Architecture and Objective
Whisper Large — a transformer encoder-decoder architecture trained with a sequence-to-sequence cross-entropy objective. The architecture is unchanged from the OpenAI Whisper Large baseline; only the weights are adapted via fine-tuning on Icelandic conversational data.
Compute Infrastructure
Training was conducted at the Language and Voice Laboratory (lvl.ru.is), Reykjavík University, Iceland.
Citation
If you use this model, please cite both this work and the base model:
@misc{runarsson2026whisper,
author = {Páll Rúnarsson},
title = {Whisper Large Icelandic Fine-tuned on Spjallrómur},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/language-and-voice-lab/<your-model-id>}}
}
@inproceedings{mena2024samromur,
title = {Samr{\'o}mur Millj{\'o}n: An ASR Corpus of One Million Verified
Read Prompts in Icelandic},
author = {Mena, Carlos Daniel Hernandez and Gunnarsson, {\TH}orsteinn
Da{\dh}i and Gu{\dh}nason, J{\'o}n},
booktitle = {Proceedings of the 2024 Joint International Conference on
Computational Linguistics, Language Resources and Evaluation
(LREC-COLING 2024)},
pages = {14305--14312},
year = {2024}
}
@misc{fong2026spjallromur,
author = {Fong, Judy Y. and Borsky, Michal and Runarsson, Pall
and Hedström, Staffan and Jónsson, Ólafur Helgi
and Hólmfriðardóttir, Lára Margrét H. and Þorsteinsdóttir, Sunneva
and Eiríksdóttir, Málfriður Anna and Mollberg, David Erik
and Magnúsdóttir, Eydís Huld and Þórhallsdóttir, Ragnheiður
and Gudnason, Jon},
title = {Spjallromur 26.03 -- Icelandic Conversational Speech},
year = {2026},
publisher = {CLARIN-IS / Reykjavík University},
howpublished = {\url{http://hdl.handle.net/20.500.12537/379}}
}
Acknowledgements
This work was carried out at the Language and Voice Laboratory (lvl.ru.is) at Reykjavík University, Iceland, under the supervision of Jón Guðnason and in parallel with Michal Borsky's Zipformer work.
Funded by the Language Technology Programme for Icelandic, which is managed and coordinated by Almannarómur and funded by the Icelandic Ministry of Education, Science and Culture.
Model Card Author
Páll Rúnarsson, Research Associate, Language and Voice Laboratory, Reykjavík University
Model Card Contact
(add contact email or HF profile link)