whisper-large-v3-sk
Slovak fine-tune of openai/whisper-large-v3 by the KInIT team, obtained by full-parameter fine-tuning on a curated Slovak speech corpus.
Model Details
| Property | Value |
|---|---|
| Base model | openai/whisper-large-v3 |
| Parameters | ~1.55B |
| Architecture | Whisper encoder-decoder |
| Fine-tuning method | Full fine-tuning |
| Language | Slovak (sk) |
| Task | Automatic Speech Recognition |
| License | MIT |
Intended Use
This model is intended for Slovak automatic speech recognition across a range of domains and recording conditions.
Out-of-scope: Non-Slovak audio, real-time streaming without appropriate chunking, safety-critical transcription without human review.
Training Data
Fine-tuned on an internal curated Slovak speech corpus compiled at KInIT. The corpus combines public datasets with internal KInIT recordings. Recordings containing personal data were anonymised prior to use. Samples were quality-filtered using a CER-based threshold validated against multiple ASR models.
| Data sources |
|---|
| SloPalSpeech |
| Municipal council session recordings |
| Read literature |
| Mozilla Common Voice |
| TEDxSK and JumpSK Lecture Speech Corpus |
| FLEURS read speech |
| Internal KInIT recordings |
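The CER-based quality filter mentioned above can be illustrated with a minimal sketch. It assumes each sample has a reference transcript and one or more ASR hypotheses. The threshold of 0.3, the rule that every model must pass, and the names `edit_distance`, `cer`, and `keep_sample` are illustrative assumptions, not the procedure actually used to build this corpus.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def keep_sample(ref: str, hyps: list[str], threshold: float = 0.3) -> bool:
    """Keep a sample only if every ASR hypothesis is within the CER threshold.

    Both the threshold and the "all models must pass" rule are assumptions.
    """
    return all(cer(ref, h) <= threshold for h in hyps)


# Hypothetical samples: a transcript plus hypotheses from two ASR models.
samples = [
    {"text": "dobrý deň", "hyps": ["dobrý deň", "dobry deň"]},
    {"text": "dobrý deň", "hyps": ["úplne iný text", "dobrý deň"]},
]
kept = [s for s in samples if keep_sample(s["text"], s["hyps"])]
print(len(kept))
```

Comparing against several models, rather than one, reduces the chance that a single model's systematic errors cause a correct transcript to be discarded.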
Training Procedure
| Hyperparameter | Value |
|---|---|
| Epochs | 2 |
| Learning rate | 5e-5 |
| LR scheduler | Linear with warmup |
| Optimizer | AdamW |
| Effective batch size | 64 |
| Precision | fp16 |
| Framework | HuggingFace Transformers Seq2SeqTrainer |
Training was performed on the Devana HPC cluster.
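To make the hyperparameters concrete, the sketch below shows one way an effective batch size of 64 can be decomposed and how a linear schedule with warmup behaves for a peak learning rate of 5e-5. The per-device batch size, GPU count, warmup length, and total step count are not stated in this card; the values used here are hypothetical and chosen only for illustration.

```python
PEAK_LR = 5e-5            # from the hyperparameter table
EFFECTIVE_BATCH = 64      # from the hyperparameter table

# Hypothetical values, not stated in this model card.
PER_DEVICE_BATCH = 8
NUM_GPUS = 4
WARMUP_STEPS = 500
TOTAL_STEPS = 10_000


def grad_accum_steps(effective: int, per_device: int, num_gpus: int) -> int:
    """Gradient accumulation steps needed to reach the effective batch size."""
    per_step = per_device * num_gpus
    if effective % per_step != 0:
        raise ValueError("effective batch size must be a multiple of per_device * num_gpus")
    return effective // per_step


def linear_warmup_lr(step: int, peak_lr: float, warmup: int, total: int) -> float:
    """Learning rate that rises linearly to peak_lr, then decays linearly to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


print(grad_accum_steps(EFFECTIVE_BATCH, PER_DEVICE_BATCH, NUM_GPUS))
for step in (0, 250, 500, 5_250, 10_000):
    print(step, linear_warmup_lr(step, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS))
```

In Transformers, the corresponding `Seq2SeqTrainingArguments` fields are `learning_rate`, `lr_scheduler_type="linear"`, `warmup_steps`, `per_device_train_batch_size`, `gradient_accumulation_steps`, `num_train_epochs`, and `fp16`.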
Evaluation
| Model | Common Voice 24 SK (test) WER ↓ | Common Voice 24 SK (test) CER ↓ | Internal eval WER ↓ | Internal eval CER ↓ |
|---|---|---|---|---|
| openai/whisper-large-v3 | TBD | TBD | TBD | TBD |
| kinit/whisper-large-v3-sk | TBD | TBD | TBD | TBD |
| openai/whisper-large-v3-turbo | TBD | TBD | TBD | TBD |
| kinit/whisper-large-v3-turbo-sk | TBD | TBD | TBD | TBD |
Usage
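import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

model_id = "kinit/whisper-large-v3-sk"

# Use the GPU with half precision when available, otherwise CPU with full precision.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the model weights and the matching processor (tokenizer + feature extractor).
# Note: older Transformers releases name the `dtype` argument `torch_dtype`.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    dtype=dtype,
    use_safetensors=True,
).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Build an ASR pipeline around the loaded model.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    dtype=dtype,
    device=device,
)

# Transcribe a local file, forcing Slovak as the decoding language.
result = pipe("audio.wav", generate_kwargs={"language": "slovak"})
print(result["text"])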
Limitations
- Catastrophic forgetting: Fine-tuning exclusively on Slovak data significantly degrades performance on other languages. Use the base openai/whisper-large-v3 if multilingual transcription is required.
- Performance may degrade on strongly accented, dialectal, or domain-specific speech not represented in the training data.
- Without chunking, the maximum reliable length of a single audio segment is 30 seconds.
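For recordings longer than 30 seconds, one common approach is to split the audio into overlapping windows and transcribe each window separately. The sketch below only computes the window boundaries. The 5-second overlap and the function name `chunk_bounds` are illustrative assumptions; the 16 kHz default reflects the sampling rate Whisper models expect.

```python
def chunk_bounds(num_samples: int, sample_rate: int = 16_000,
                 chunk_s: float = 30.0, overlap_s: float = 5.0) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering the audio."""
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap_s must be smaller than chunk_s")
    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += step
    return bounds


# A hypothetical 70-second recording at 16 kHz.
for start, end in chunk_bounds(70 * 16_000):
    print(start / 16_000, end / 16_000)
```

The Transformers ASR pipeline can handle this internally when it is created or called with `chunk_length_s=30`, which is usually simpler than chunking by hand.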
Acknowledgements
Public datasets used in training: SloPalSpeech, Mozilla Common Voice, FLEURS, and the TEDxSK and JumpSK Lecture Speech Corpus (KEMT NLP).
Part of the research results were obtained using the computational resources procured in the national project National competence centre for high performance computing (project code: 311070AKF2), funded by the European Regional Development Fund, EU Structural Funds Informatization of society, Operational Program Integrated Infrastructure.