# Whisper Large v3 Turbo (Albanian Fine-Tuned) - v2
This is a fine-tuned version of the Whisper Large v3 Turbo model, optimized for Albanian speech-to-text transcription. It achieves a Word Error Rate (WER) of 6.98% on a held-out evaluation set.
## Model Details

- Base Model: `openai/whisper-large-v3-turbo`
- Language: Albanian (`sq`)
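To transcribe audio with this checkpoint, a minimal sketch using the `transformers` ASR pipeline. The repo id below is a placeholder, not the actual Hub id of this fine-tune; substitute the real one:

```python
from transformers import pipeline

# NOTE: placeholder repo id -- replace with the actual Hub id of this fine-tune.
asr = pipeline(
    "automatic-speech-recognition",
    model="your-username/whisper-large-v3-turbo-sq",
    generate_kwargs={"language": "sq", "task": "transcribe"},
)

# Transcribe a local audio file; ffmpeg handles decoding/resampling.
result = asr("sample.wav")
print(result["text"])
```

Pinning `language` and `task` in `generate_kwargs` avoids Whisper's automatic language detection, which can misfire on short clips.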
## Training Dataset

- Source: Mozilla Common Voice version 19, available on the Hugging Face Hub as `Kushtrim/common_voice_19_sq`
- Description: Audio clips of spoken Albanian, ranging from 5 to 30 seconds each.
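The dataset named above can be pulled straight from the Hub with the `datasets` library. A sketch; the split name and the `sentence` transcript column are assumptions based on the usual Common Voice layout, so check the dataset card:

```python
from datasets import Audio, load_dataset

# Load the Albanian Common Voice 19 mirror named above.
cv_sq = load_dataset("Kushtrim/common_voice_19_sq", split="train")  # split name is an assumption

# Whisper feature extractors expect 16 kHz input, so resample on the fly.
cv_sq = cv_sq.cast_column("audio", Audio(sampling_rate=16_000))

sample = cv_sq[0]
print(sample["audio"]["array"].shape, sample["sentence"])  # "sentence" column is an assumption
```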
## Training Details

The model was fine-tuned on an NVIDIA A100 GPU (40GB) using the `transformers` library. Below are the key training arguments:

| Argument | Value | Description |
|---|---|---|
| `per_device_train_batch_size` | 8 | Training batch size per GPU |
| `per_device_eval_batch_size` | 2 | Evaluation batch size per GPU |
| `gradient_accumulation_steps` | 1 | Steps to accumulate gradients (effective batch size = 8) |
| `num_train_epochs` | 3 | Number of training epochs |
| `learning_rate` | 1e-5 | Initial learning rate |
| `warmup_steps` | 300 | Number of learning-rate warmup steps |
| `evaluation_strategy` | `"steps"` | Evaluate every `eval_steps` during training |
| `eval_steps` | 250 | Run evaluation every 250 steps |
| `fp16` | `True` | Use mixed-precision (16-bit float) training |
- Total Steps: ~3,540 scheduled (completed 3,500)
- Hardware: NVIDIA A100 (40GB)
- Libraries:
  - `transformers==4.38.2`
  - `torch==2.2.1`
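The step count follows from the batch settings: with an effective batch size of 8 (8 per device × 1 accumulation step, single GPU), ~3,540 steps over 3 epochs means about 1,180 optimizer steps per epoch, i.e. a training split of roughly 9,400 clips. A quick sanity check of that arithmetic; the training-split size is inferred, not stated in the card:

```python
import math

per_device_train_batch_size = 8
gradient_accumulation_steps = 1
num_gpus = 1
num_train_epochs = 3

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Inferred, not stated in the card: a training-split size consistent with ~3,540 total steps.
approx_train_examples = 9_440

steps_per_epoch = math.ceil(approx_train_examples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs
print(effective_batch_size, steps_per_epoch, total_steps)  # 8 1180 3540
```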
## Performance

| Step | Training Loss | Validation Loss | WER |
|---|---|---|---|
| 250 | 0.4744 | 0.3991 | 34.03% |
| 500 | 0.3421 | 0.3426 | 30.42% |
| 750 | 0.2871 | 0.2808 | 26.09% |
| 1000 | 0.2401 | 0.2258 | 21.31% |
| 1250 | 0.1809 | 0.1998 | 19.15% |
| 1500 | 0.1142 | 0.1827 | 17.33% |
| 1750 | 0.1051 | 0.1611 | 15.19% |
| 2000 | 0.0930 | 0.1464 | 13.82% |
| 2250 | 0.0827 | 0.1313 | 11.79% |
| 2500 | 0.0420 | 0.1139 | 10.50% |
| 2750 | 0.0330 | 0.1124 | 9.58% |
| 3000 | 0.0255 | 0.1006 | 8.38% |
| 3250 | 0.0256 | 0.0905 | 7.48% |
| 3500 | 0.0204 | 0.0889 | 6.98% |
- Final WER: 6.98% (at step 3500)
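WER is the word-level edit distance between hypothesis and reference, normalized by the reference word count. A minimal self-contained implementation for illustration; for real evaluation the `evaluate` or `jiwer` packages are the usual choice:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution / match
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word in a four-word reference -> 25% WER.
print(wer("si je ti sot", "si je ai sot"))  # 0.25
```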