---
license: apache-2.0
base_model: openai/whisper-medium
tags:
- hf-asr-leaderboard
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_16_0
language:
- hu
widget:
- example_title: Sample 1
  src: https://huggingface.co/datasets/Hungarians/samples/resolve/main/Sample1.flac
- example_title: Sample 2
  src: https://huggingface.co/datasets/Hungarians/samples/resolve/main/Sample2.flac
metrics:
- wer
pipeline_tag: automatic-speech-recognition
model-index:
- name: Whisper Medium Hungarian
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 16.0 - Hungarian
      type: mozilla-foundation/common_voice_16_0
      config: hu
      split: test
      args: hu
    metrics:
    - name: Wer
      type: wer
      value: 5.55
      verified: true

---


# Whisper Medium Hungarian

This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the Hungarian (`hu`) subset of the Common Voice 16.0 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0875
- Wer Ortho: 6.6934
- Wer: 5.5500
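
The two error rates above can be illustrated with a small, dependency-free sketch. It assumes the common convention that "Wer Ortho" is the word error rate on the raw text (casing and punctuation included) and "Wer" is the word error rate after text normalization. The normalizer actually used for this model is not stated in this card, so `simple_normalize` is only a stand-in, and the two sentence pairs are made-up examples rather than evaluation data.

```python
import re


def word_errors(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(references, hypotheses, normalize=None):
    """Corpus-level word error rate in percent."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        if normalize is not None:
            ref, hyp = normalize(ref), normalize(hyp)
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += word_errors(ref_words, hyp_words)
        total += len(ref_words)
    return 100.0 * errors / total


def simple_normalize(text):
    # Stand-in normalizer (an assumption): lowercase and drop punctuation.
    # \w is Unicode-aware in Python 3, so accented Hungarian letters are kept.
    return re.sub(r"[^\w\s]", "", text.lower())


# Made-up reference/hypothesis pairs, for illustration only.
refs = ["Jó reggelt kívánok!", "Hol van az állomás?"]
hyps = ["jó reggelt kívánok", "Hol van a állomás?"]

wer_ortho = wer(refs, hyps)                             # raw text
wer_norm = wer(refs, hyps, normalize=simple_normalize)  # normalized text
print(f"Wer Ortho: {wer_ortho:.2f}  Wer: {wer_norm:.2f}")
```

In this toy example the orthographic score is higher because casing and punctuation mismatches count as word errors, which is the same direction as the gap between the two figures reported above.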

## Model description

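Whisper Medium Hungarian is [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) fine-tuned for Hungarian (`hu`) automatic speech recognition. It is released under the Apache 2.0 license.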

## Intended uses & limitations

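The model is intended for automatic speech recognition (transcription) of Hungarian audio. Its limitations have not been documented yet.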
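
A minimal usage sketch with the `transformers` automatic-speech-recognition pipeline is given below. `MODEL_ID` is a hypothetical placeholder, because this card does not state the repository id, and the helper names `build_generate_kwargs` and `transcribe` are introduced only for illustration. The audio URL in the usage note is the first widget sample from this card's metadata. Decoding a file path or URL through the pipeline requires `ffmpeg` to be installed.

```python
# Hypothetical placeholder: replace with the actual Hub repository id of this model.
MODEL_ID = "your-namespace/whisper-medium-hu"


def build_generate_kwargs(language="hungarian", task="transcribe"):
    """Generation options that pin Whisper to Hungarian transcription."""
    return {"language": language, "task": task}


def transcribe(audio, model_id=MODEL_ID):
    """Transcribe a local audio file path or public URL and return the text."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split long recordings into 30-second chunks
    )
    result = asr(audio, generate_kwargs=build_generate_kwargs())
    return result["text"]
```

With a real repository id in place, a call such as `transcribe("https://huggingface.co/datasets/Hungarians/samples/resolve/main/Sample1.flac")` should return the transcription as a string.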

## Training and evaluation data

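The model was fine-tuned on the Hungarian (`hu`) configuration of [mozilla-foundation/common_voice_16_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_16_0). The reported metrics are computed on its `test` split.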

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 6.25e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 500
- training_steps: 15000
- mixed_precision_training: Native AMP
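
Two of the settings above are easier to read as arithmetic: the effective batch size and the `constant_with_warmup` schedule. The sketch below recomputes both in plain Python. It assumes a single training device, which is what the listed total of 16 implies, and the function name `constant_with_warmup_lr` is illustrative rather than a library call.

```python
def constant_with_warmup_lr(step, base_lr=6.25e-06, warmup_steps=500):
    """Learning rate at an optimizer step: linear ramp from 0, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


# Effective batch size per optimizer step.
train_batch_size = 8             # per-device batch size
gradient_accumulation_steps = 2
num_devices = 1                  # assumption, consistent with the listed total of 16
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

for step in (0, 250, 500, 15000):
    print(step, constant_with_warmup_lr(step))
```

After step 500 the rate stays at 6.25e-06 for the remaining training steps, since this schedule has no decay phase.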

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Wer Ortho | Wer     |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:---------:|
| 0.1877        | 0.33  | 1000  | 0.2104          | 17.8832 | 20.5799   |
| 0.136         | 0.67  | 2000  | 0.1561          | 13.4717 | 16.2140   |
| 0.1117        | 1.0   | 3000  | 0.1245          | 13.4198 | 10.9487   |
| 0.0673        | 1.34  | 4000  | 0.1148          | 12.0107 | 9.7836    |
| 0.0657        | 1.67  | 5000  | 0.1006          | 10.3547 | 8.4702    |
| 0.0264        | 2.01  | 6000  | 0.0905          | 9.0931  | 7.2250    |
| 0.0284        | 2.34  | 7000  | 0.0916          | 8.7137  | 7.2221    |
| 0.0311        | 2.68  | 8000  | 0.0879          | 8.0242  | 6.6914    |
| 0.0177        | 3.01  | 9000  | 0.0841          | 7.6960  | 6.3860    |
| 0.0177        | 3.35  | 10000 | 0.0844          | 7.2173  | 6.0125    |
| 0.0126        | 3.68  | 11000 | 0.0848          | 7.2052  | 5.9739    |
| 0.0078        | 4.02  | 12000 | 0.0865          | 7.1179  | 6.0629    |
| 0.0113        | 4.35  | 13000 | 0.0863          | 6.9312  | 5.7990    |
| 0.0115        | 4.69  | 14000 | 0.0853          | 7.0185  | 5.8968    |
| 0.0071        | 5.02  | 15000 | 0.0875          | 6.6934  | 5.5500    |
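
The Epoch and Step columns also give a rough idea of the training set size. The estimate below is only approximate: the epoch values in the table are rounded to two decimals, and it assumes each optimizer step consumes exactly `total_train_batch_size` examples.

```python
final_step = 15000
final_epoch = 5.02           # rounded value from the last table row
total_train_batch_size = 16  # from the hyperparameter list

steps_per_epoch = final_step / final_epoch
approx_train_examples = steps_per_epoch * total_train_batch_size

print(f"about {steps_per_epoch:.0f} optimizer steps per epoch")
print(f"about {approx_train_examples:.0f} training examples per epoch")
```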


### Framework versions

- Transformers 4.36.2
- PyTorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0