---
license: apache-2.0
base_model: openai/whisper-small
tags:
- audio
- automatic-speech-recognition
metrics:
- wer
widget:
- example_title: Sample 1
src: sample_ar_1.mp3
- example_title: Sample 2
src: sample_ar_2.mp3
model-index:
- name: whisper-small-ar-v2
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: common_voice_16_1
type: common_voice_16_1
config: ar
split: test
args: ar
metrics:
- name: Wer
type: wer
value: 47.726437288634024
language:
- ar
library_name: transformers
pipeline_tag: automatic-speech-recognition
datasets:
- mozilla-foundation/common_voice_16_1
---
# whisper-small-ar-v2
This model is for Arabic automatic speech recognition (ASR). It is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Arabic portion of the [mozilla-foundation/common_voice_16_1](https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4007
- Wer: 47.7264
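The WER values above are reported as percentages: word error rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words, and 47.7264 therefore means roughly 47.7% of reference words required a substitution, deletion, or insertion. As a minimal sketch (not the exact evaluation code used for this run), the metric can be computed like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling single-row dynamic-programming edit distance over words.
    row = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag = row[0]
        row[0] = i
        for j in range(1, len(hyp) + 1):
            cur = row[j]
            row[j] = min(
                row[j] + 1,                               # deletion
                row[j - 1] + 1,                           # insertion
                prev_diag + (ref[i - 1] != hyp[j - 1]),   # substitution / match
            )
            prev_diag = cur
    return row[-1] / len(ref)


# One substituted word out of three -> WER of 1/3 (33.33 as a percentage).
print(100 * wer("a b c", "a x c"))
```

Multiplying by 100 gives the percentage form used in this card.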
## Model description
This is an `openai/whisper-small` checkpoint fine-tuned on Arabic speech data, following the [official fine-tuning tutorial](https://huggingface.co/blog/fine-tune-whisper).
## Intended uses & limitations
The model is intended for transcribing Arabic speech. The reported WER of 47.73% on Common Voice suggests limited out-of-the-box accuracy, so evaluate the model on your own data, and consider further fine-tuning, before deploying it.
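For reference, transcription can be run with the `transformers` ASR pipeline. This is a hedged sketch: the repo id and audio filename below are placeholders taken from this card's name and widget samples, and the model must be downloaded (or available locally) for the snippet to run; adjust the path to the actual repository.

```python
from transformers import pipeline

# Placeholder repo id based on this card's model name; replace with the real path,
# e.g. "<user>/whisper-small-ar-v2".
asr = pipeline(
    "automatic-speech-recognition",
    model="whisper-small-ar-v2",
)

# Force Arabic transcription (Whisper otherwise auto-detects the language).
result = asr("sample_ar_1.mp3", generate_kwargs={"language": "arabic", "task": "transcribe"})
print(result["text"])
```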
## Training and evaluation data
- Training data: Common Voice (v16.1) Arabic `train` + `validation` splits
- Evaluation data: Common Voice (v16.1) Arabic `test` split
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 8000
- mixed_precision_training: Native AMP
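The `linear` scheduler with 500 warmup steps means the learning rate ramps linearly from 0 to the peak of 1e-05 over the first 500 steps, then decays linearly back to 0 by step 8000. A minimal sketch of that schedule (an illustration of the hyperparameters above, not the Trainer's internal code):

```python
def linear_schedule_lr(step: int,
                       peak_lr: float = 1e-5,
                       warmup_steps: int = 500,
                       total_steps: int = 8000) -> float:
    """Learning rate at a given step: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


# Halfway through warmup the rate is half the peak; at the final step it reaches 0.
print(linear_schedule_lr(250), linear_schedule_lr(500), linear_schedule_lr(8000))
```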
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 0.2742 | 0.82 | 1000 | 0.3790 | 275.2463 |
| 0.1625 | 1.65 | 2000 | 0.3353 | 228.5252 |
| 0.1002 | 2.47 | 3000 | 0.3311 | 238.8858 |
| 0.0751 | 3.3 | 4000 | 0.3354 | 158.1532 |
| 0.0601 | 4.12 | 5000 | 0.3576 | 48.9285 |
| 0.0612 | 4.95 | 6000 | 0.3575 | 47.8937 |
| 0.0383 | 5.77 | 7000 | 0.3819 | 46.9085 |
| 0.0234 | 6.6 | 8000 | 0.4007 | 47.7264 |
### Framework versions
- Transformers 4.38.1
- Pytorch 2.1.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2