---
license: apache-2.0
base_model: openai/whisper-small
tags:
- audio
- automatic-speech-recognition
metrics:
- wer
widget:
- example_title: Sample 1
src: sample_ar_1.mp3
- example_title: Sample 2
src: sample_ar_2.mp3
model-index:
- name: whisper-small-ar-v2
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: common_voice_16_1
type: common_voice_16_1
config: ar
split: test
args: ar
metrics:
- name: Wer
type: wer
value: 47.726437288634024
language:
- ar
library_name: transformers
pipeline_tag: automatic-speech-recognition
datasets:
- mozilla-foundation/common_voice_16_1
---
# whisper-small-ar-v2
This model is for Arabic automatic speech recognition (ASR). It is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Arabic portion of the [mozilla-foundation/common_voice_16_1](https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4007
- Wer: 47.7264
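The WER values above are reported as percentages: word error rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words, and 47.7264 therefore means roughly 47.7% of reference words required a substitution, deletion, or insertion. As a minimal sketch (not the exact evaluation code used for this run), the metric can be computed like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling single-row dynamic-programming edit distance over words.
    row = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag = row[0]
        row[0] = i
        for j in range(1, len(hyp) + 1):
            cur = row[j]
            row[j] = min(
                row[j] + 1,                               # deletion
                row[j - 1] + 1,                           # insertion
                prev_diag + (ref[i - 1] != hyp[j - 1]),   # substitution / match
            )
            prev_diag = cur
    return row[-1] / len(ref)


# One substituted word out of three -> WER of 1/3 (33.33 as a percentage).
print(100 * wer("a b c", "a x c"))
```

Multiplying by 100 gives the percentage form used in this card.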
## Model description
This is an `openai/whisper-small` checkpoint fine-tuned on Arabic speech data, following the [official fine-tuning tutorial](https://huggingface.co/blog/fine-tune-whisper).
## Intended uses & limitations
The model is intended for transcribing Arabic speech. The reported WER of 47.73% on Common Voice suggests limited out-of-the-box accuracy, so evaluate the model on your own data, and consider further fine-tuning, before deploying it.
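For reference, transcription can be run with the `transformers` ASR pipeline. This is a hedged sketch: the repo id and audio filename below are placeholders taken from this card's name and widget samples, and the model must be downloaded (or available locally) for the snippet to run; adjust the path to the actual repository.

```python
from transformers import pipeline

# Placeholder repo id based on this card's model name; replace with the real path,
# e.g. "<user>/whisper-small-ar-v2".
asr = pipeline(
    "automatic-speech-recognition",
    model="whisper-small-ar-v2",
)

# Force Arabic transcription (Whisper otherwise auto-detects the language).
result = asr("sample_ar_1.mp3", generate_kwargs={"language": "arabic", "task": "transcribe"})
print(result["text"])
```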
## Training and evaluation data
- Training data: Common Voice (v16.1) Arabic `train` + `validation` splits
- Evaluation data: Common Voice (v16.1) Arabic `test` split
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 8000
- mixed_precision_training: Native AMP
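The `linear` scheduler with 500 warmup steps means the learning rate ramps linearly from 0 to the peak of 1e-05 over the first 500 steps, then decays linearly back to 0 by step 8000. A minimal sketch of that schedule (an illustration of the hyperparameters above, not the Trainer's internal code):

```python
def linear_schedule_lr(step: int,
                       peak_lr: float = 1e-5,
                       warmup_steps: int = 500,
                       total_steps: int = 8000) -> float:
    """Learning rate at a given step: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


# Halfway through warmup the rate is half the peak; at the final step it reaches 0.
print(linear_schedule_lr(250), linear_schedule_lr(500), linear_schedule_lr(8000))
```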
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 0.2742 | 0.82 | 1000 | 0.3790 | 275.2463 |
| 0.1625 | 1.65 | 2000 | 0.3353 | 228.5252 |
| 0.1002 | 2.47 | 3000 | 0.3311 | 238.8858 |
| 0.0751 | 3.3 | 4000 | 0.3354 | 158.1532 |
| 0.0601 | 4.12 | 5000 | 0.3576 | 48.9285 |
| 0.0612 | 4.95 | 6000 | 0.3575 | 47.8937 |
| 0.0383 | 5.77 | 7000 | 0.3819 | 46.9085 |
| 0.0234 | 6.6 | 8000 | 0.4007 | 47.7264 |
### Framework versions
- Transformers 4.38.1
- Pytorch 2.1.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2