
Kalemat-Tech Arabic Speech Recognition Model (STT) - Mohamed Salama

The KalemaTech model for recognizing Modern Standard Arabic speech and converting it to text.

KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small

This model is a fine-tuned version of openai/whisper-small on Common_Voice_Arabic_12.0_Augmented. It achieves the following results on the evaluation set (WER is the word error rate in percent; a sketch of how it can be computed follows the list):

  • Loss: 0.5362
  • WER: 58.5848
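
One common way to compute WER is the Hugging Face evaluate library; a minimal sketch below, where the prediction and reference strings are placeholders (this is a generic example, not necessarily the exact evaluation script used for this model):

```python
import evaluate

# Word error rate: word-level edit distance divided by reference length,
# scaled to a percentage to match the numbers above.
wer_metric = evaluate.load("wer")
predictions = ["placeholder model transcript"]
references = ["placeholder reference transcript"]
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```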

Example usage:

```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small")
model = AutoModelForSpeechSeq2Seq.from_pretrained("Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small")
```
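
A minimal transcription sketch building on the snippet above; it assumes librosa for audio loading and a placeholder file path `audio.wav`, neither of which comes from the original card:

```python
import torch
import librosa  # assumed audio-loading library, not specified by the card
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

model_id = "Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)

# Whisper expects 16 kHz mono audio; "audio.wav" is a placeholder path.
speech, _ = librosa.load("audio.wav", sr=16000)

inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```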

Intended uses & limitations

Automatic speech recognition (ASR) for Modern Standard Arabic.

Training and evaluation data

The base dataset is Common_Voice_Arabic_12.0, to which I applied the following augmentations (a sketch of a comparable pipeline follows the list):
- 25% of the data: TimeMasking
- 25% of the data: SpecAugmentation
- 25% of the data: WavAugmentation (AddGaussianNoise)
- The final dataset is the original Common Voice data plus the augmented files
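
A sketch of a comparable augmentation pipeline, assuming audiomentations for the Gaussian-noise step and torchaudio for the masking transforms; the noise amplitude range and mask sizes are illustrative, not the values used to build the dataset:

```python
import numpy as np
import torch
import torchaudio.transforms as T
from audiomentations import AddGaussianNoise

sample_rate = 16000
waveform = np.random.randn(sample_rate * 3).astype(np.float32)  # stand-in for a real clip

# WavAugmentation: add Gaussian noise to the raw waveform.
add_noise = AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=1.0)
noisy = add_noise(samples=waveform, sample_rate=sample_rate)

# TimeMasking / SpecAugmentation: mask spans of a mel spectrogram.
mel = T.MelSpectrogram(sample_rate=sample_rate, n_mels=80)(torch.from_numpy(noisy))
time_mask = T.TimeMasking(time_mask_param=30)       # TimeMasking step
freq_mask = T.FrequencyMasking(freq_mask_param=13)  # with time masking: SpecAugment-style
augmented = freq_mask(time_mask(mel))
```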

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a Seq2SeqTrainingArguments sketch mirroring them follows the list):
- learning_rate: 1e-05
- train_batch_size: 64
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 25
- mixed_precision_training: Native AMP
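
A minimal sketch of how these settings map onto transformers' Seq2SeqTrainingArguments; output_dir is a placeholder, the Adam betas and epsilon above are the optimizer defaults, and fp16=True enables native AMP:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ar",   # placeholder output path
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=25,
    fp16=True,                         # native AMP mixed precision
)
```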

Training results

| Training Loss | Epoch | Step  | Validation Loss | WER     |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|
| 0.2728        | 1.01  | 1000  | 0.3063          | 60.4733 |
| 0.1442        | 2.01  | 2000  | 0.2878          | 55.6935 |
| 0.0648        | 3.02  | 3000  | 0.3009          | 59.2568 |
| 0.0318        | 4.03  | 4000  | 0.3278          | 59.2993 |
| 0.0148        | 5.04  | 5000  | 0.3539          | 61.0364 |
| 0.0088        | 6.04  | 6000  | 0.3714          | 56.9154 |
| 0.0061        | 7.05  | 7000  | 0.3920          | 57.5515 |
| 0.0041        | 8.06  | 8000  | 0.4149          | 61.6328 |
| 0.0033        | 9.06  | 9000  | 0.4217          | 58.0310 |
| 0.0033        | 10.07 | 10000 | 0.4376          | 59.9594 |
| 0.0021        | 11.08 | 11000 | 0.4485          | 56.7812 |
| 0.0015        | 12.08 | 12000 | 0.4577          | 57.6936 |
| 0.0013        | 13.09 | 13000 | 0.4671          | 60.6606 |
| 0.0011        | 14.1  | 14000 | 0.4686          | 59.8159 |
| 0.0008        | 15.11 | 15000 | 0.4856          | 60.7111 |
| 0.0011        | 16.11 | 16000 | 0.4851          | 59.5198 |
| 0.0005        | 17.12 | 17000 | 0.4936          | 59.2608 |
| 0.0004        | 18.13 | 18000 | 0.4995          | 57.9619 |
| 0.0003        | 19.13 | 19000 | 0.5085          | 58.3630 |
| 0.0002        | 20.14 | 20000 | 0.5155          | 58.0987 |
| 0.0001        | 21.15 | 21000 | 0.5251          | 58.8504 |
| 0.0001        | 22.16 | 22000 | 0.5268          | 58.4228 |
| 0.0001        | 23.16 | 23000 | 0.5317          | 59.0881 |
| 0.0001        | 24.17 | 24000 | 0.5362          | 58.5848 |

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.1+cu117
  • Datasets 2.8.0
  • Tokenizers 0.13.2