---
language:
- ar
license: apache-2.0
tags:
- automatic-speech-recognition
- robust-speech-event
datasets:
- mozilla-foundation/common_voice_8_0
metrics:
- wer
- cer
model-index:
- name: Sinai Voice Arabic Speech Recognition Model
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      type: mozilla-foundation/common_voice_8_0
      name: Common Voice ar
      args: ar
    metrics:
    - type: wer
      value: 0.189
      name: Test WER
    - type: cer
      value: 0.051
      name: Test CER
WER: 0.18855042016806722
CER: 0.05138746531806014
---

# Sinai Voice Arabic Speech Recognition Model

# **Sinai Voice** model for recognizing Modern Standard Arabic speech and transcribing it into text

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the Arabic (`ar`) subset of the Common Voice 8.0 dataset (`mozilla-foundation/common_voice_8_0`).
It achieves the following results on the evaluation set:

- Loss: 0.22
- WER: 0.189
- CER: 0.051

### Evaluation Commands

1. To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`:

   ```bash
   python eval.py --model_id bakrianoo/sinai-voice-ar-stt --dataset mozilla-foundation/common_voice_8_0 --config ar --split test
   ```

### Inference Without LM

```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import torchaudio
import torch


def speech_file_to_array_fn(voice_path, resampling_to=16000):
    # Load the audio; decoding MP3 needs a torchaudio backend that supports it (e.g. FFmpeg)
    speech_array, sampling_rate = torchaudio.load(voice_path)
    # Resample to the 16 kHz rate the model expects
    resampler = torchaudio.transforms.Resample(sampling_rate, resampling_to)
    # Keep the first channel and return the rate the audio now has (not the original rate)
    return resampler(speech_array)[0].numpy(), resampling_to


# load the model
cp = "bakrianoo/sinai-voice-ar-stt"
processor = Wav2Vec2Processor.from_pretrained(cp)
model = Wav2Vec2ForCTC.from_pretrained(cp)

# recognize the text in a sample sound file
sound_path = './my_voice.mp3'
sample, sr = speech_file_to_array_fn(sound_path)
inputs = processor([sample], sampling_rate=sr, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token for each frame
predicted_ids = torch.argmax(logits, dim=-1)

print("Prediction:", processor.batch_decode(predicted_ids))
```

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 32
- eval_batch_size: 10
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 8.32
- mixed_precision_training: Native AMP
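Two of the hyperparameters above can be checked with a little arithmetic. Assuming the usual convention that the total batch size equals per-device batch × gradient accumulation steps × number of devices, 128 = 32 × 2 × 2 implies two devices; the card does not state the device count, so this is an inference, not a reported fact. The sketch below also models a linear schedule with 1000 warmup steps that peaks at 0.0002 and decays to zero. It mirrors the shape of `get_linear_schedule_with_warmup` in `transformers`, but it is not the training script, and `total_steps` is a hypothetical value because the card does not report the step count.

```python
# Values copied from the hyperparameter list above
PEAK_LR = 2e-4          # learning_rate
WARMUP_STEPS = 1000     # lr_scheduler_warmup_steps
PER_DEVICE_BATCH = 32   # train_batch_size
GRAD_ACCUM = 2          # gradient_accumulation_steps
TOTAL_BATCH = 128       # total_train_batch_size

# Assumption: total = per-device batch * accumulation steps * number of devices
num_devices = TOTAL_BATCH // (PER_DEVICE_BATCH * GRAD_ACCUM)


def linear_warmup_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("implied devices:", num_devices)
    total_steps = 10_000  # hypothetical: the card does not report the step count
    for step in (0, 500, 1000, 5500, 10_000):
        print(step, linear_warmup_lr(step, total_steps))
```

The learning rate rises linearly during the first 1000 steps, peaks at 0.0002, and then falls linearly to zero at the final step.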
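The WER and CER figures reported for this model are edit-distance ratios: (substitutions + deletions + insertions) divided by the number of reference words for WER, or reference characters for CER. Below is a minimal pure-Python sketch that aggregates errors over the whole corpus. The helper names `edit_distance`, `wer`, and `cer` are hypothetical, and the sketch is not claimed to match `eval.py` exactly, since that script may normalize text before scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level word error rate: total word edits / total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    return errors / sum(len(r.split()) for r in references)


def cer(references, hypotheses):
    """Corpus-level character error rate: total char edits / total reference chars."""
    errors = sum(edit_distance(list(r), list(h)) for r, h in zip(references, hypotheses))
    return errors / sum(len(r) for r in references)


refs = ["the cat sat"]
hyps = ["the cat sit"]
print(round(wer(refs, hyps), 3))  # one wrong word out of three
print(round(cer(refs, hyps), 3))  # one wrong character out of eleven
```

A single wrong character costs a whole word in WER but only one character in CER, which is why this model's CER (0.051) is much lower than its WER (0.189).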