---
language: ary
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Moroccan Arabic dialect by Boumehdi
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    metrics:
    - name: Test WER
      type: wer
      value: 0.149084
---
# Wav2Vec2-Large-XLSR-53-Moroccan-Darija
**wav2vec2-large-xlsr-53 new model**
- Fine-tuned on 38 hours of labeled Darija audio extracted from the MDVC corpus, which contains more than 1,000 hours of Moroccan Darija (language code `ary`).
- Fine-tuning runs continuously (24/7) to improve accuracy.
- We add data to the model every day. We prefer not to add the whole MDVC corpus at once because we are still standardizing the way this language is written.
<table><thead><tr><th><strong>Training Loss</strong></th> <th><strong>Validation Loss</strong></th> <th><strong>WER</strong></th></tr></thead> <tbody><tr>
<td>0.021800</td>
<td>0.249328</td>
<td>0.149084</td>
</tr> </tbody></table>
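The WER value reported above is a word error rate: the number of word-level substitutions, deletions, and insertions needed to turn the model's transcription into the reference, divided by the number of reference words. The evaluation script used for this model is not shown here, so the following is only a minimal sketch of how the metric is commonly computed, assuming plain whitespace tokenization. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace; no text normalization
    (punctuation stripping, diacritics removal) is applied here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("b" -> "x") and one deletion ("d") over 4 reference words.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 0.149084 means roughly 15 word errors per 100 reference words.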
## Usage
The model can be used directly as follows:
```python
import librosa
import torch
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC, Wav2Vec2Processor
# build the tokenizer from a local copy of the model's vocab.json
tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
processor = Wav2Vec2Processor.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija', tokenizer=tokenizer)
model = Wav2Vec2ForCTC.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija')
# load the audio data (use your own wav file here!)
input_audio, sr = librosa.load('file.wav', sr=16000)
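# extract input features from the raw waveform
input_values = processor(input_audio, sampling_rate=sr, return_tensors="pt", padding=True).input_values
# retrieve logits (no gradients are needed for inference)
with torch.no_grad():
    logits = model(input_values).logits
# greedy decoding: pick the most likely token for each audio frame
tokens = torch.argmax(logits, dim=-1)
# collapse the frame-level tokens into text (no language model is used)
transcription = tokenizer.batch_decode(tokens)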
# print the output
print(transcription)
```
Output: قالت ليا هاد السيد هادا ما كاينش بحالو
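The decoding step in the snippet above is greedy CTC decoding: `torch.argmax` picks one token id per audio frame, and `batch_decode` then merges consecutive repeats and drops the blank token before mapping ids to characters. The following is a minimal sketch of that collapse rule on plain Python lists, not the library's implementation. The function name `ctc_greedy_collapse` and the blank id `0` are assumptions made for illustration; in Wav2Vec2 CTC tokenizers the padding token plays the blank role.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse per-frame token ids into a CTC label sequence.

    Assumption: blank_id=0 is a placeholder chosen for this example.
    """
    # Step 1: merge runs of identical consecutive ids into one id.
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: drop blanks. A blank between two equal ids is what lets
    # a genuinely repeated label survive step 1.
    return [token_id for token_id in merged if token_id != blank_id]


# Frames: blank, 3, 3, blank, 3, 5, 5, blank
print(ctc_greedy_collapse([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```

The two `3`s survive because a blank frame separates them; without it they would merge into a single label.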
email: souregh@gmail.com
BOUMEHDI Ahmed