---
language: ary
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Moroccan Arabic dialect by Boumehdi
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    metrics:
    - name: Test WER
      type: wer
      value: 0.09
---

# Wav2Vec2-Large-XLSR-53-Moroccan-Darija

**wav2vec2-large-xlsr-53** fine-tuned on 120 hours of labeled Darija audio.

## Usage

The model can be used directly as follows:

```python
import librosa
import torch
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC, Wav2Vec2Processor

# build the tokenizer from the local vocabulary, then load the processor and model
tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
processor = Wav2Vec2Processor.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija', tokenizer=tokenizer)
model = Wav2Vec2ForCTC.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija')

# load the audio data at 16 kHz (use your own wav file here!)
input_audio, sr = librosa.load('file.wav', sr=16000)

# extract input features
input_values = processor(input_audio, sampling_rate=sr, return_tensors="pt", padding=True).input_values

# retrieve logits
with torch.no_grad():
    logits = model(input_values).logits

# greedy (argmax) CTC decoding of the predicted token ids
tokens = torch.argmax(logits, dim=-1)
transcription = tokenizer.batch_decode(tokens)

# print the output
print(transcription)
```

Here's the output (roughly: "She told me: this man, there is no one like him"):

قالت ليا هاد السيد هادا ما كاينش بحالو

email: souregh@gmail.com
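
## Evaluation

If you want to check the reported WER on your own labeled clips, here is a minimal sketch using the `evaluate` library. The `samples` list below (audio paths and reference transcriptions) is a hypothetical placeholder; substitute your own test data.

```python
import librosa
import torch
from evaluate import load
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC, Wav2Vec2Processor

# same loading pattern as in the Usage section
tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
processor = Wav2Vec2Processor.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija', tokenizer=tokenizer)
model = Wav2Vec2ForCTC.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija')

wer = load("wer")

# hypothetical (audio path, reference transcription) pairs; replace with your own test set
samples = [
    ("clip1.wav", "قالت ليا هاد السيد هادا ما كاينش بحالو"),
]

predictions, references = [], []
for path, reference in samples:
    # load each clip at 16 kHz and run a forward pass
    audio, sr = librosa.load(path, sr=16000)
    input_values = processor(audio, sampling_rate=sr, return_tensors="pt").input_values
    with torch.no_grad():
        logits = model(input_values).logits
    # greedy CTC decoding, as in the Usage section
    ids = torch.argmax(logits, dim=-1)
    predictions.append(tokenizer.batch_decode(ids)[0])
    references.append(reference)

print("WER:", wer.compute(predictions=predictions, references=references))
```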