boumehdi committed on
Commit 3128428
1 Parent(s): 346ca1e

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -19,7 +19,7 @@ model-index:
 type: wer
 value: 49.68
 ---
-# Wav2Vec2-Large-XLSR-53-Moroccan-Darija-V1
+# Wav2Vec2-Large-XLSR-53-Moroccan-Darija
 
 [othrif/wav2vec2-large-xlsr-moroccan](https://huggingface.co/othrif/wav2vec2-large-xlsr-moroccan) fine-tuned on 6 hours of labeled Darija Audios
 
@@ -35,8 +35,8 @@ import torch
 from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC, Wav2Vec2Processor, TrainingArguments, Wav2Vec2FeatureExtractor, Trainer
 
 tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
-processor = Wav2Vec2Processor.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija-v1', tokenizer=tokenizer)
-model=Wav2Vec2ForCTC.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija-v1')
+processor = Wav2Vec2Processor.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija', tokenizer=tokenizer)
+model=Wav2Vec2ForCTC.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija')
 
 
 # load the audio data (use your own wav file here!)
@@ -71,6 +71,6 @@ This high validation loss value is mainly due to the fact that Darija can be wri
 
 ## Future Work
 
-Currently working on **wav2vec2-large-xlsr-moroccan-darija-v2** which will be available soon by adding more data (from 6hours to 12hours).
+Currently working on improving this model. The new model will be available soon.
+
 
-I am also working on audio data augmentation techniques (pitch shift, reberbation, additive augmentation.. ) to see if it is going to improve the **WER**.
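The README snippet touched by this commit builds a `Wav2Vec2CTCTokenizer` with `pad_token="[PAD]"` and `word_delimiter_token="|"` before loading the renamed `boumehdi/wav2vec2-large-xlsr-moroccan-darija` checkpoint. As far as I know, this tokenizer treats the pad token as the CTC blank when decoding: consecutive repeated predictions are merged, blanks are dropped, and `|` becomes a space. The sketch below re-implements only that greedy rule in plain Python on made-up frame tokens, so it runs without downloading the model; it is a simplification for illustration, not the `transformers` implementation, and the function name `ctc_greedy_decode` is hypothetical.

```python
from itertools import groupby
from typing import Sequence

# Special tokens copied from the README's Wav2Vec2CTCTokenizer call.
PAD_TOKEN = "[PAD]"   # assumed to double as the CTC blank symbol
WORD_DELIMITER = "|"  # marks word boundaries in the vocabulary


def ctc_greedy_decode(frame_tokens: Sequence[str]) -> str:
    """Hypothetical helper: turn per-frame argmax tokens into text (greedy CTC)."""
    # 1. Collapse runs of identical consecutive tokens into a single token.
    collapsed = [token for token, _ in groupby(frame_tokens)]
    # 2. Drop blank (pad) tokens; a blank between two equal letters keeps both.
    kept = [token for token in collapsed if token != PAD_TOKEN]
    # 3. Map the word delimiter to a space, join, and trim the edges.
    text = "".join(" " if token == WORD_DELIMITER else token for token in kept)
    return text.strip()


if __name__ == "__main__":
    # Made-up frame-level predictions, not real model output.
    frames = ["l", "l", "a", "[PAD]", "|", "|", "b", "a", "a", "s", "[PAD]"]
    print(ctc_greedy_decode(frames))  # prints "la bas"
```

The blank is what lets CTC spell genuine double letters: `["a", "a"]` collapses to `"a"`, while `["a", "[PAD]", "a"]` decodes to `"aa"`. With the real model, the equivalent step is usually taking the argmax over the logits and passing the ids to the processor's `batch_decode`.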