zinc75 committed on
Commit be28782
1 Parent(s): f98d2fd

Update README.md

Files changed (1)
  1. README.md +60 -1
README.md CHANGED
@@ -91,4 +91,63 @@ df[['transcription']]
  | ./test_relecture_texte.wav | ʃapitʁ di də abɛse pəti kɔ̃t də ʒyl ləmɛtʁ ɑ̃ʁʒistʁe puʁ libʁivɔksɔʁɡ ibis dɑ̃ la bas kuʁ dœ̃ ʃato sə tʁuva paʁmi tut sɔʁt də volaj œ̃n ibis ʁɔz |
  | ./10179_11051_000021.flac | kɛl dɔmaʒ kə sə nə swa pa dy sykʁ supiʁa se foʁaz ɑ̃ pasɑ̃ sa lɑ̃ɡ syʁ la vitʁ fɛ̃ dy ʃapitʁ kɛ̃z ɑ̃ʁʒistʁe paʁ sonjɛ̃ sɛt ɑ̃ʁʒistʁəmɑ̃ fɛ paʁti dy domɛn pyblik |

- ## Usage
+ ## Inference script (if you do not want to use Huggingsound)
+
+ ```python
+ import soundfile as sf  # or librosa, if you prefer
+ import torch
+ from transformers import AutoModelForCTC, Wav2Vec2Processor
+
+ MODEL_ID = "Cnam-LMSSC/wav2vec2-french-phonemizer"
+
+ model = AutoModelForCTC.from_pretrained(MODEL_ID)
+ processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
+
+ audio, sample_rate = sf.read('example.wav')
+ # Make sure your audio file is sampled at 16 kHz, or resample it first!
+
+ inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
+
+ with torch.no_grad():
+     logits = model(**inputs).logits
+
+ predicted_ids = torch.argmax(logits, dim=-1)
+ transcription = processor.batch_decode(predicted_ids)
+
+ print("Phonetic transcription:", transcription)
+ ```
+
+ **Output**:
+
+ 'ʒə syi tʁɛ kɔ̃tɑ̃ də vu pʁezɑ̃te notʁ solysjɔ̃ puʁ fonomize dez odjo fasilmɑ̃ sa fɔ̃ksjɔn kɑ̃ mɛm tʁɛ bjɛ̃'
+
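+ If your audio is not sampled at 16 kHz, resample it before calling the processor. Below is a minimal sketch using librosa (the file name `example_44k.wav` is a placeholder):
+
+ ```python
+ import librosa
+
+ # librosa.load resamples to the requested rate on load and returns a mono float32 array
+ audio, sample_rate = librosa.load('example_44k.wav', sr=16_000)
+ ```
+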
+ ## Test Results
+
+ In the table below, I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I also ran the evaluation script described above on other models (on 2021-06-17). Note that the table may show results that differ from those already reported, possibly because of specificities in the other evaluation scripts used. A sketch showing how such metrics can be computed is given after the table.
+
+ | Model | WER | CER |
+ | ------------- | ------------- | ------------- |
+ | jonatasgrosman/wav2vec2-large-xlsr-53-english | **18.98%** | **8.29%** |
+ | jonatasgrosman/wav2vec2-large-english | 21.53% | 9.66% |
+ | facebook/wav2vec2-large-960h-lv60-self | 22.03% | 10.39% |
+ | facebook/wav2vec2-large-960h-lv60 | 23.97% | 11.14% |
+ | boris/xlsr-en-punctuation | 29.10% | 10.75% |
+ | facebook/wav2vec2-large-960h | 32.79% | 16.03% |
+ | facebook/wav2vec2-base-960h | 39.86% | 19.89% |
+ | facebook/wav2vec2-base-100h | 51.06% | 25.06% |
+ | elgeish/wav2vec2-large-lv60-timit-asr | 59.96% | 34.28% |
+ | facebook/wav2vec2-base-10k-voxpopuli-ft-en | 66.41% | 36.76% |
+ | elgeish/wav2vec2-base-timit-asr | 68.78% | 36.81% |
+
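+ Such WER and CER values can be computed with the Hugging Face `evaluate` library. A minimal sketch (the reference and prediction strings below are placeholders, not actual evaluation data):
+
+ ```python
+ import evaluate
+
+ wer_metric = evaluate.load("wer")
+ cer_metric = evaluate.load("cer")
+
+ # Placeholder pairs; replace with your ground-truth and model transcriptions
+ references = ["ʒə sɥi tʁɛ kɔ̃tɑ̃"]
+ predictions = ["ʒə syi tʁɛ kɔ̃tɑ̃"]
+
+ print("WER:", wer_metric.compute(references=references, predictions=predictions))
+ print("CER:", cer_metric.compute(references=references, predictions=predictions))
+ ```
+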
+ ## Citation
+ If you want to cite this model, you can use:
+
+ ```bibtex
+ @misc{lmssc-wav2vec2-base-phonemizer-french,
+   title={Fine-tuned wav2vec2 base model for speech to phoneme in {F}rench},
+   author={Olivier, Malo and Hauret, Julien and Bavu, {\'E}ric},
+   howpublished={\url{https://huggingface.co/Cnam-LMSSC/wav2vec2-french-phonemizer}},
+   year={2023}
+ }
+ ```