Cnam-LMSSC/wav2vec2-french-phonemizer

zinc75 committed on Nov 8, 2023

Commit 4129c9e • 1 Parent(s): 962644d

Update README.md

Files changed (1)

README.md +5 -0

@@ -40,6 +40,11 @@ Fine-tuned [facebook/wav2vec2-base-fr-voxpopuli-v2](https://huggingface.co/faceb
 When using this model, make sure that your speech input is **sampled at 16kHz**.
+## Output
+As this model is specifically trained for a speech-to-phoneme task, the output is a sequence of [IPA-encoded](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) words, without punctuation.
+If you don't read the phonetic alphabet fluently, you can use this excellent [IPA reader website](http://ipa-reader.xyz) to convert the transcript back to synthetic speech and check the quality of the phonetic transcription.
 ## Training procedure
 The model has been fine-tuned on Common Voice v13 (FR) for 14 epochs on 4x2080 Ti GPUs using a DDP strategy and a gradient-accumulation procedure (256 audios per update, corresponding roughly to 25 minutes of speech per update -> 2k updates per epoch)
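
The figures quoted in the training paragraph can be cross-checked with a little arithmetic. The sketch below derives the average clip length, the number of audios and hours of speech seen per epoch, and the number of gradient-accumulation steps. It relies only on the rounded numbers stated in the README, so the results are rough estimates. The per-GPU micro-batch size is not given in the README; `ASSUMED_MICRO_BATCH_PER_GPU` is a hypothetical value chosen only to illustrate how the 256-audio update could be split across 4 GPUs.

```python
# Figures quoted in the README's "Training procedure" paragraph (all rounded).
AUDIOS_PER_UPDATE = 256
MINUTES_PER_UPDATE = 25
UPDATES_PER_EPOCH = 2_000
NUM_GPUS = 4
EPOCHS = 14

# Hypothetical: the README does not state the per-GPU micro-batch size.
ASSUMED_MICRO_BATCH_PER_GPU = 8


def training_budget(micro_batch_per_gpu: int = ASSUMED_MICRO_BATCH_PER_GPU) -> dict:
    """Derive rough per-epoch quantities from the README's rounded figures."""
    # Audios processed by all GPUs in one forward/backward pass.
    audios_per_step = micro_batch_per_gpu * NUM_GPUS
    if AUDIOS_PER_UPDATE % audios_per_step != 0:
        raise ValueError("micro-batch size must divide the 256-audio update evenly")

    return {
        # 25 min spread over 256 audios gives the average clip duration.
        "avg_clip_seconds": MINUTES_PER_UPDATE * 60 / AUDIOS_PER_UPDATE,
        "audios_per_epoch": AUDIOS_PER_UPDATE * UPDATES_PER_EPOCH,
        "hours_per_epoch": MINUTES_PER_UPDATE * UPDATES_PER_EPOCH / 60,
        # Backward passes accumulated before each optimizer update.
        "accumulation_steps": AUDIOS_PER_UPDATE // audios_per_step,
        "total_updates": UPDATES_PER_EPOCH * EPOCHS,
    }


if __name__ == "__main__":
    for name, value in training_budget().items():
        print(f"{name}: {value}")
```

Under these assumptions, one update covers clips averaging just under 6 seconds, and a different micro-batch size only changes `accumulation_steps`, not the per-epoch totals.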