metadata
license: mit
language: fr
datasets:
- mozilla-foundation/common_voice_13_0
metrics:
- per
tags:
- audio
- automatic-speech-recognition
- speech
- phonemize
model-index:
- name: Wav2Vec2-base French finetuned for phonemes by LMSSC
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice v13
      type: mozilla-foundation/common_voice_13_0
      args: fr
    metrics:
    - name: Test PER on Common Voice FR 13.0 | Trained
      type: per
      value: 5.52
    - name: Test PER on Multilingual Librispeech FR | Trained
      type: per
      value: 4.36
    - name: Val PER on Common Voice FR 13.0 | Trained
      type: per
      value: 4.31
Fine-tuned French Voxpopuli v2 wav2vec2-base model for speech-to-phoneme task in French
This model is facebook/wav2vec2-base-fr-voxpopuli-v2 fine-tuned for French speech-to-phoneme recognition on the train and validation splits of Common Voice v13.
Audio sample rate
When using this model, make sure that your speech input is sampled at 16 kHz.
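As a minimal sketch of how this requirement could be checked before inference, the snippet below reads the sample rate from a WAV header using Python's standard `wave` module. The function name `check_sample_rate` and the constant `EXPECTED_SAMPLE_RATE` are hypothetical and not part of this model's code; the sketch only handles uncompressed WAV files.

```python
import wave

EXPECTED_SAMPLE_RATE = 16_000  # Hz, the rate required by this model card


def check_sample_rate(path: str, expected: int = EXPECTED_SAMPLE_RATE) -> int:
    """Return the sample rate of a WAV file, raising if it is not `expected`.

    Hypothetical helper for illustration; it does not resample anything.
    """
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
    if rate != expected:
        raise ValueError(
            f"{path} is sampled at {rate} Hz; resample it to {expected} Hz first"
        )
    return rate
```

If the check fails, the audio needs to be resampled before it is passed to the model, for example with `librosa.load(path, sr=16000)` or `torchaudio.functional.resample`.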
Training procedure details
The model was trained for 14 epochs on four 2080 Ti GPUs using a distributed data parallel (DDP) strategy with gradient accumulation. Each update covers 256 audio clips, roughly 25 minutes of speech, which gives about 2k updates per epoch.
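The figures above imply a few derived quantities, worked out in the short sketch below. The inputs are the approximate values quoted in this card. The per-GPU micro-batch size is not stated here, so `HYPOTHETICAL_MICRO_BATCH` is an assumption used only to illustrate how gradient accumulation reaches 256 clips per update.

```python
# Approximate figures quoted in the model card.
CLIPS_PER_UPDATE = 256
MINUTES_PER_UPDATE = 25
UPDATES_PER_EPOCH = 2_000
EPOCHS = 14
NUM_GPUS = 4

# Assumption for illustration only: the real per-GPU micro-batch is not given.
HYPOTHETICAL_MICRO_BATCH = 8

avg_clip_seconds = MINUTES_PER_UPDATE * 60 / CLIPS_PER_UPDATE
clips_per_epoch = CLIPS_PER_UPDATE * UPDATES_PER_EPOCH
total_updates = UPDATES_PER_EPOCH * EPOCHS

# Effective batch = GPUs x micro-batch x accumulation steps.
clips_per_gpu_per_update = CLIPS_PER_UPDATE // NUM_GPUS
accumulation_steps = clips_per_gpu_per_update // HYPOTHETICAL_MICRO_BATCH

print(f"average clip length: {avg_clip_seconds:.2f} s")
print(f"clips per epoch: {clips_per_epoch}")
print(f"total updates: {total_updates}")
print(f"clips per GPU per update: {clips_per_gpu_per_update}")
print(f"accumulation steps (hypothetical micro-batch): {accumulation_steps}")
```

With these inputs, the whole run is on the order of 28k updates, which is the total that the percentages in the learning rate schedule refer to.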
Learning rate schedule: Double Tri-state schedule
- Warmup from 1e-5 to 1e-4 over the first 7% of total updates
- Constant at 1e-4 for 28% of total updates
- Linear decrease to 1e-6 over 36% of total updates
- Second warmup from 1e-6 to 3e-5 over 3% of total updates
- Constant at 3e-5 for 12% of total updates
- Linear decrease to 1e-7 over the remaining 14% of updates
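The six phases listed above can be sketched as a piecewise function of training progress. This is an illustration under stated assumptions, not the training code used for this model: the names `PHASES` and `lr_at` are hypothetical, and the warmups are assumed to be linear, since the card gives only the endpoints and durations of each phase.

```python
# Each phase: (fraction of total updates, starting LR, ending LR).
# Linear interpolation within the warmup phases is an assumption.
PHASES = [
    (0.07, 1e-5, 1e-4),  # first warmup
    (0.28, 1e-4, 1e-4),  # constant
    (0.36, 1e-4, 1e-6),  # linear decrease
    (0.03, 1e-6, 3e-5),  # second warmup
    (0.12, 3e-5, 3e-5),  # constant
    (0.14, 3e-5, 1e-7),  # final linear decrease
]


def lr_at(step: int, total_steps: int) -> float:
    """Return the learning rate at update `step` out of `total_steps`."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    phase_start = 0.0
    for fraction, lr_start, lr_end in PHASES:
        phase_end = phase_start + fraction
        if progress < phase_end:
            # Position inside the current phase, from 0.0 to 1.0.
            t = (progress - phase_start) / fraction
            return lr_start + t * (lr_end - lr_start)
        phase_start = phase_end
    # Reached only at the very end of training: return the final LR.
    return PHASES[-1][2]


if __name__ == "__main__":
    total = 28_000  # roughly 14 epochs x 2k updates, per the figures above
    for step in (0, 1_960, 9_800, 19_880, 20_720, 24_080, 28_000):
        print(step, f"{lr_at(step, total):.2e}")
```

To use such a function with `torch.optim.lr_scheduler.LambdaLR`, the returned value would need to be divided by the optimizer's base learning rate, because `LambdaLR` expects a multiplicative factor rather than an absolute rate.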
The training hyperparameters are those detailed in Appendix B and Table 6 of the wav2vec 2.0 paper.