Second fine-tuning attempt of wav2vec2-base. The results are similar to those reported in https://huggingface.co/facebook/wav2vec2-base-100h.

The model was trained on librispeech-clean-train.100 with the following hyper-parameters (a configuration sketch follows the list):

  • 2 Titan RTX GPUs
  • Total update steps: 11000
  • Batch size per GPU: 32, corresponding to a total batch size of ca. 750 seconds of audio
  • Adam optimizer with a linearly decaying learning rate and 3000 warmup steps
  • Dynamic padding of batches
  • fp16 mixed-precision training
  • attention_mask was not used during training

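The hyper-parameters above can be expressed with the `transformers` Trainer API. The sketch below is an illustration, not the exact training script: it assumes a standard `TrainingArguments`/`Trainer` setup, the output directory is a placeholder, and the CTC data collator that performs the dynamic padding (typically built around `Wav2Vec2Processor.pad`) is not shown.

```python
# Minimal sketch of TrainingArguments mirroring the hyper-parameters listed above.
# output_dir is a hypothetical placeholder; the actual script used for this run is not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-base-100h-2nd-try",  # hypothetical output path
    per_device_train_batch_size=32,             # 32 per GPU on 2 Titan RTX GPUs
    max_steps=11_000,                           # total update steps
    warmup_steps=3_000,                         # linear warm-up ...
    lr_scheduler_type="linear",                 # ... followed by linear decay; Adam is the default optimizer
    fp16=True,                                  # mixed-precision training (requires a CUDA device)
)
```
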
The full training logs are available at: https://wandb.ai/patrickvonplaten/huggingface/runs/1yrpescx?workspace=user-patrickvonplaten

Results (WER) on LibriSpeech:

"clean" (% rel difference to results in paper) "other" (% rel difference to results in paper)
6.2 (-1.6%) 15.2 (-11.2%)
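
WER numbers of this kind can be reproduced roughly as sketched below. This is not the evaluation script that produced the table above: it assumes the `datasets`, `jiwer`, and `torch` packages, uses greedy (argmax) decoding without a language model, and omits `attention_mask` at inference time to match the training setup.

```python
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
from jiwer import wer

processor = Wav2Vec2Processor.from_pretrained("patrickvonplaten/wav2vec2-base-100h-2nd-try")
model = Wav2Vec2ForCTC.from_pretrained("patrickvonplaten/wav2vec2-base-100h-2nd-try")

# LibriSpeech test-clean; replace "clean" with "other" for test-other.
librispeech = load_dataset("librispeech_asr", "clean", split="test")

def transcribe(batch):
    inputs = processor(batch["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        # attention_mask is omitted, consistent with how the model was trained.
        logits = model(inputs.input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["transcription"] = processor.batch_decode(pred_ids)[0]
    return batch

result = librispeech.map(transcribe)
print("WER:", wer(result["text"], result["transcription"]))
```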