metadata
language:
  - de
license: apache-2.0
tags:
  - automatic-speech-recognition
  - mozilla-foundation/common_voice_9_0
  - generated_from_trainer
datasets:
  - mozilla-foundation/common_voice_9_0
model-index:
  - name: wav2vec2-large-xlsr-53-german-cv9
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 9
          type: mozilla-foundation/common_voice_9_0
          args: de
        metrics:
          - name: Test WER
            type: wer
            value: 9.48066328184077
          - name: Test CER
            type: cer
            value: 1.9167347943074393
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 9
          type: mozilla-foundation/common_voice_9_0
          args: de
        metrics:
          - name: Test WER (+LM)
            type: wer
            value: 7.49027762774117
          - name: Test CER (+LM)
            type: cer
            value: 1.9167347943074393
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 6.1
          type: common_voice
          args: de
        metrics:
          - name: Test WER
            type: wer
            value: 8.122005951166669
          - name: Test CER
            type: cer
            value: 1
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 6.1
          type: common_voice
          args: de
        metrics:
          - name: Test WER (+LM)
            type: wer
            value: 6.145318204520354
          - name: Test CER (+LM)
            type: cer
            value: 1.5247743373447677

wav2vec2-large-xlsr-53-german-cv9

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the German (de) subset of the mozilla-foundation/common_voice_9_0 dataset.

It achieves the following results on the test set (values in percent):

  • CER: 2.273015898213336
  • WER: 9.480663281840769
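
For readers unfamiliar with these metrics: WER and CER are conventionally the Levenshtein edit distance between the hypothesis and the reference, divided by the reference length, counted in words and in characters respectively. The following is a minimal sketch of that standard definition, not the evaluation script used for this model. The function names and the example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent for a single utterance."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent for a single utterance (spaces count as characters)."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong word out of five, one missing character out of 24.
reference = "das ist ein kleiner test"
hypothesis = "das ist ein keiner test"
print(f"WER: {wer(reference, hypothesis):.2f}")
print(f"CER: {cer(reference, hypothesis):.2f}")
```

Corpus-level scores are usually computed by summing the edits and the reference lengths over all utterances before dividing, so they are not the mean of per-utterance scores.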

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
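
Two of the values above follow from the others, and the sketch below works them out. It assumes training ran on a single device (so total_train_batch_size = train_batch_size × gradient_accumulation_steps) and that the linear scheduler ramps the learning rate from 0 to its peak during warmup and then decays it linearly to 0, which is how linear warmup schedules are commonly defined. The total step count of 177850 is the final step in the training results table; the function name `linear_lr` is hypothetical.

```python
import math

learning_rate = 1e-4
train_batch_size = 16
gradient_accumulation_steps = 8
warmup_ratio = 0.1
total_steps = 177850  # last step listed in the training results table

# Assumption: a single device, so no multiplication by a device count.
effective_batch_size = train_batch_size * gradient_accumulation_steps
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = total_steps - step
    return learning_rate * max(0.0, remaining / max(1, total_steps - warmup_steps))


print("effective batch size:", effective_batch_size)
print("warmup steps:", warmup_steps)
for step in (0, warmup_steps, total_steps):
    print(step, linear_lr(step))
```

Under these assumptions the warmup covers 17785 steps, which corresponds to the first five epochs of the run.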

Training results

Training Loss  Epoch  Step    Validation Loss  Eval WER
0.4129         1.0    3557    0.3015           0.2499
0.2121         2.0    7114    0.1596           0.1567
0.1455         3.0    10671   0.1377           0.1354
0.1436         4.0    14228   0.1301           0.1282
0.1144         5.0    17785   0.1225           0.1245
0.1219         6.0    21342   0.1254           0.1208
0.104          7.0    24899   0.1198           0.1232
0.1016         8.0    28456   0.1149           0.1174
0.1093         9.0    32013   0.1186           0.1186
0.0858         10.0   35570   0.1182           0.1164
0.102          11.0   39127   0.1191           0.1186
0.0834         12.0   42684   0.1161           0.1096
0.0916         13.0   46241   0.1147           0.1107
0.0811         14.0   49798   0.1174           0.1136
0.0814         15.0   53355   0.1132           0.1114
0.0865         16.0   56912   0.1134           0.1097
0.0701         17.0   60469   0.1096           0.1054
0.0891         18.0   64026   0.1110           0.1076
0.071          19.0   67583   0.1141           0.1074
0.0726         20.0   71140   0.1094           0.1093
0.0647         21.0   74697   0.1088           0.1095
0.0643         22.0   78254   0.1105           0.1044
0.0764         23.0   81811   0.1072           0.1042
0.0605         24.0   85368   0.1095           0.1026
0.0722         25.0   88925   0.1144           0.1066
0.0597         26.0   92482   0.1087           0.1022
0.062          27.0   96039   0.1073           0.1027
0.0536         28.0   99596   0.1068           0.1027
0.0616         29.0   103153  0.1097           0.1037
0.0642         30.0   106710  0.1117           0.1020
0.0555         31.0   110267  0.1109           0.0990
0.0632         32.0   113824  0.1104           0.0977
0.0482         33.0   117381  0.1108           0.0958
0.0601         34.0   120938  0.1095           0.0957
0.0508         35.0   124495  0.1079           0.0973
0.0526         36.0   128052  0.1068           0.0967
0.0487         37.0   131609  0.1081           0.0966
0.0495         38.0   135166  0.1099           0.0956
0.0528         39.0   138723  0.1091           0.0923
0.0439         40.0   142280  0.1111           0.0928
0.0467         41.0   145837  0.1131           0.0943
0.0407         42.0   149394  0.1115           0.0944
0.046          43.0   152951  0.1106           0.0935
0.0447         44.0   156508  0.1083           0.0919
0.0434         45.0   160065  0.1093           0.0909
0.0472         46.0   163622  0.1092           0.0921
0.0414         47.0   167179  0.1106           0.0922
0.0501         48.0   170736  0.1094           0.0918
0.0388         49.0   174293  0.1099           0.0918
0.0428         50.0   177850  0.1103           0.0915
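
The table can be read programmatically to compare checkpoints. The sketch below copies only a handful of rows from the table above (the first and last epochs plus the rows with the lowest validation loss and the lowest evaluation WER) and picks the best one under each criterion. The tuple layout and variable names are hypothetical, and the card does not state which checkpoint was actually published.

```python
# (epoch, step, validation_loss, eval_wer): a few rows copied from the table above
rows = [
    (1, 3557, 0.3015, 0.2499),
    (28, 99596, 0.1068, 0.1027),
    (36, 128052, 0.1068, 0.0967),
    (45, 160065, 0.1093, 0.0909),
    (50, 177850, 0.1103, 0.0915),
]

# min() returns the first row on ties, so epoch 28 wins the validation-loss tie with epoch 36.
best_by_wer = min(rows, key=lambda r: r[3])
best_by_loss = min(rows, key=lambda r: r[2])

# The step counter advances by the same amount every epoch.
steps_per_epoch = rows[0][1]
total_steps = steps_per_epoch * 50

print("lowest eval WER at epoch", best_by_wer[0])
print("lowest validation loss at epoch", best_by_loss[0])
print("steps per epoch:", steps_per_epoch, "total steps:", total_steps)
```

The two criteria disagree: validation loss bottoms out well before the evaluation WER does, so selecting a checkpoint by loss alone would give up part of the WER improvement seen in later epochs.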

Framework versions

  • Transformers 4.19.0.dev0
  • PyTorch 1.11.0+cu113
  • Datasets 2.0.0
  • Tokenizers 0.11.6