metadata
language:
  - de
license: mit
tags:
  - automatic-speech-recognition
  - mozilla-foundation/common_voice_9_0
  - generated_from_trainer
datasets:
  - mozilla-foundation/common_voice_9_0
model-index:
  - name: wav2vec2-base-german-cv9
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 6.1
          type: common_voice
          args: de
        metrics:
          - name: Test WER
            type: wer
            value: 10.565782902002717
          - name: Test CER
            type: cer
            value: 2.622682485295966
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 6.1
          type: common_voice
          args: de
        metrics:
          - name: Test WER (+LM)
            type: wer
            value: 7.996088831362508
          - name: Test CER (+LM)
            type: cer
            value: 2.151571771162333

wav2vec2-base-german-cv9

This model is a fine-tuned version of facebook/wav2vec2-base, trained on the German (de) subset of the mozilla-foundation/common_voice_9_0 dataset. It achieves the following results on the evaluation set after the final epoch:

  • Loss: 0.1742
  • WER: 0.1209 (word error rate as a fraction, i.e. 12.09%)
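
For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The sketch below computes it for a single reference/hypothesis pair. The function name `word_error_rate` is hypothetical, and this is not necessarily the exact script used to produce the numbers in this card, which are presumably aggregated over the whole evaluation set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
print(word_error_rate("das ist ein test", "das ist test"))
```

A value of 0.1209 on this scale corresponds to roughly 12 word errors per 100 reference words.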

Model description

More information needed

Intended uses & limitations

More information needed
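
As a starting point, transcription with the `transformers` automatic-speech-recognition pipeline might look like the sketch below. The repository id `oliverguhr/wav2vec2-base-german-cv9` is an assumption inferred from the model name, and the helper `transcribe` and the audio file name are hypothetical. The wav2vec2 base model expects 16 kHz audio; the pipeline is expected to decode and resample file inputs when ffmpeg is available.

```python
# Assumed Hub repository id, inferred from the model name in this card.
MODEL_ID = "oliverguhr/wav2vec2-base-german-cv9"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file without a language model (greedy CTC decoding)."""
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(audio_path)["text"]


# Example call (hypothetical file name):
# text = transcribe("sample_de.wav")
```

Building the pipeline once and reusing it is preferable when transcribing many files, since each `pipeline(...)` call reloads the model.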

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
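
The relationship between these values can be checked with a little arithmetic. The sketch below derives the total batch size from the per-device batch size and gradient accumulation, and approximates the linear schedule with warmup. The total step count of 177850 is taken from the last row of the training results; the rounding of the warmup step count and the function name `linear_lr` are assumptions, not the trainer's exact implementation.

```python
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_RATIO = 0.1
TOTAL_STEPS = 177850  # final step in the training results table

# 16 samples per step, accumulated over 8 steps, gives 128 samples per update.
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

# Assumed rounding; the trainer may round the warmup length differently.
warmup_steps = round(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = TOTAL_STEPS - step
    return LEARNING_RATE * max(0.0, remaining / max(1, TOTAL_STEPS - warmup_steps))


print(total_train_batch_size, warmup_steps)
for step in (0, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 0.0001 once warmup ends, about five epochs in, and decays to zero at the final step.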

Training results

| Training Loss | Epoch | Step   | Validation Loss | WER    |
|---------------|-------|--------|-----------------|--------|
| 0.6827        | 1.0   | 3557   | 0.6695          | 0.6247 |
| 0.3992        | 2.0   | 7114   | 0.3738          | 0.3936 |
| 0.2611        | 3.0   | 10671  | 0.3011          | 0.3177 |
| 0.2536        | 4.0   | 14228  | 0.2672          | 0.2749 |
| 0.1943        | 5.0   | 17785  | 0.2487          | 0.2480 |
| 0.2004        | 6.0   | 21342  | 0.2246          | 0.2268 |
| 0.1605        | 7.0   | 24899  | 0.2176          | 0.2120 |
| 0.1579        | 8.0   | 28456  | 0.2046          | 0.2024 |
| 0.1668        | 9.0   | 32013  | 0.2027          | 0.1944 |
| 0.1338        | 10.0  | 35570  | 0.1968          | 0.1854 |
| 0.1478        | 11.0  | 39127  | 0.1963          | 0.1823 |
| 0.1177        | 12.0  | 42684  | 0.1956          | 0.1800 |
| 0.1245        | 13.0  | 46241  | 0.1889          | 0.1732 |
| 0.1124        | 14.0  | 49798  | 0.1868          | 0.1714 |
| 0.1112        | 15.0  | 53355  | 0.1805          | 0.1650 |
| 0.1209        | 16.0  | 56912  | 0.1860          | 0.1614 |
| 0.1002        | 17.0  | 60469  | 0.1828          | 0.1604 |
| 0.118         | 18.0  | 64026  | 0.1832          | 0.1580 |
| 0.0974        | 19.0  | 67583  | 0.1771          | 0.1555 |
| 0.1007        | 20.0  | 71140  | 0.1812          | 0.1532 |
| 0.0866        | 21.0  | 74697  | 0.1752          | 0.1504 |
| 0.0901        | 22.0  | 78254  | 0.1690          | 0.1477 |
| 0.0964        | 23.0  | 81811  | 0.1773          | 0.1489 |
| 0.085         | 24.0  | 85368  | 0.1776          | 0.1456 |
| 0.0945        | 25.0  | 88925  | 0.1786          | 0.1428 |
| 0.0804        | 26.0  | 92482  | 0.1737          | 0.1429 |
| 0.0832        | 27.0  | 96039  | 0.1789          | 0.1394 |
| 0.0683        | 28.0  | 99596  | 0.1741          | 0.1390 |
| 0.0761        | 29.0  | 103153 | 0.1688          | 0.1379 |
| 0.0833        | 30.0  | 106710 | 0.1726          | 0.1370 |
| 0.0753        | 31.0  | 110267 | 0.1774          | 0.1353 |
| 0.08          | 32.0  | 113824 | 0.1734          | 0.1344 |
| 0.0644        | 33.0  | 117381 | 0.1737          | 0.1334 |
| 0.0745        | 34.0  | 120938 | 0.1763          | 0.1335 |
| 0.0629        | 35.0  | 124495 | 0.1761          | 0.1311 |
| 0.0654        | 36.0  | 128052 | 0.1718          | 0.1302 |
| 0.0656        | 37.0  | 131609 | 0.1697          | 0.1301 |
| 0.0643        | 38.0  | 135166 | 0.1716          | 0.1279 |
| 0.0683        | 39.0  | 138723 | 0.1777          | 0.1279 |
| 0.0587        | 40.0  | 142280 | 0.1735          | 0.1271 |
| 0.0693        | 41.0  | 145837 | 0.1780          | 0.1260 |
| 0.0532        | 42.0  | 149394 | 0.1724          | 0.1245 |
| 0.0594        | 43.0  | 152951 | 0.1736          | 0.1250 |
| 0.0544        | 44.0  | 156508 | 0.1744          | 0.1238 |
| 0.0559        | 45.0  | 160065 | 0.1770          | 0.1232 |
| 0.0557        | 46.0  | 163622 | 0.1766          | 0.1231 |
| 0.0521        | 47.0  | 167179 | 0.1751          | 0.1220 |
| 0.0591        | 48.0  | 170736 | 0.1724          | 0.1217 |
| 0.0507        | 49.0  | 174293 | 0.1753          | 0.1212 |
| 0.0577        | 50.0  | 177850 | 0.1742          | 0.1209 |

Framework versions

  • Transformers 4.20.1
  • PyTorch 1.11.0+cu113
  • Datasets 2.0.0
  • Tokenizers 0.11.6