---
license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - kensho/spgispeech
widget:
  - example_title: Finance Speech
    src: https://drive.google.com/uc?id=151bzDnN_f0Dfjjrg36nI97tXM39t5Ka8
model-index:
  - name: wav2vec2-base-finetuned-spgispeech-dev
    results: []
---

# wav2vec2-base-finetuned-spgispeech-dev

This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the dev subset of the [kensho/spgispeech](https://huggingface.co/datasets/kensho/spgispeech) dataset. It achieves the following results on the evaluation set:

- Loss: 0.2897
- Wer: 0.1508
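
The checkpoint can presumably be used with the standard `transformers` automatic-speech-recognition pipeline. The snippet below is a minimal sketch, assuming the hub repo id `nickmuchi/wav2vec2-base-finetuned-spgispeech-dev` and a local 16 kHz audio file (both assumptions, not taken from this card):

```python
from transformers import pipeline

# Assumed hub repo id, inferred from the card title; adjust if the path differs.
asr = pipeline(
    "automatic-speech-recognition",
    model="nickmuchi/wav2vec2-base-finetuned-spgispeech-dev",
)

# wav2vec2-base expects 16 kHz mono audio; "earnings_call.wav" is a placeholder path.
transcription = asr("earnings_call.wav")
print(transcription["text"])
```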

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 50
- mixed_precision_training: Native AMP
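
For orientation, these settings correspond roughly to the following `TrainingArguments` configuration. This is a sketch rather than the author's actual training script; `output_dir` is an assumed placeholder and `fp16=True` stands in for Native AMP:

```python
from transformers import TrainingArguments

# Rough mapping of the listed hyperparameters onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="wav2vec2-base-finetuned-spgispeech-dev",  # assumed placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=50,
    fp16=True,  # "Native AMP" mixed-precision training
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Trainer defaults,
    # so they do not need to be set explicitly.
)
```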

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Wer    |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 1.8285        | 2.22  | 1500  | 0.3361          | 0.2754 |
| 0.2582        | 4.44  | 3000  | 0.2643          | 0.2205 |
| 0.1697        | 6.66  | 4500  | 0.2467          | 0.2006 |
| 0.1314        | 8.88  | 6000  | 0.2711          | 0.1927 |
| 0.1084        | 11.09 | 7500  | 0.2521          | 0.1872 |
| 0.0922        | 13.31 | 9000  | 0.2588          | 0.1827 |
| 0.0818        | 15.53 | 10500 | 0.2572          | 0.1783 |
| 0.0712        | 17.75 | 12000 | 0.2720          | 0.1766 |
| 0.067         | 19.97 | 13500 | 0.2873          | 0.1751 |
| 0.0594        | 22.19 | 15000 | 0.2753          | 0.1704 |
| 0.0546        | 24.41 | 16500 | 0.2794          | 0.1694 |
| 0.0505        | 26.63 | 18000 | 0.2811          | 0.1665 |
| 0.0467        | 28.85 | 19500 | 0.2906          | 0.1657 |
| 0.0417        | 31.07 | 21000 | 0.3043          | 0.1661 |
| 0.0395        | 33.28 | 22500 | 0.3068          | 0.1627 |
| 0.0368        | 35.5  | 24000 | 0.3096          | 0.1617 |
| 0.0334        | 37.72 | 25500 | 0.3036          | 0.1581 |
| 0.0322        | 39.94 | 27000 | 0.2819          | 0.1564 |
| 0.0286        | 42.16 | 28500 | 0.2936          | 0.1544 |
| 0.0279        | 44.38 | 30000 | 0.2914          | 0.1534 |
| 0.0264        | 46.6  | 31500 | 0.2957          | 0.1519 |
| 0.0241        | 48.82 | 33000 | 0.2897          | 0.1508 |
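
Word error rate figures like the ones above can be computed with the `evaluate` library (not necessarily how the reported numbers were produced); the following is a minimal sketch with placeholder strings rather than actual SPGISpeech transcripts:

```python
import evaluate

# WER = (substitutions + insertions + deletions) / number of reference words.
wer_metric = evaluate.load("wer")

references = ["the company reported strong quarterly revenue growth"]
predictions = ["the company reported strong quarterly revenue growth"]

print(wer_metric.compute(predictions=predictions, references=references))  # 0.0
```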

### Framework versions

- Transformers 4.17.0
- Pytorch 1.12.1+cu113
- Datasets 2.4.0
- Tokenizers 0.12.1