metadata
license: apache-2.0
tags:
  - automatic-speech-recognition
  - w11wo/ljspeech_phonemes
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: wav2vec2-ljspeech-gruut
    results: []

wav2vec2-ljspeech-gruut

This model is a fine-tuned version of facebook/wav2vec2-base on the w11wo/ljspeech_phonemes dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0683
  • WER: 0.0099
  • CER: 0.0058
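
For readers unfamiliar with the two error metrics, the following is a minimal sketch of how word error rate (WER) and character error rate (CER) are conventionally computed from an edit distance. It is not the evaluation code used for this model; the helper names `edit_distance` and `error_rate` are hypothetical, and it assumes transcripts are whitespace-separated tokens and that spaces count as characters for CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Total edits divided by total reference length, over all utterances.

    unit="word" splits on whitespace (WER); unit="char" uses every
    character, including spaces (CER). Both choices are assumptions.
    """
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        total_edits += edit_distance(ref_units, hyp_units)
        total_len += len(ref_units)
    return total_edits / total_len


# Hypothetical transcripts: one token out of four is wrong.
refs = ["a b c d"]
hyps = ["a x c d"]
wer = error_rate(refs, hyps, unit="word")
cer = error_rate(refs, hyps, unit="char")
```

In this toy case one of four tokens differs, so the token-level rate is 0.25, while the character-level rate is lower because the single substitution is spread over more units.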

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 30.0
  • mixed_precision_training: Native AMP
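
The batch-size and scheduler settings above can be read as simple arithmetic. The sketch below is illustrative only: `linear_schedule_lr` is a hypothetical helper that mirrors the usual "linear warmup, then linear decay to zero" shape, it assumes a single training device, and it takes the total of 10440 optimizer steps (30 epochs of 348 steps) from the training results table.

```python
# Values taken from the hyperparameter list above.
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2
NUM_DEVICES = 1  # assumption: single device

# Number of examples contributing to each optimizer step.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def linear_schedule_lr(step, base_lr=1e-4, warmup_steps=1000, total_steps=10440):
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to base_lr over warmup_steps, then decays
    linearly back to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Sample the schedule at a few points of the run.
for step in (0, 500, 1000, 5720, 10440):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the peak learning rate of 0.0001 is reached at step 1000, which falls near the end of the third epoch.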

Training results

| Training Loss | Epoch | Step  | Validation Loss | WER    | CER    |
|---------------|-------|-------|-----------------|--------|--------|
| No log        | 1.0   | 348   | 2.2818          | 1.0    | 1.0    |
| 2.6692        | 2.0   | 696   | 0.2045          | 0.0527 | 0.0299 |
| 0.2225        | 3.0   | 1044  | 0.1162          | 0.0319 | 0.0189 |
| 0.2225        | 4.0   | 1392  | 0.0927          | 0.0235 | 0.0147 |
| 0.0868        | 5.0   | 1740  | 0.0797          | 0.0218 | 0.0143 |
| 0.0598        | 6.0   | 2088  | 0.0715          | 0.0197 | 0.0128 |
| 0.0598        | 7.0   | 2436  | 0.0652          | 0.0160 | 0.0103 |
| 0.0447        | 8.0   | 2784  | 0.0571          | 0.0152 | 0.0095 |
| 0.0368        | 9.0   | 3132  | 0.0608          | 0.0163 | 0.0112 |
| 0.0368        | 10.0  | 3480  | 0.0586          | 0.0137 | 0.0083 |
| 0.0303        | 11.0  | 3828  | 0.0641          | 0.0141 | 0.0085 |
| 0.0273        | 12.0  | 4176  | 0.0656          | 0.0131 | 0.0079 |
| 0.0232        | 13.0  | 4524  | 0.0690          | 0.0133 | 0.0082 |
| 0.0232        | 14.0  | 4872  | 0.0598          | 0.0128 | 0.0079 |
| 0.0189        | 15.0  | 5220  | 0.0671          | 0.0121 | 0.0074 |
| 0.017         | 16.0  | 5568  | 0.0654          | 0.0114 | 0.0069 |
| 0.017         | 17.0  | 5916  | 0.0751          | 0.0118 | 0.0073 |
| 0.0146        | 18.0  | 6264  | 0.0653          | 0.0112 | 0.0068 |
| 0.0127        | 19.0  | 6612  | 0.0682          | 0.0112 | 0.0069 |
| 0.0127        | 20.0  | 6960  | 0.0678          | 0.0114 | 0.0068 |
| 0.0114        | 21.0  | 7308  | 0.0656          | 0.0111 | 0.0066 |
| 0.0101        | 22.0  | 7656  | 0.0669          | 0.0109 | 0.0066 |
| 0.0092        | 23.0  | 8004  | 0.0677          | 0.0108 | 0.0065 |
| 0.0092        | 24.0  | 8352  | 0.0653          | 0.0104 | 0.0063 |
| 0.0088        | 25.0  | 8700  | 0.0673          | 0.0102 | 0.0063 |
| 0.0074        | 26.0  | 9048  | 0.0669          | 0.0105 | 0.0064 |
| 0.0074        | 27.0  | 9396  | 0.0707          | 0.0101 | 0.0061 |
| 0.0066        | 28.0  | 9744  | 0.0673          | 0.0100 | 0.0060 |
| 0.0058        | 29.0  | 10092 | 0.0689          | 0.0100 | 0.0059 |
| 0.0058        | 30.0  | 10440 | 0.0683          | 0.0099 | 0.0058 |
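
The reported evaluation results match the final epoch, not the epoch with the lowest validation loss: in the results above the loss bottoms out at epoch 8 (0.0571), while WER and CER keep improving through epoch 30. The short sketch below illustrates how the choice of selection metric changes which checkpoint looks "best". It uses only a hand-picked subset of the rows, and the tuple layout and variable names are assumptions made for this example.

```python
# (epoch, validation_loss, wer, cer): a subset of rows from the results above.
rows = [
    (2, 0.2045, 0.0527, 0.0299),
    (8, 0.0571, 0.0152, 0.0095),
    (14, 0.0598, 0.0128, 0.0079),
    (20, 0.0678, 0.0114, 0.0068),
    (30, 0.0683, 0.0099, 0.0058),
]

# Selecting by validation loss and by WER picks different epochs.
best_by_loss = min(rows, key=lambda row: row[1])
best_by_wer = min(rows, key=lambda row: row[2])

print("lowest validation loss at epoch", best_by_loss[0])
print("lowest WER at epoch", best_by_wer[0])
```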

Framework versions

  • Transformers 4.26.0.dev0
  • PyTorch 1.10.0
  • Datasets 2.7.1
  • Tokenizers 0.13.2