---
language:
  - eng
license: apache-2.0
tags:
  - finetuned_model
  - lj_speech11
  - generated_from_trainer
base_model: facebook/wav2vec2-base-960h
datasets:
  - FYP/LJ-SpeechLJ
model-index:
  - name: SpeechT5 STT Wav2Vec2
    results: []
---

SpeechT5 STT Wav2Vec2

This model is a fine-tuned version of facebook/wav2vec2-base-960h on the LJ Speech dataset. It achieves the following result on the evaluation set:

  • Loss: 491.2500 (validation loss at the last logged step, 650)

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 5
  • mixed_precision_training: Native AMP
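As a sanity check on the list above: assuming training ran on a single device (the card does not state the device count), total_train_batch_size is train_batch_size × gradient_accumulation_steps = 8 × 4 = 32. The linear scheduler with 100 warmup steps can be sketched in plain Python as below. This is a hypothetical re-implementation of the schedule shape used by transformers' `get_linear_schedule_with_warmup`, not code from this repository; the helper names are illustrative, and `total_steps` is left as a parameter because the card does not report the exact number of optimizer steps.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer update (hypothetical helper)."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_warmup_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at an optimizer step for linear warmup followed by linear decay (hypothetical helper)."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr over the warmup period.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 at total_steps, never going below 0.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(8, 4))  # 32, matching total_train_batch_size
    # total_steps=650 is an illustrative assumption (the last logged step in the results table).
    for s in (0, 50, 100, 375, 650):
        print(s, linear_warmup_lr(s, peak_lr=1e-4, warmup_steps=100, total_steps=650))
```

With these settings the learning rate peaks at 0.0001 at step 100 and then falls linearly, reaching 0 at whatever the true final step was.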

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 494.9709      | 0.3795 | 50   | 486.0701        |
| 534.2352      | 0.7590 | 100  | 488.7172        |
| 816.0739      | 1.1385 | 150  | 490.4418        |
| 566.1295      | 1.5180 | 200  | 504.5211        |
| 586.0909      | 1.8975 | 250  | 489.5141        |
| 601.5043      | 2.2770 | 300  | 486.6875        |
| 487.8737      | 2.6565 | 350  | 489.5807        |
| 1145.4591     | 3.0361 | 400  | 511.4276        |
| 686.6008      | 3.4156 | 450  | 496.0722        |
| 664.612       | 3.7951 | 500  | 486.9992        |
| 630.4309      | 4.1746 | 550  | 500.0555        |
| 513.7977      | 4.5541 | 600  | 488.6891        |
| 494.3428      | 4.9336 | 650  | 491.2500        |
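The validation loss in the table does not trend downward: it fluctuates and ends slightly higher than at the first logged step. A minimal sketch for checking this, which only re-reads the validation losses copied from the training results table and selects a "best" step by validation loss alone (an assumption; no other metric such as WER is reported):

```python
# Validation loss per logged step, copied from the training results table above.
VAL_LOSS = {
    50: 486.0701, 100: 488.7172, 150: 490.4418, 200: 504.5211, 250: 489.5141,
    300: 486.6875, 350: 489.5807, 400: 511.4276, 450: 496.0722, 500: 486.9992,
    550: 500.0555, 600: 488.6891, 650: 491.2500,
}

# Steps with the lowest and highest validation loss.
best_step = min(VAL_LOSS, key=VAL_LOSS.get)
worst_step = max(VAL_LOSS, key=VAL_LOSS.get)

# Range of the validation loss, and its net change from the first to the last logged step.
spread = VAL_LOSS[worst_step] - VAL_LOSS[best_step]
net_change = VAL_LOSS[650] - VAL_LOSS[50]

print(f"lowest validation loss: {VAL_LOSS[best_step]} at step {best_step}")
print(f"highest validation loss: {VAL_LOSS[worst_step]} at step {worst_step}")
print(f"spread: {spread:.4f}, net change first->last: {net_change:+.4f}")
```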

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.3.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1