---
language:
  - eng
license: apache-2.0
tags:
  - '[finetuned_model, lj_speech11]'
  - generated_from_trainer
base_model: facebook/wav2vec2-base-960h
datasets:
  - FYP/LJ-SpeechLJ
model-index:
  - name: SpeechT5 STT Wav2Vec2
    results: []
---

# SpeechT5 STT Wav2Vec2

This model is a fine-tuned version of [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h) on the LJ Speech dataset. It achieves the following results on the evaluation set:

- Loss: 310.7488
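
A minimal inference sketch, assuming the repository exposes the standard `Wav2Vec2ForCTC` and `Wav2Vec2Processor` classes inherited from the base model; the repository id and audio file name below are placeholders:

```python
import torch
import soundfile as sf
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Placeholder repository id; substitute the actual hub id of this model.
model_id = "Asim037/wav222vec222v2-stt"

processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# Read a mono WAV file; wav2vec2-base-960h expects 16 kHz audio.
speech, sample_rate = sf.read("sample.wav")

inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: take the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```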

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 5
- mixed_precision_training: Native AMP
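
For reference, a sketch of how the hyperparameters listed above might be expressed as `transformers.TrainingArguments`; the output directory is hypothetical, and the full `Trainer` wiring (model, data collator, metric computation) is omitted:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-lj-speech-stt",  # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,        # effective train batch size of 32
    seed=42,
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_steps=100,
    fp16=True,                            # native AMP mixed-precision training
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the Trainer's
    # default optimizer settings, so it is not set explicitly here.
)
```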

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 3527.1737     | 0.5362 | 50   | 433.5859        |
| 441.3472      | 1.0724 | 100  | 331.3980        |
| 404.9114      | 1.6086 | 150  | 379.2116        |
| 330.0151      | 2.1448 | 200  | 319.7489        |
| 346.7672      | 2.6810 | 250  | 307.3814        |
| 398.5639      | 3.2172 | 300  | 349.2720        |
| 331.2473      | 3.7534 | 350  | 308.0336        |
| 311.4908      | 4.2895 | 400  | 306.9877        |
| 312.0619      | 4.8257 | 450  | 310.7488        |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1