arslanarjumand/wav2vec-read-aloud
metadata
license: mit
base_model: facebook/w2v-bert-2.0
tags:
  - generated_from_trainer
model-index:
  - name: wav2vec-read_aloud
    results: []

wav2vec-read_aloud

This model is a fine-tuned version of facebook/w2v-bert-2.0. The training dataset is not recorded in this card. On the evaluation set, the final logged checkpoint (step 3500) achieves the following results:

  • Loss: 973.4864
  • Pcc Accuracy: 0.7547
  • Pcc Fluency: 0.7664
  • Pcc Total Score: 0.8143
  • Pcc Content: nan
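
The `Pcc` prefix is not defined in this card. Assuming it stands for the Pearson correlation coefficient between predicted and reference scores, the metric could be sketched in plain Python as below. The function name `pearson_corr` and the sample scores are hypothetical. The sketch also shows one way a `nan` can arise: when either series has zero variance, the correlation is undefined. Whether that explains the `nan` reported for Pcc Content here is not known.

```python
import math


def pearson_corr(preds, refs):
    """Pearson correlation of two equal-length sequences; nan if undefined."""
    if len(preds) != len(refs) or len(preds) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(preds)
    mean_p = sum(preds) / n
    mean_r = sum(refs) / n
    # Sums of centered products; the 1/n factors cancel in the final ratio.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(preds, refs))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_r = sum((r - mean_r) ** 2 for r in refs)
    denom = math.sqrt(var_p * var_r)
    if denom == 0.0:
        # A constant series has no variance, so the correlation is undefined.
        return float("nan")
    return cov / denom


# Hypothetical scores, for illustration only.
perfect = pearson_corr([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
constant = pearson_corr([5.0, 5.0, 5.0], [1.0, 2.0, 3.0])
```

With these inputs, `perfect` is a perfectly linear pair and `constant` has a zero-variance prediction series, so the second call returns `nan`.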

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5.5e-05
  • train_batch_size: 2
  • eval_batch_size: 6
  • seed: 42
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.4
  • num_epochs: 15
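
As a rough illustration of how these settings interact, the sketch below computes the effective batch size and a cosine learning-rate schedule with linear warmup. It is a plain-Python approximation, not the training code: the helper names are hypothetical, a single training device is assumed, and the total step count (`1000`) is a placeholder because the card does not state it.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 5.5e-05
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 12
WARMUP_RATIO = 0.4


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Samples contributing to one optimizer step (num_devices is assumed)."""
    return per_device * accum_steps * num_devices


def cosine_lr(step, total_steps, peak_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 1000  # hypothetical placeholder, not from the card

batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS)
lr_start = cosine_lr(0, TOTAL_STEPS)
lr_peak = cosine_lr(400, TOTAL_STEPS)   # end of warmup at 40% of the steps
lr_end = cosine_lr(TOTAL_STEPS, TOTAL_STEPS)
```

Under the single-device assumption, `batch` agrees with the listed total_train_batch_size of 24. With a warmup ratio of 0.4, the learning rate only reaches its peak 40% of the way through training.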

Training results

| Training Loss | Epoch | Step | Validation Loss | Pcc Accuracy | Pcc Fluency | Pcc Total Score | Pcc Content |
|---|---|---|---|---|---|---|---|
| 2390.3109 | 1.95 | 500 | 2342.6951 | nan | 0.4815 | nan | nan |
| 2164.6891 | 3.9 | 1000 | 2318.7217 | nan | 0.6461 | nan | nan |
| 1078.8019 | 5.85 | 1500 | 1029.2085 | 0.6188 | 0.7014 | 0.6845 | nan |
| 974.6556 | 7.8 | 2000 | 985.5543 | 0.7117 | 0.7355 | 0.7743 | nan |
| 1002.623 | 9.75 | 2500 | 989.1628 | 0.7401 | 0.7533 | 0.7995 | nan |
| 947.5643 | 11.7 | 3000 | 972.3806 | 0.7507 | 0.7628 | 0.8103 | nan |
| 995.6286 | 13.65 | 3500 | 973.4864 | 0.7547 | 0.7664 | 0.8143 | nan |
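
The headline results match the last logged row (step 3500), while the lowest validation loss in the table is at step 3000. The sketch below shows one way to compare logged checkpoints under different criteria while skipping `nan` entries. The helper `best_by` and the tuple layout are hypothetical and not part of the training code; the numbers are copied from the table above.

```python
import math

NAN = float("nan")

# (step, validation_loss, pcc_total_score), copied from the results table.
ROWS = [
    (500, 2342.6951, NAN),
    (1000, 2318.7217, NAN),
    (1500, 1029.2085, 0.6845),
    (2000, 985.5543, 0.7743),
    (2500, 989.1628, 0.7995),
    (3000, 972.3806, 0.8103),
    (3500, 973.4864, 0.8143),
]


def best_by(rows, column, higher_is_better):
    """Return the row with the best value in `column`, ignoring nan entries."""
    valid = [row for row in rows if not math.isnan(row[column])]
    if not valid:
        raise ValueError("no valid values in the selected column")
    pick = max if higher_is_better else min
    return pick(valid, key=lambda row: row[column])


best_loss_step = best_by(ROWS, 1, higher_is_better=False)[0]
best_pcc_step = best_by(ROWS, 2, higher_is_better=True)[0]
```

The two criteria select different steps here, so which checkpoint counts as best depends on whether validation loss or Pcc Total Score is the target.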

Framework versions

  • Transformers 4.37.0
  • PyTorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.1