SpeechT5 TTS

This model is a fine-tuned version of microsoft/speecht5_tts on the SDA dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4853 (validation loss at the final training step, 40000)
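Since this is a fine-tune of microsoft/speecht5_tts, inference would presumably follow the usual SpeechT5 text-to-speech pattern in Transformers. The sketch below is an illustration, not the author's documented usage. The function name `synthesize` is hypothetical, the repository id of this fine-tuned checkpoint is not stated in this card (so `model_id` defaults to the base model and should be replaced), and it assumes the checkpoint ships its own processor files and that the caller supplies a (1, 512) speaker x-vector.

```python
def synthesize(text, speaker_embeddings, model_id="microsoft/speecht5_tts"):
    """Return a 1-D waveform tensor (16 kHz) for `text`.

    model_id: defaults to the base checkpoint named in this card; pass the
        fine-tuned repository id instead (it is not given here).
    speaker_embeddings: float tensor of shape (1, 512), an x-vector that
        describes the target voice (must be provided by the caller).
    """
    # Imported lazily so the function can be defined without the heavy
    # dependencies being installed.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumption: the checkpoint at model_id includes processor/tokenizer files.
    processor = SpeechT5Processor.from_pretrained(model_id)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
    # SpeechT5 predicts spectrograms; the HiFi-GAN vocoder turns them into audio.
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        return model.generate_speech(
            inputs["input_ids"], speaker_embeddings, vocoder=vocoder
        )
```

The returned tensor can be written to a WAV file at a 16000 Hz sample rate with any audio library.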

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 40000
  • mixed_precision_training: Native AMP
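To make the listed values concrete, the sketch below reproduces two pieces of arithmetic implied by them: the effective batch size and the learning rate at a given step under a linear schedule with warmup. It is a plain-Python approximation of how such a schedule typically behaves (linear ramp from 0 to the peak rate, then linear decay to 0 at the last step), not the training code itself. `NUM_DEVICES = 1` is an assumption chosen because 16 × 2 × 1 matches the reported total of 32, and the function names are illustrative.

```python
LEARNING_RATE = 1e-5      # learning_rate
WARMUP_STEPS = 500        # lr_scheduler_warmup_steps
TRAINING_STEPS = 40_000   # training_steps
TRAIN_BATCH_SIZE = 16     # train_batch_size (per device)
GRAD_ACCUM_STEPS = 2      # gradient_accumulation_steps
NUM_DEVICES = 1           # assumption: not stated in the card


def total_train_batch_size():
    """Examples consumed per optimizer update."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step):
    """Learning rate at an optimizer step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * (step / WARMUP_STEPS)
    # Decay: fall linearly from the peak to 0 at the final training step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * (remaining / (TRAINING_STEPS - WARMUP_STEPS))


if __name__ == "__main__":
    print("effective batch size:", total_train_batch_size())
    for step in (0, 250, 500, 20_000, 40_000):
        print(step, lr_at(step))
```

Under these assumptions the peak rate of 1e-05 is reached at step 500 and the rate is 0 at step 40000.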

Training results

Training Loss   Epoch   Step    Validation Loss
0.5703          1.49    1000    0.5289
0.541           2.98    2000    0.5131
0.5487          4.46    3000    0.5059
0.5232          5.95    4000    0.5011
0.5295          7.44    5000    0.4979
0.5257          8.93    6000    0.4970
0.5091          10.42   7000    0.4905
0.5141          11.9    8000    0.4893
0.5033          13.39   9000    0.4865
0.507           14.88   10000   0.4850
0.502           16.37   11000   0.4830
0.497           17.86   12000   0.4823
0.4974          19.35   13000   0.4801
0.4993          20.83   14000   0.4794
0.496           22.32   15000   0.4814
0.4845          23.81   16000   0.4780
0.4977          25.3    17000   0.4775
0.4888          26.79   18000   0.4780
0.4773          28.27   19000   0.4792
0.4914          29.76   20000   0.4817
0.4864          31.25   21000   0.4775
0.486           32.74   22000   0.4773
0.4884          34.23   23000   0.4835
0.4856          35.71   24000   0.4788
0.4814          37.2    25000   0.4811
0.4831          38.69   26000   0.4814
0.4732          40.18   27000   0.4816
0.4846          41.67   28000   0.4812
0.4731          43.15   29000   0.4843
0.4772          44.64   30000   0.4830
0.4793          46.13   31000   0.4834
0.4736          47.62   32000   0.4834
0.4798          49.11   33000   0.4826
0.4744          50.6    34000   0.4841
0.4784          52.08   35000   0.4844
0.4743          53.57   36000   0.4851
0.4779          55.06   37000   0.4854
0.4719          56.55   38000   0.4854
0.4825          58.04   39000   0.4856
0.4805          59.52   40000   0.4853
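The table above can be read programmatically to see where validation loss bottoms out. The sketch below parses the same rows and reports the step with the lowest validation loss, plus a rough steps-per-epoch figure derived from the final row. The names `EvalRow`, `parse_rows`, and `best_checkpoint` are illustrative, and whether a checkpoint was actually saved at that step is not stated in this card.

```python
from typing import List, NamedTuple

# Rows copied from the table: training loss, epoch, step, validation loss.
ROWS = """\
0.5703 1.49 1000 0.5289
0.541 2.98 2000 0.5131
0.5487 4.46 3000 0.5059
0.5232 5.95 4000 0.5011
0.5295 7.44 5000 0.4979
0.5257 8.93 6000 0.4970
0.5091 10.42 7000 0.4905
0.5141 11.9 8000 0.4893
0.5033 13.39 9000 0.4865
0.507 14.88 10000 0.4850
0.502 16.37 11000 0.4830
0.497 17.86 12000 0.4823
0.4974 19.35 13000 0.4801
0.4993 20.83 14000 0.4794
0.496 22.32 15000 0.4814
0.4845 23.81 16000 0.4780
0.4977 25.3 17000 0.4775
0.4888 26.79 18000 0.4780
0.4773 28.27 19000 0.4792
0.4914 29.76 20000 0.4817
0.4864 31.25 21000 0.4775
0.486 32.74 22000 0.4773
0.4884 34.23 23000 0.4835
0.4856 35.71 24000 0.4788
0.4814 37.2 25000 0.4811
0.4831 38.69 26000 0.4814
0.4732 40.18 27000 0.4816
0.4846 41.67 28000 0.4812
0.4731 43.15 29000 0.4843
0.4772 44.64 30000 0.4830
0.4793 46.13 31000 0.4834
0.4736 47.62 32000 0.4834
0.4798 49.11 33000 0.4826
0.4744 50.6 34000 0.4841
0.4784 52.08 35000 0.4844
0.4743 53.57 36000 0.4851
0.4779 55.06 37000 0.4854
0.4719 56.55 38000 0.4854
0.4825 58.04 39000 0.4856
0.4805 59.52 40000 0.4853
"""


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


def parse_rows(text: str) -> List[EvalRow]:
    """Turn whitespace-separated table rows into typed records."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append(EvalRow(float(train_loss), float(epoch), int(step), float(val_loss)))
    return rows


def best_checkpoint(rows: List[EvalRow]) -> EvalRow:
    """Return the evaluation row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


if __name__ == "__main__":
    rows = parse_rows(ROWS)
    best = best_checkpoint(rows)
    last = rows[-1]
    print("lowest validation loss:", best.val_loss, "at step", best.step)
    print("final validation loss:", last.val_loss, "at step", last.step)
    # Rough estimate only: the epoch column is rounded to two decimals.
    print("approx. steps per epoch:", round(last.step / last.epoch))
```

In these rows validation loss reaches its minimum well before the final step and drifts slightly upward afterwards while training loss keeps falling, which may indicate mild overfitting in the later epochs.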

Framework versions

  • Transformers 4.30.0.dev0
  • Pytorch 2.0.1+cu117
  • Datasets 2.13.0
  • Tokenizers 0.13.3