microsoft/speecht5_tts_hu

This model is a fine-tuned version of microsoft/speecht5_tts on the VoxPopuli dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4309

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 4000
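
The relationship between these values can be sketched in a few lines of plain Python. This is an illustrative sketch, not the training code: it assumes a single device (the card does not state the device count) and assumes that `lr_scheduler_type: linear` means linear warmup from zero followed by linear decay to zero, which is the usual behaviour of the Transformers linear schedule. The function names `effective_batch_size` and `linear_lr` are hypothetical helpers introduced here.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_STEPS = 500
TRAINING_STEPS = 4000
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size(per_device, accumulation, devices=NUM_DEVICES):
    """Number of examples contributing to one optimizer step."""
    return per_device * accumulation * devices


def linear_lr(step, peak=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 250, 500, 2250, 4000):
        print(step, linear_lr(step))
```

Under these assumptions, 4 examples per step times 8 accumulation steps reproduces the listed total_train_batch_size of 32, and the learning rate peaks at 1e-05 at step 500 before decaying to zero at step 4000.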

Training results

Training Loss   Epoch   Step   Validation Loss
0.6932           0.54    100   0.6017
0.6325           1.07    200   0.5632
0.5817           1.61    300   0.5078
0.5326           2.15    400   0.4830
0.5247           2.69    500   0.4703
0.5094           3.22    600   0.4630
0.5023           3.76    700   0.4568
0.4997           4.30    800   0.4541
0.4974           4.84    900   0.4504
0.4915           5.37   1000   0.4495
0.4885           5.91   1100   0.4475
0.4779           6.45   1200   0.4437
0.4840           6.98   1300   0.4439
0.4799           7.52   1400   0.4419
0.4783           8.06   1500   0.4410
0.4764           8.60   1600   0.4401
0.4757           9.13   1700   0.4396
0.4742           9.67   1800   0.4378
0.4713          10.21   1900   0.4363
0.4747          10.75   2000   0.4370
0.4719          11.28   2100   0.4356
0.4694          11.82   2200   0.4349
0.4706          12.36   2300   0.4345
0.4757          12.89   2400   0.4341
0.4660          13.43   2500   0.4334
0.4648          13.97   2600   0.4332
0.4663          14.51   2700   0.4329
0.4644          15.04   2800   0.4323
0.4646          15.58   2900   0.4324
0.4641          16.12   3000   0.4319
0.4644          16.66   3100   0.4316
0.4630          17.19   3200   0.4312
0.4651          17.73   3300   0.4317
0.4637          18.27   3400   0.4315
0.4585          18.80   3500   0.4308
0.4605          19.34   3600   0.4310
0.4586          19.88   3700   0.4301
0.4636          20.42   3800   0.4308
0.4616          20.95   3900   0.4308
0.4593          21.49   4000   0.4309
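
Two things can be read off the table with simple arithmetic, sketched below. The snippet copies only the last six rows, which contain the lowest validation loss in the whole table, and compares that minimum with the final value reported at the top of the card. It also estimates the number of optimizer steps per epoch from the final row. The training-set size derived from that is a rough estimate only: it assumes the total batch size of 32 listed above and inherits the rounding of the Epoch column.

```python
# (epoch, step, validation_loss) copied from the last six rows of the table above.
rows = [
    (18.80, 3500, 0.4308),
    (19.34, 3600, 0.4310),
    (19.88, 3700, 0.4301),
    (20.42, 3800, 0.4308),
    (20.95, 3900, 0.4308),
    (21.49, 4000, 0.4309),
]
TOTAL_TRAIN_BATCH_SIZE = 32  # from the hyperparameter list

# Evaluation point with the lowest validation loss among these rows.
best_epoch, best_step, best_loss = min(rows, key=lambda r: r[2])

# Final evaluation point, which is the loss reported at the top of the card.
final_epoch, final_step, final_loss = rows[-1]
gap = final_loss - best_loss

# Rough estimate of optimizer steps per epoch and of training-set size.
steps_per_epoch = final_step / final_epoch
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print("best step:", best_step, "best validation loss:", best_loss)
    print("final minus best:", round(gap, 4))
    print("steps per epoch (approx.):", round(steps_per_epoch, 1))
    print("training examples (approx.):", round(approx_train_examples))
```

The lowest validation loss, 0.4301, occurs at step 3700, slightly below the final 0.4309 at step 4000. The card does not say which checkpoint was published.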

Framework versions

  • Transformers 4.34.1
  • PyTorch 2.1.0+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1