voice_clone

This model is a fine-tuned version of microsoft/speecht5_tts on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4976
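
The card does not document how to run the checkpoint, so the following is only a minimal sketch of one way it might be loaded with the Transformers SpeechT5 classes. It rests on several assumptions that the card does not confirm: that the weights are published under the repository id `Mehrdad-S/voice_clone`, that the processor of the base checkpoint `microsoft/speecht5_tts` is compatible with the fine-tuned model, that `microsoft/speecht5_hifigan` is a suitable vocoder, and that the caller supplies a speaker embedding (an x-vector of shape `(1, 512)`, as SpeechT5 expects). The helper name `synthesize` is hypothetical.

```python
def synthesize(text, speaker_embedding, repo_id="Mehrdad-S/voice_clone"):
    """Return a 1-D waveform tensor for `text` (SpeechT5 generates 16 kHz audio).

    `speaker_embedding` is assumed to be a float tensor of shape (1, 512).
    """
    # Imported inside the function so the heavy dependencies load only when needed.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumption: the base checkpoint's processor matches the fine-tuned model.
    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained(repo_id)
    # Assumption: the standard SpeechT5 HiFi-GAN vocoder is used for waveform synthesis.
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        return model.generate_speech(
            inputs["input_ids"], speaker_embedding, vocoder=vocoder
        )
```

Calling `synthesize("Hello.", embedding)` downloads the three checkpoints on first use. Which speaker embedding reproduces the cloned voice is not stated in this card.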

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 8
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 4000
  • mixed_precision_training: Native AMP
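
As a rough illustration of how the values listed above fit together, the sketch below writes them as keyword arguments in the naming used by `transformers.Seq2SeqTrainingArguments`, then derives the effective batch size and the learning rate at a given step. It assumes training on a single device (so the per-device batch size equals `train_batch_size`) and that "Native AMP" corresponds to `fp16=True`; neither is confirmed by the card. The function `linear_schedule_lr` is a hypothetical helper that follows the usual linear warmup and linear decay rule, not code taken from the training run.

```python
# Hyperparameters from the list above, using TrainingArguments-style names.
# An output_dir would also be needed to construct the arguments object.
training_kwargs = {
    "learning_rate": 5e-06,
    "per_device_train_batch_size": 8,  # assumption: one device
    "per_device_eval_batch_size": 2,
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "max_steps": 4000,
    "fp16": True,  # assumption: "Native AMP" means fp16 mixed precision
}

# Effective batch size per optimizer step: 8 samples x 8 accumulated batches.
total_train_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)


def linear_schedule_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup to the peak, then linear decay to 0."""
    peak = training_kwargs["learning_rate"]
    warmup = training_kwargs["warmup_steps"]
    total = training_kwargs["max_steps"]
    if step < warmup:
        return peak * step / warmup
    return peak * max(0, total - step) / (total - warmup)


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    for step in (0, 250, 500, 2250, 4000):
        print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate reaches its peak of 5e-06 at step 500, is back at roughly half the peak by step 2250, and decays to zero at step 4000.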

Training results

Training Loss   Epoch     Step   Validation Loss
0.9069          0.8734    50     0.8148
0.8979          1.7336    100    0.7481
0.8288          2.5939    150    0.7062
0.8054          3.4541    200    0.6817
0.7422          4.3144    250    0.6609
0.7022          5.1747    300    0.6083
0.6318          6.0349    350    0.5642
0.6123          6.9083    400    0.5512
0.5917          7.7686    450    0.5467
0.5866          8.6288    500    0.5385
0.581           9.4891    550    0.5322
0.5605          10.3493   600    0.5318
0.5562          11.2096   650    0.5258
0.5565          12.0699   700    0.5196
0.566           12.9432   750    0.5230
0.561           13.8035   800    0.5204
0.5486          14.6638   850    0.5171
0.554           15.5240   900    0.5192
0.5367          16.3843   950    0.5153
0.5347          17.2445   1000   0.5144
0.5373          18.1048   1050   0.5152
0.5386          18.9782   1100   0.5127
0.5421          19.8384   1150   0.5094
0.5347          20.6987   1200   0.5101
0.5415          21.5590   1250   0.5116
0.5225          22.4192   1300   0.5087
0.5222          23.2795   1350   0.5087
0.5219          24.1397   1400   0.5080
0.5134          25.0      1450   0.5050
0.5342          25.8734   1500   0.5068
0.5265          26.7336   1550   0.5064
0.5279          27.5939   1600   0.5074
0.5304          28.4541   1650   0.5061
0.5132          29.3144   1700   0.5042
0.5091          30.1747   1750   0.5048
0.5152          31.0349   1800   0.5067
0.5192          31.9083   1850   0.5030
0.5232          32.7686   1900   0.5031
0.5247          33.6288   1950   0.5033
0.5261          34.4891   2000   0.5044
0.5116          35.3493   2050   0.5049
0.5049          36.2096   2100   0.5015
0.5044          37.0699   2150   0.5012
0.5195          37.9432   2200   0.5016
0.5186          38.8035   2250   0.5014
0.5245          39.6638   2300   0.5014
0.5248          40.5240   2350   0.5028
0.4969          41.3843   2400   0.5034
0.5009          42.2445   2450   0.5055
0.5019          43.1048   2500   0.5002
0.522           43.9782   2550   0.4999
0.5187          44.8384   2600   0.5020
0.5129          45.6987   2650   0.4994
0.5182          46.5590   2700   0.5009
0.4975          47.4192   2750   0.5014
0.5024          48.2795   2800   0.4971
0.5012          49.1397   2850   0.5004
0.5032          50.0      2900   0.5030
0.517           50.8734   2950   0.5008
0.5139          51.7336   3000   0.4994
0.5107          52.5939   3050   0.5007
0.5127          53.4541   3100   0.4998
0.499           54.3144   3150   0.4975
0.4954          55.1747   3200   0.4994
0.4994          56.0349   3250   0.5000
0.5109          56.9083   3300   0.4986
0.5145          57.7686   3350   0.4994
0.5155          58.6288   3400   0.4990
0.51            59.4891   3450   0.5001
0.501           60.3493   3500   0.5003
0.484           61.2096   3550   0.4989
0.4955          62.0699   3600   0.5006
0.5147          62.9432   3650   0.4992
0.5189          63.8035   3700   0.5009
0.5014          64.6638   3750   0.4994
0.5159          65.5240   3800   0.5020
0.4942          66.3843   3850   0.4989
0.5001          67.2445   3900   0.5002
0.4902          68.1048   3950   0.4981
0.5126          68.9782   4000   0.4976
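
One way to read the table is to compare the final checkpoint with the best evaluation point. The sketch below does this for a handful of rows copied from the table (not the full log). The tuple layout and variable names are illustrative choices, not part of the training code.

```python
# (step, training_loss, validation_loss) for a few rows copied from the table.
rows = [
    (50, 0.9069, 0.8148),
    (500, 0.5866, 0.5385),
    (1000, 0.5347, 0.5144),
    (2000, 0.5261, 0.5044),
    (2800, 0.5024, 0.4971),
    (3150, 0.4990, 0.4975),
    (3950, 0.4902, 0.4981),
    (4000, 0.5126, 0.4976),
]

# Row with the lowest validation loss among the sampled rows.
best_step, _, best_val = min(rows, key=lambda row: row[2])

# The last row corresponds to the reported evaluation loss.
final_step, _, final_val = rows[-1]

# How far the final checkpoint is from the best sampled evaluation point.
gap = final_val - best_val

if __name__ == "__main__":
    print(f"best:  step {best_step}, validation loss {best_val:.4f}")
    print(f"final: step {final_step}, validation loss {final_val:.4f}")
    print(f"gap:   {gap:.4f}")
```

In the full table the lowest validation loss is also the step-2800 value of 0.4971, so the reported final loss of 0.4976 is close to, but not exactly, the best evaluation result. The validation loss changes little after roughly step 2500.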

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.5.1+cu121
  • Datasets 3.3.1
  • Tokenizers 0.21.0

Model size

  • 144M params (Safetensors, tensor type F32)