voice_clone

This model is a fine-tuned version of microsoft/speecht5_tts on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.4976
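
Because the checkpoint is a fine-tune of microsoft/speecht5_tts, it can presumably be loaded with the SpeechT5 classes in Transformers. The following is a minimal sketch, not a documented usage recipe for this model. It makes several assumptions the card does not confirm: the repository id is Mehrdad-S/voice_clone, the base model's processor still matches the fine-tuned weights, the stock microsoft/speecht5_hifigan vocoder is appropriate, and the caller supplies a (1, 512) speaker embedding (x-vector) for the target voice. The `synthesize` helper is a hypothetical name introduced for illustration.

```python
def synthesize(text, speaker_embedding, repo_id="Mehrdad-S/voice_clone"):
    """Return a 1-D waveform tensor (16 kHz) for `text`.

    speaker_embedding: tensor of shape (1, 512) holding an x-vector for the
    target voice. The card does not say how embeddings were produced during
    training, so the caller has to provide one.
    """
    if tuple(speaker_embedding.shape) != (1, 512):
        raise ValueError("speaker_embedding must have shape (1, 512)")

    # Imported lazily so the heavy dependencies load only when synthesis runs.
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumption: the base model's processor still matches this checkpoint.
    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained(repo_id)
    # Assumption: the standard SpeechT5 HiFi-GAN vocoder converts the
    # predicted spectrogram into audio.
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    return model.generate_speech(
        inputs["input_ids"], speaker_embedding, vocoder=vocoder
    )
```

The returned tensor can be written to disk with any audio library at a 16000 Hz sample rate. Output quality depends heavily on the speaker embedding, which this card does not describe.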

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 8
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 4000
mixed_precision_training: Native AMP
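
As a rough illustration of how these settings interact, the sketch below recomputes the effective batch size and the learning rate at a given optimizer step. It assumes a single training device (which is what 8 x 8 = 64 implies) and the usual linear schedule: a ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0 at the final step. The function names are hypothetical and this is not the Trainer's own code.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-06
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_STEPS = 500
TRAINING_STEPS = 4000


def effective_batch_size(per_device=TRAIN_BATCH_SIZE,
                         accumulation=GRADIENT_ACCUMULATION_STEPS,
                         num_devices=1):
    """Samples contributing to each optimizer step.

    num_devices=1 is an assumption inferred from total_train_batch_size: 64.
    """
    return per_device * accumulation * num_devices


def linear_lr(step,
              peak=LEARNING_RATE,
              warmup=WARMUP_STEPS,
              total=TRAINING_STEPS):
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < warmup:
        # Warmup phase: scale linearly from 0 up to the peak rate.
        return peak * step / max(1, warmup)
    # Decay phase: scale linearly from the peak rate down to 0 at `total`.
    remaining = max(0, total - step)
    return peak * remaining / max(1, total - warmup)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 250, 500, 2250, 4000):
        print(step, linear_lr(step))
```

Under these assumptions the rate peaks at step 500 and has fallen to half the peak by step 2250, so most of the later rows in the results table were trained at a small fraction of 5e-06.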

Training results

Training Loss	Epoch	Step	Validation Loss
0.9069	0.8734	50	0.8148
0.8979	1.7336	100	0.7481
0.8288	2.5939	150	0.7062
0.8054	3.4541	200	0.6817
0.7422	4.3144	250	0.6609
0.7022	5.1747	300	0.6083
0.6318	6.0349	350	0.5642
0.6123	6.9083	400	0.5512
0.5917	7.7686	450	0.5467
0.5866	8.6288	500	0.5385
0.581	9.4891	550	0.5322
0.5605	10.3493	600	0.5318
0.5562	11.2096	650	0.5258
0.5565	12.0699	700	0.5196
0.566	12.9432	750	0.5230
0.561	13.8035	800	0.5204
0.5486	14.6638	850	0.5171
0.554	15.5240	900	0.5192
0.5367	16.3843	950	0.5153
0.5347	17.2445	1000	0.5144
0.5373	18.1048	1050	0.5152
0.5386	18.9782	1100	0.5127
0.5421	19.8384	1150	0.5094
0.5347	20.6987	1200	0.5101
0.5415	21.5590	1250	0.5116
0.5225	22.4192	1300	0.5087
0.5222	23.2795	1350	0.5087
0.5219	24.1397	1400	0.5080
0.5134	25.0	1450	0.5050
0.5342	25.8734	1500	0.5068
0.5265	26.7336	1550	0.5064
0.5279	27.5939	1600	0.5074
0.5304	28.4541	1650	0.5061
0.5132	29.3144	1700	0.5042
0.5091	30.1747	1750	0.5048
0.5152	31.0349	1800	0.5067
0.5192	31.9083	1850	0.5030
0.5232	32.7686	1900	0.5031
0.5247	33.6288	1950	0.5033
0.5261	34.4891	2000	0.5044
0.5116	35.3493	2050	0.5049
0.5049	36.2096	2100	0.5015
0.5044	37.0699	2150	0.5012
0.5195	37.9432	2200	0.5016
0.5186	38.8035	2250	0.5014
0.5245	39.6638	2300	0.5014
0.5248	40.5240	2350	0.5028
0.4969	41.3843	2400	0.5034
0.5009	42.2445	2450	0.5055
0.5019	43.1048	2500	0.5002
0.522	43.9782	2550	0.4999
0.5187	44.8384	2600	0.5020
0.5129	45.6987	2650	0.4994
0.5182	46.5590	2700	0.5009
0.4975	47.4192	2750	0.5014
0.5024	48.2795	2800	0.4971
0.5012	49.1397	2850	0.5004
0.5032	50.0	2900	0.5030
0.517	50.8734	2950	0.5008
0.5139	51.7336	3000	0.4994
0.5107	52.5939	3050	0.5007
0.5127	53.4541	3100	0.4998
0.499	54.3144	3150	0.4975
0.4954	55.1747	3200	0.4994
0.4994	56.0349	3250	0.5000
0.5109	56.9083	3300	0.4986
0.5145	57.7686	3350	0.4994
0.5155	58.6288	3400	0.4990
0.51	59.4891	3450	0.5001
0.501	60.3493	3500	0.5003
0.484	61.2096	3550	0.4989
0.4955	62.0699	3600	0.5006
0.5147	62.9432	3650	0.4992
0.5189	63.8035	3700	0.5009
0.5014	64.6638	3750	0.4994
0.5159	65.5240	3800	0.5020
0.4942	66.3843	3850	0.4989
0.5001	67.2445	3900	0.5002
0.4902	68.1048	3950	0.4981
0.5126	68.9782	4000	0.4976
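
The table can be read programmatically to check where validation loss bottoms out. The sketch below uses a hand-copied excerpt of (step, validation loss) pairs from the rows above, not the full log, and the variable names are illustrative.

```python
# Excerpt of (step, validation_loss) pairs copied from the results table.
# It includes the lowest validation loss in the table and the final row.
rows = [
    (50, 0.8148),
    (500, 0.5385),
    (1000, 0.5144),
    (2000, 0.5044),
    (2800, 0.4971),
    (3150, 0.4975),
    (4000, 0.4976),
]

# Checkpoint with the lowest validation loss in the excerpt.
best_step, best_loss = min(rows, key=lambda row: row[1])

# Final evaluation, i.e. the loss reported at the top of this card.
final_step, final_loss = rows[-1]

# How far the final checkpoint is from the best one.
gap = final_loss - best_loss

if __name__ == "__main__":
    print(f"best:  step {best_step}, loss {best_loss:.4f}")
    print(f"final: step {final_step}, loss {final_loss:.4f}")
    print(f"gap:   {gap:.4f}")
```

The lowest validation loss in the table occurs at step 2800 and the final checkpoint is within about 0.0005 of it, which suggests the run had largely plateaued well before step 4000.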

Framework versions

Transformers 4.47.0
Pytorch 2.5.1+cu121
Datasets 3.3.1
Tokenizers 0.21.0
