metadata

license: mit
base_model: miosipof/speecht5_tts_voxpopuli_it_v2
tags:
  - generated_from_trainer
datasets:
  - audiofolder
model-index:
  - name: speecht5_tts_dysarthria_v1
    results: []

speecht5_tts_dysarthria_v1

This model is a fine-tuned version of miosipof/speecht5_tts_voxpopuli_it_v2 on the audiofolder dataset. At the final training step (500), it achieves the following result on the evaluation set:

Loss: 0.5234

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
training_steps: 500
mixed_precision_training: Native AMP
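
Two of the values above follow from the others, and a short sketch may make the relationship easier to check. The code below recomputes total_train_batch_size from train_batch_size and gradient_accumulation_steps, and evaluates a linear schedule with 100 warmup steps over 500 training steps. The function names and the single-device default are assumptions made for illustration; the card does not state the device count, although 16 × 2 = 32 is consistent with one device. The schedule shape (warm up linearly from 0, then decay linearly to 0) is the usual meaning of a linear scheduler with warmup, not something taken from the training script.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-6
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 500


def effective_batch_size(per_step_batch: int, accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Examples consumed per optimizer update.

    num_devices=1 is an assumption; the card does not state the device count.
    """
    return per_step_batch * accumulation_steps * num_devices


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to the peak rate, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 50, 100, 300, 500):
        print(f"step {step:3d}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 5e-06 at step 100 and reaches 0 at step 500, so the last evaluations in the run are taken at very small learning rates.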

Training results

Training Loss	Epoch	Step	Validation Loss
1.0113	0.7042	25	0.8442
0.8	1.4085	50	0.7084
0.7291	2.1127	75	0.6323
0.6698	2.8169	100	0.5875
0.6339	3.5211	125	0.5633
0.5747	4.2254	150	0.5552
0.5837	4.9296	175	0.5436
0.5882	5.6338	200	0.5417
0.5692	6.3380	225	0.5363
0.5577	7.0423	250	0.5340
0.5411	7.7465	275	0.5323
0.5551	8.4507	300	0.5301
0.5671	9.1549	325	0.5292
0.5313	9.8592	350	0.5254
0.5546	10.5634	375	0.5246
0.5283	11.2676	400	0.5231
0.5484	11.9718	425	0.5222
0.5251	12.6761	450	0.5222
0.5443	13.3803	475	0.5223
0.5357	14.0845	500	0.5234
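
The table can be read programmatically to see where validation loss bottoms out. The sketch below copies the step, epoch, and validation-loss columns and reports the lowest validation loss and an estimate of optimizer steps per epoch. The names EVAL_LOG, best_eval, and steps_per_epoch are hypothetical helpers, and the examples-per-epoch figure is only an estimate derived from the epoch column and the total batch size of 32; the card does not state the dataset size.

```python
# (step, epoch, validation loss) copied from the results table.
EVAL_LOG = [
    (25, 0.7042, 0.8442),
    (50, 1.4085, 0.7084),
    (75, 2.1127, 0.6323),
    (100, 2.8169, 0.5875),
    (125, 3.5211, 0.5633),
    (150, 4.2254, 0.5552),
    (175, 4.9296, 0.5436),
    (200, 5.6338, 0.5417),
    (225, 6.3380, 0.5363),
    (250, 7.0423, 0.5340),
    (275, 7.7465, 0.5323),
    (300, 8.4507, 0.5301),
    (325, 9.1549, 0.5292),
    (350, 9.8592, 0.5254),
    (375, 10.5634, 0.5246),
    (400, 11.2676, 0.5231),
    (425, 11.9718, 0.5222),
    (450, 12.6761, 0.5222),
    (475, 13.3803, 0.5223),
    (500, 14.0845, 0.5234),
]


def best_eval(log):
    """Row with the lowest validation loss; ties resolve to the earliest step."""
    return min(log, key=lambda row: row[2])


def steps_per_epoch(log):
    """Optimizer steps per epoch, estimated from the last logged row."""
    step, epoch, _ = log[-1]
    return step / epoch


if __name__ == "__main__":
    step, epoch, loss = best_eval(EVAL_LOG)
    final_loss = EVAL_LOG[-1][2]
    print(f"lowest validation loss {loss:.4f} at step {step} (epoch {epoch})")
    print(f"final minus lowest: {final_loss - loss:.4f}")
    per_epoch = steps_per_epoch(EVAL_LOG)
    print(f"about {per_epoch:.1f} optimizer steps per epoch")
    # Estimate only: assumes every update uses the full total batch of 32.
    print(f"roughly {per_epoch * 32:.0f} training examples per epoch")
```

Read this way, the lowest validation loss in the table is 0.5222 (steps 425 and 450), slightly below the 0.5234 recorded at step 500.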

Framework versions

Transformers 4.43.3
PyTorch 2.5.0+cu124
Datasets 3.0.1
Tokenizers 0.19.1