metadata

license: mit
base_model: microsoft/speecht5_tts
tags:
  - generated_from_trainer
datasets:
  - m-aliabbas/common_voice_urdu1
model-index:
  - name: SpeechT5 TTS urdu
    results: []

SpeechT5 TTS urdu

This model is a fine-tuned version of microsoft/speecht5_tts on the m-aliabbas/common_voice_urdu1 dataset. It achieves the following results on the evaluation set:

Loss: 0.4796

Model description

The model was trained on Roman Urdu text. A transliteration function was used to map standard Urdu script to Roman Urdu.
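The card does not include the transliteration function itself. Purely as an illustration of the idea, a character-level mapping might look like the sketch below. The `URDU_TO_ROMAN` table and the `transliterate` name are hypothetical, cover only a handful of letters, and ignore the short vowels that Urdu script usually leaves unwritten, so this should not be read as the function used for training.

```python
# Hypothetical, deliberately tiny mapping: NOT the author's transliteration table.
# Code points are written as escapes to avoid right-to-left rendering issues.
URDU_TO_ROMAN = {
    "\u0627": "a",  # alef
    "\u0628": "b",  # beh
    "\u067e": "p",  # peh
    "\u062a": "t",  # teh
    "\u062f": "d",  # dal
    "\u0631": "r",  # reh
    "\u0633": "s",  # seen
    "\u06a9": "k",  # keheh (Urdu kaf)
    "\u0644": "l",  # lam
    "\u0645": "m",  # meem
    "\u0646": "n",  # noon
}


def transliterate(text: str) -> str:
    """Replace each mapped Urdu character; pass everything else through unchanged."""
    return "".join(URDU_TO_ROMAN.get(ch, ch) for ch in text)
```

A real transliterator would need a much larger table plus rules for digraphs and vowel insertion; the point here is only that text is converted to Latin characters before it reaches the tokenizer.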

Use

More information needed
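In the absence of author-provided usage details, inference would probably follow the standard SpeechT5 pattern in Transformers. The sketch below makes several assumptions that the card does not confirm: `model_id` is a placeholder for this checkpoint's repository name, the processor is assumed to have been saved alongside the checkpoint (if not, it could be loaded from microsoft/speecht5_tts), the vocoder is assumed to be microsoft/speecht5_hifigan, and `speaker_embedding` is assumed to be a float tensor of shape (1, 512) obtained separately. The `synthesize` helper name is hypothetical. Input text should be Roman Urdu, since that is what the model was trained on.

```python
SAMPLE_RATE = 16_000  # SpeechT5 produces 16 kHz audio


def synthesize(text, model_id, speaker_embedding, out_path="speech.wav"):
    """Generate speech for Roman Urdu `text` and write it to `out_path`."""
    # Imported lazily so this module can be loaded without the heavy dependencies.
    import soundfile as sf
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumption: the fine-tuned repository also contains the processor files.
    processor = SpeechT5Processor.from_pretrained(model_id)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
    # Assumption: the stock SpeechT5 vocoder is suitable for this checkpoint.
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        speech = model.generate_speech(
            inputs["input_ids"], speaker_embedding, vocoder=vocoder
        )

    sf.write(out_path, speech.numpy(), samplerate=SAMPLE_RATE)
    return out_path
```

Which speaker embedding gives the best voice for this checkpoint is not stated on the card, so some experimentation would likely be needed.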

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 300
training_steps: 10500
mixed_precision_training: Native AMP
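Two of these values follow from the others, which a short sketch can make explicit. It assumes a single training device (the card does not state the device count, but 16 × 2 = 32 is consistent with one) and assumes the linear scheduler decays to zero at the final step, as the standard linear-with-warmup schedule in Transformers does. The function names are illustrative.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int, base_lr: float = 1e-5, warmup: int = 300,
              total: int = 10_500) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total` steps."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


if __name__ == "__main__":
    print(effective_batch_size(16, 2))       # total_train_batch_size
    for step in (0, 150, 300, 5400, 10_500):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 1e-05 at step 300 and is already down to half of that by step 5400.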

Training results

Training Loss	Epoch	Step	Validation Loss
0.5782	4.3103	500	0.5071
0.5248	8.6207	1000	0.4863
0.5125	12.9310	1500	0.4746
0.5081	17.2414	2000	0.4727
0.4967	21.5517	2500	0.4683
0.4905	25.8621	3000	0.4645
0.4794	30.1724	3500	0.4668
0.4829	34.4828	4000	0.4647
0.4770	38.7931	4500	0.4645
0.4637	43.1034	5000	0.4710
0.4743	47.4138	5500	0.4683
0.4595	51.7241	6000	0.4695
0.4735	56.0345	6500	0.4684
0.4613	60.3448	7000	0.4724
0.4678	64.6552	7500	0.4732
0.4538	68.9655	8000	0.4723
0.4536	73.2759	8500	0.4747
0.4587	77.5862	9000	0.4740
0.4536	81.8966	9500	0.4762
0.4606	86.2069	10000	0.4768
0.4528	90.5172	10500	0.4796
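The table shows validation loss bottoming out well before the end of training while training loss keeps falling, so the reported final loss of 0.4796 comes from the last checkpoint rather than the best one. The sketch below reads this off the numbers above and also estimates the training set size from the step-to-epoch ratio. The size estimate assumes a single device and an effective batch of 32, and it is an upper bound because the last batch of each epoch may be partial.

```python
# (step, epoch, validation_loss) rows copied from the table above.
ROWS = [
    (500, 4.3103, 0.5071), (1000, 8.6207, 0.4863), (1500, 12.9310, 0.4746),
    (2000, 17.2414, 0.4727), (2500, 21.5517, 0.4683), (3000, 25.8621, 0.4645),
    (3500, 30.1724, 0.4668), (4000, 34.4828, 0.4647), (4500, 38.7931, 0.4645),
    (5000, 43.1034, 0.4710), (5500, 47.4138, 0.4683), (6000, 51.7241, 0.4695),
    (6500, 56.0345, 0.4684), (7000, 60.3448, 0.4724), (7500, 64.6552, 0.4732),
    (8000, 68.9655, 0.4723), (8500, 73.2759, 0.4747), (9000, 77.5862, 0.4740),
    (9500, 81.8966, 0.4762), (10000, 86.2069, 0.4768), (10500, 90.5172, 0.4796),
]

# Lowest validation loss and every step at which it occurs.
best_loss = min(loss for _, _, loss in ROWS)
best_steps = [step for step, _, loss in ROWS if loss == best_loss]

# Optimizer steps per epoch, inferred from the first row (500 steps ~ 4.31 epochs).
steps_per_epoch = round(ROWS[0][0] / ROWS[0][1])

# Assumption: effective batch size of 32 on a single device.
approx_train_examples = steps_per_epoch * 32

if __name__ == "__main__":
    print(best_loss, best_steps)
    print(steps_per_epoch, approx_train_examples)
```

If checkpoints from around steps 3000 to 4500 were kept, evaluating one of them might be worthwhile, although validation loss alone is a weak proxy for perceived speech quality.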

Framework versions

Transformers 4.43.0.dev0
PyTorch 2.3.1+cu121
Datasets 2.20.0
Tokenizers 0.19.1