---
license: mit
base_model: microsoft/speecht5_tts
tags:
- text-to-speech
datasets:
- facebook/voxpopuli
model-index:
- name: speecht5_tts-ft-voxpopuli-it
  results:
  - task:
      type: text-to-speech
    dataset:
      name: facebook/voxpopuli
      type: facebook/voxpopuli
      config: it
      split: train
      args: it
    metrics:
    - name: N.A.
      type: N.A.
      value: N.A.
language:
- it
---

# speecht5_tts-ft-voxpopuli-it

This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) on the Italian (`it`) configuration of the facebook/voxpopuli dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5126

## Model description

The model generates Italian speech from text. It uses x-vector speaker embeddings produced by the speaker embedding model [speechbrain/spkrec-xvect-voxceleb](https://huggingface.co/speechbrain/spkrec-xvect-voxceleb); a usage sketch is given in the How to use section below.

## Intended uses & limitations

More information needed

## Training and evaluation data

The Italian `train` split of facebook/voxpopuli was divided into training and evaluation sets with `test_size=0.15` (85% train, 15% eval); see the data split sketch below.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- training_steps: 1000

A sketch mapping these settings to `Seq2SeqTrainingArguments` is given in the Training arguments sketch below.

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.6118        | 1.94  | 300  | 0.5508          |
| 0.5729        | 3.89  | 600  | 0.5204          |
| 0.563         | 5.83  | 900  | 0.5126          |

### Framework versions

- Transformers 4.33.0
- Pytorch 1.12.1+cu116
- Datasets 2.14.4
- Tokenizers 0.12.1
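
### Data split sketch

A minimal sketch of how the split described under Training and evaluation data can be produced with the `datasets` library. Whether the training seed (42) was also used for the split is an assumption.

```python
from datasets import load_dataset

# Italian configuration of VoxPopuli; the `train` split is the one
# declared in the card metadata above.
dataset = load_dataset("facebook/voxpopuli", "it", split="train")

# 85% train / 15% eval, matching test_size=0.15 above.
# seed=42 is an assumption carried over from the training seed.
splits = dataset.train_test_split(test_size=0.15, seed=42)
train_dataset, eval_dataset = splits["train"], splits["test"]
```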
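
### Training arguments sketch

The hyperparameters listed above map directly onto `Seq2SeqTrainingArguments` from `transformers`. This is a sketch, not the author's exact configuration: the output directory and the steps-based evaluation schedule (every 300 steps, consistent with the results table) are assumptions, and Adam betas `(0.9, 0.999)` with epsilon `1e-08` are the optimizer defaults, so they need not be set explicitly.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="speecht5_tts-ft-voxpopuli-it",  # assumed output directory
    per_device_train_batch_size=8,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size: 8 * 4 = 32
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=300,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",  # assumption, consistent with the
    eval_steps=300,               # 300/600/900-step results table
)
```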
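
## How to use

A minimal inference sketch using the standard SpeechT5 API from `transformers`. The repository id placeholder `<user>` and the example x-vector (taken from `Matthijs/cmu-arctic-xvectors`, as in the SpeechT5 documentation) are assumptions; any 512-dimensional x-vector extracted with speechbrain/spkrec-xvect-voxceleb can be substituted.

```python
import torch
import soundfile as sf
from datasets import load_dataset
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

# Replace <user> with the namespace this checkpoint is hosted under.
model_id = "<user>/speecht5_tts-ft-voxpopuli-it"
processor = SpeechT5Processor.from_pretrained(model_id)
model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Pre-computed x-vectors (extracted with speechbrain/spkrec-xvect-voxceleb)
# from the SpeechT5 examples; index 7306 is an arbitrary speaker.
embeddings = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings[7306]["xvector"]).unsqueeze(0)

inputs = processor(text="Buongiorno, come stai?", return_tensors="pt")
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)

# SpeechT5 generates 16 kHz audio.
sf.write("output.wav", speech.numpy(), samplerate=16000)
```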