---
language:
- it
license: mit
tags:
- generated_from_trainer
datasets:
- facebook/voxpopuli
pipeline_tag: text-to-speech
base_model: microsoft/speecht5_tts
model-index:
- name: SpeechT5-it
  results:
  - task:
      type: text-to-speech
      name: Text to Speech
    dataset:
      name: VOXPOPULI
      type: facebook/voxpopuli
      config: it
      split: validation
      args: it
    metrics:
    - type: loss
      value: 0.46
      name: Loss
---


# SpeechT5-it

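This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) for Italian text-to-speech, trained on the Italian (`it`) configuration of the [VoxPopuli](https://huggingface.co/datasets/facebook/voxpopuli) dataset.
After the final training epoch it achieves the following results on the evaluation set:
- Loss: 0.4600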

## Model description

More information needed

## Intended uses & limitations

More information needed
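
A minimal inference sketch with the `transformers` SpeechT5 classes is shown here. It is an illustration under stated assumptions, not a documented recipe for this checkpoint: the repository id `your-username/SpeechT5-it` is a hypothetical placeholder, the checkpoint is assumed to ship its processor files, and the caller is assumed to supply a `(1, 512)` x-vector speaker embedding. The vocoder `microsoft/speecht5_hifigan` is the one commonly paired with the base model.

```python
def synthesize(text, speaker_embeddings, model_id="your-username/SpeechT5-it"):
    """Return a 1-D waveform tensor (16 kHz) for the Italian input `text`.

    `model_id` is a hypothetical placeholder for wherever this checkpoint
    is hosted. `speaker_embeddings` is a (1, 512) float tensor (x-vector)
    that the caller must provide.
    """
    # Heavy dependencies are imported inside the function so that the
    # module itself can be loaded without them.
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumption: the fine-tuned repository also contains the processor files.
    processor = SpeechT5Processor.from_pretrained(model_id)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    # Tokenize the text and generate a waveform through the vocoder.
    inputs = processor(text=text, return_tensors="pt")
    return model.generate_speech(
        inputs["input_ids"], speaker_embeddings, vocoder=vocoder
    )
```

The returned tensor can be written to disk with any audio library at a sampling rate of 16 kHz. Output quality depends heavily on the speaker embedding, which this card does not specify.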

## Training and evaluation data

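According to the card metadata, evaluation uses the `validation` split of the Italian (`it`) configuration of [facebook/voxpopuli](https://huggingface.co/datasets/facebook/voxpopuli). More information is needed on the training split and preprocessing.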

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 40
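
The relationship between these values can be sketched in plain Python. The snippet below shows how the total batch size of 16 follows from the per-device batch size and gradient accumulation, and how a linear schedule with warmup shapes the learning rate. The total of 28480 optimizer steps is taken from the results table, single-device training is an assumption (it is the only device count consistent with 4 × 4 = 16), and `linear_lr` is a hand-written illustration of the schedule, not the library's own code.

```python
# Values copied from the hyperparameter list and the results table.
LEARNING_RATE = 1e-5
WARMUP_STEPS = 100
TOTAL_STEPS = 28480      # last "Step" entry in the results table
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1          # assumption: single-device training


def effective_batch_size(per_device, accum_steps, devices):
    """Number of examples contributing to one optimizer step."""
    return per_device * accum_steps * devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 50, 100, 14240, 28480):
        print(f"step {step:>5}: lr = {linear_lr(step):.3e}")
```

With only 100 warmup steps out of 28480, the learning rate peaks almost immediately and decays for nearly the whole run.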

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.5641        | 1.0   | 712   | 0.5090          |
| 0.5394        | 2.0   | 1424  | 0.4915          |
| 0.5277        | 3.0   | 2136  | 0.4819          |
| 0.5136        | 4.0   | 2848  | 0.4798          |
| 0.5109        | 5.0   | 3560  | 0.4733          |
| 0.5078        | 6.0   | 4272  | 0.4731          |
| 0.5033        | 7.0   | 4984  | 0.4692          |
| 0.5021        | 8.0   | 5696  | 0.4691          |
| 0.4984        | 9.0   | 6408  | 0.4670          |
| 0.4880        | 10.0  | 7120  | 0.4641          |
| 0.4910        | 11.0  | 7832  | 0.4641          |
| 0.4918        | 12.0  | 8544  | 0.4647          |
| 0.4933        | 13.0  | 9256  | 0.4622          |
| 0.4990        | 14.0  | 9968  | 0.4619          |
| 0.4906        | 15.0  | 10680 | 0.4608          |
| 0.4884        | 16.0  | 11392 | 0.4622          |
| 0.4847        | 17.0  | 12104 | 0.4616          |
| 0.4916        | 18.0  | 12816 | 0.4592          |
| 0.4845        | 19.0  | 13528 | 0.4600          |
| 0.4788        | 20.0  | 14240 | 0.4594          |
| 0.4746        | 21.0  | 14952 | 0.4607          |
| 0.4875        | 22.0  | 15664 | 0.4615          |
| 0.4831        | 23.0  | 16376 | 0.4597          |
| 0.4798        | 24.0  | 17088 | 0.4595          |
| 0.4727        | 25.0  | 17800 | 0.4592          |
| 0.4736        | 26.0  | 18512 | 0.4598          |
| 0.4746        | 27.0  | 19224 | 0.4608          |
| 0.4728        | 28.0  | 19936 | 0.4589          |
| 0.4771        | 29.0  | 20648 | 0.4593          |
| 0.4743        | 30.0  | 21360 | 0.4588          |
| 0.4785        | 31.0  | 22072 | 0.4601          |
| 0.4757        | 32.0  | 22784 | 0.4597          |
| 0.4731        | 33.0  | 23496 | 0.4598          |
| 0.4746        | 34.0  | 24208 | 0.4593          |
| 0.4715        | 35.0  | 24920 | 0.4599          |
| 0.4769        | 36.0  | 25632 | 0.4622          |
| 0.4778        | 37.0  | 26344 | 0.4605          |
| 0.4798        | 38.0  | 27056 | 0.4594          |
| 0.4694        | 39.0  | 27768 | 0.4607          |
| 0.4680        | 40.0  | 28480 | 0.4600          |
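
The table can be summarized with a short script. The sketch below transcribes the validation-loss column by hand and reports the best epoch, the final epoch, and the first epoch that comes within a chosen tolerance of the best value. The tolerance of 0.001 is an arbitrary illustrative choice, and whether a checkpoint other than the final one was saved is not stated in this card.

```python
# Validation loss per epoch (epochs 1..40), transcribed from the table above.
VAL_LOSS = [
    0.5090, 0.4915, 0.4819, 0.4798, 0.4733, 0.4731, 0.4692, 0.4691,
    0.4670, 0.4641, 0.4641, 0.4647, 0.4622, 0.4619, 0.4608, 0.4622,
    0.4616, 0.4592, 0.4600, 0.4594, 0.4607, 0.4615, 0.4597, 0.4595,
    0.4592, 0.4598, 0.4608, 0.4589, 0.4593, 0.4588, 0.4601, 0.4597,
    0.4598, 0.4593, 0.4599, 0.4622, 0.4605, 0.4594, 0.4607, 0.4600,
]


def best_epoch(losses):
    """Return (epoch, loss) for the lowest validation loss (1-based epoch)."""
    return min(enumerate(losses, start=1), key=lambda pair: pair[1])


def first_epoch_within(losses, tolerance):
    """First 1-based epoch whose loss is within `tolerance` of the best loss."""
    target = min(losses) + tolerance
    for epoch, loss in enumerate(losses, start=1):
        if loss <= target:
            return epoch
    return None


if __name__ == "__main__":
    epoch, loss = best_epoch(VAL_LOSS)
    print(f"best:  epoch {epoch}, loss {loss:.4f}")
    print(f"final: epoch {len(VAL_LOSS)}, loss {VAL_LOSS[-1]:.4f}")
    print("first epoch within 0.001 of best:", first_epoch_within(VAL_LOSS, 0.001))
```

Read this way, the validation loss stops improving meaningfully around the midpoint of training, and the reported final loss of 0.4600 sits slightly above the lowest value in the table.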


### Framework versions

- Transformers 4.30.0.dev0
- PyTorch 2.0.1+cu117
- Datasets 2.13.1
- Tokenizers 0.13.3