---
language:
- it
license: apache-2.0
tags:
- italian
- sequence-to-sequence
- style-transfer
- efficient
- formality-style-transfer
datasets:
- yahoo/xformal_it
widget:
- text: "maronn qualcuno mi spieg' CHECCOSA SUCCEDE?!?!"
- text: "wellaaaaaaa, ma fraté sei proprio troppo simpatiko, grazieeee!!"
- text: "nn capisco xke tt i ragazzi lo fanno"
- text: "IT5 è SUPERMEGA BRAVISSIMO a capire tt il vernacolo italiano!!!"
metrics:
- rouge
- bertscore
model-index:
- name: it5-efficient-small-el32-informal-to-formal
  results:
  - task:
      type: formality-style-transfer
      name: "Informal-to-formal Style Transfer"
    dataset:
      type: xformal_it
      name: "XFORMAL (Italian Subset)"
    metrics:
    - type: rouge1
      value: 0.430
      name: "Avg. Test Rouge1"
    - type: rouge2
      value: 0.221
      name: "Avg. Test Rouge2"
    - type: rougeL
      value: 0.408
      name: "Avg. Test RougeL"
    - type: bertscore
      value: 0.630
      name: "Avg. Test BERTScore"
---

# IT5 Cased Small Efficient EL32 for Informal-to-formal Style Transfer 🧐

*Shout-out to [Stefan Schweter](https://github.com/stefan-it) for contributing the pre-trained efficient model!*

This repository contains the checkpoint for the [IT5 Cased Small Efficient EL32](https://huggingface.co/it5/it5-efficient-small-el32) model fine-tuned on informal-to-formal style transfer on the Italian subset of the XFORMAL dataset, as part of the experiments of the paper [IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation](https://arxiv.org/abs/2203.03759) by [Gabriele Sarti](https://gsarti.com) and [Malvina Nissim](https://malvinanissim.github.io).

Efficient IT5 models differ from the standard ones by adopting a different vocabulary that enables cased text generation and an [optimized model architecture](https://arxiv.org/abs/2109.10686) that improves performance while reducing the parameter count. The Small-EL32 variant replaces the original encoder of the T5 Small architecture with a 32-layer deep encoder, showing improved performance over the base model.

A comprehensive overview of other released materials is provided in the [gsarti/it5](https://github.com/gsarti/it5) repository. Refer to the paper for additional details concerning the reported scores and the evaluation approach.

## Using the model

Model checkpoints are available for usage in TensorFlow, PyTorch and JAX.
They can be used directly with the `pipeline` API:

```python
from transformers import pipeline

i2f = pipeline("text2text-generation", model="it5/it5-efficient-small-el32-informal-to-formal")
i2f("nn capisco xke tt i ragazzi lo fanno")
>>> [{"generated_text": "non comprendo perché tutti i ragazzi agiscono così"}]
```

or loaded using the autoclasses:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("it5/it5-efficient-small-el32-informal-to-formal")
model = AutoModelForSeq2SeqLM.from_pretrained("it5/it5-efficient-small-el32-informal-to-formal")
```

A minimal generation example using these objects is sketched at the end of this card.

If you use this model in your research, please cite our work as:

```bibtex
@article{sarti-nissim-2022-it5,
    title={{IT5}: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation},
    author={Sarti, Gabriele and Nissim, Malvina},
    journal={ArXiv preprint 2203.03759},
    url={https://arxiv.org/abs/2203.03759},
    year={2022},
    month={mar}
}
```

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10.0

### Framework versions

- Transformers 4.15.0
- Pytorch 1.10.0+cu102
- Datasets 1.17.0
- Tokenizers 0.10.3
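### Generation example

As a complement to the autoclass snippet above, here is a minimal sketch of running generation with the loaded tokenizer and model. The decoding parameters (`max_new_tokens`, `num_beams`) are illustrative assumptions, not the settings used for the paper's evaluation:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned checkpoint (same identifiers as above)
tokenizer = AutoTokenizer.from_pretrained("it5/it5-efficient-small-el32-informal-to-formal")
model = AutoModelForSeq2SeqLM.from_pretrained("it5/it5-efficient-small-el32-informal-to-formal")

# Encode an informal Italian sentence
inputs = tokenizer("nn capisco xke tt i ragazzi lo fanno", return_tensors="pt")

# Generate the formal paraphrase; decoding parameters here are illustrative assumptions
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Beam search is used here only as an example; any decoding strategy supported by `generate` can be substituted.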
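### Hyperparameters as `Seq2SeqTrainingArguments`

For convenience, the hyperparameters listed above can be expressed as `Seq2SeqTrainingArguments`. This is a hedged reconstruction rather than the exact training script used for the paper; the `output_dir` and any argument not listed above are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

# Hedged reconstruction of the reported hyperparameters; output_dir is an assumption
training_args = Seq2SeqTrainingArguments(
    output_dir="it5-efficient-small-el32-informal-to-formal",  # assumed
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10.0,
)
```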