---
language:
- it
license: apache-2.0
tags:
- italian
- sequence-to-sequence
- style-transfer
- efficient
- formality-style-transfer
datasets:
- yahoo/xformal_it
widget:
- text: "maronn qualcuno mi spieg' CHECCOSA SUCCEDE?!?!"
- text: "wellaaaaaaa, ma fraté sei proprio troppo simpatiko, grazieeee!!"
- text: "nn capisco xke tt i ragazzi lo fanno"
- text: "IT5 è SUPERMEGA BRAVISSIMO a capire tt il vernacolo italiano!!!"
metrics:
- rouge
- bertscore
model-index:
- name: it5-efficient-small-el32-informal-to-formal
  results:
  - task:
      type: formality-style-transfer
      name: "Informal-to-formal Style Transfer"
    dataset:
      type: xformal_it
      name: "XFORMAL (Italian Subset)"
    metrics:
    - type: rouge1
      value: 0.430
      name: "Avg. Test Rouge1"
    - type: rouge2
      value: 0.221
      name: "Avg. Test Rouge2"
    - type: rougeL
      value: 0.408
      name: "Avg. Test RougeL"
    - type: bertscore
      value: 0.630
      name: "Avg. Test BERTScore"
---

# IT5 Cased Small Efficient EL32 for Informal-to-formal Style Transfer 🧐

*Shout-out to [Stefan Schweter](https://github.com/stefan-it) for contributing the pre-trained efficient model!*

This repository contains the checkpoint for the [IT5 Cased Small Efficient EL32](https://huggingface.co/it5/it5-efficient-small-el32) model fine-tuned on informal-to-formal style transfer on the Italian subset of the XFORMAL dataset, as part of the experiments of the paper [IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation](https://arxiv.org/abs/2203.03759) by [Gabriele Sarti](https://gsarti.com) and [Malvina Nissim](https://malvinanissim.github.io).

Efficient IT5 models differ from the standard ones by adopting a different vocabulary that enables cased text generation and an [optimized model architecture](https://arxiv.org/abs/2109.10686) that improves performance while reducing the parameter count. The Small-EL32 variant replaces the original encoder of the T5 Small architecture with a 32-layer deep encoder, showing improved performance over the base model.

A comprehensive overview of other released materials is provided in the [gsarti/it5](https://github.com/gsarti/it5) repository. Refer to the paper for additional details concerning the reported scores and the evaluation approach.

## Using the model

Model checkpoints are available for usage in TensorFlow, PyTorch and JAX.
They can be used directly with the `pipeline` API:

```python
from transformers import pipeline

i2f = pipeline("text2text-generation", model="it5/it5-efficient-small-el32-informal-to-formal")
i2f("nn capisco xke tt i ragazzi lo fanno")
>>> [{"generated_text": "non comprendo perché tutti i ragazzi agiscono così"}]
```

or loaded using the autoclasses:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("it5/it5-efficient-small-el32-informal-to-formal")
model = AutoModelForSeq2SeqLM.from_pretrained("it5/it5-efficient-small-el32-informal-to-formal")
```

A minimal generation example using these objects is sketched at the end of this card.

If you use this model in your research, please cite our work as:

```bibtex
@article{sarti-nissim-2022-it5,
    title={{IT5}: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation},
    author={Sarti, Gabriele and Nissim, Malvina},
    journal={ArXiv preprint 2203.03759},
    url={https://arxiv.org/abs/2203.03759},
    year={2022},
    month={mar}
}
```

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10.0

### Framework versions

- Transformers 4.15.0
- Pytorch 1.10.0+cu102
- Datasets 1.17.0
- Tokenizers 0.10.3
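### Generation example

As a complement to the autoclass snippet above, here is a minimal sketch of running generation with the loaded tokenizer and model. The decoding parameters (`max_new_tokens`, `num_beams`) are illustrative assumptions, not the settings used for the paper's evaluation:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned checkpoint (same identifiers as above)
tokenizer = AutoTokenizer.from_pretrained("it5/it5-efficient-small-el32-informal-to-formal")
model = AutoModelForSeq2SeqLM.from_pretrained("it5/it5-efficient-small-el32-informal-to-formal")

# Encode an informal Italian sentence
inputs = tokenizer("nn capisco xke tt i ragazzi lo fanno", return_tensors="pt")

# Generate the formal paraphrase; decoding parameters here are illustrative assumptions
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Beam search is used here only as an example; any decoding strategy supported by `generate` can be substituted.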
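### Hyperparameters as `Seq2SeqTrainingArguments`

For convenience, the hyperparameters listed above can be expressed as `Seq2SeqTrainingArguments`. This is a hedged reconstruction rather than the exact training script used for the paper; the `output_dir` and any argument not listed above are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

# Hedged reconstruction of the reported hyperparameters; output_dir is an assumption
training_args = Seq2SeqTrainingArguments(
    output_dir="it5-efficient-small-el32-informal-to-formal",  # assumed
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10.0,
)
```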