---
language:
- it
license: apache-2.0
tags:
- italian
- sequence-to-sequence
- style-transfer
- efficient
- formality-style-transfer
datasets:
- yahoo/xformal_it
widget:
- text: "maronn qualcuno mi spieg' CHECCOSA SUCCEDE?!?!"
- text: "wellaaaaaaa, ma fraté sei proprio troppo simpatiko, grazieeee!!"
- text: "nn capisco xke tt i ragazzi lo fanno"
- text: "IT5 è SUPERMEGA BRAVISSIMO a capire tt il vernacolo italiano!!!"
metrics:
- rouge
- bertscore
model-index:
- name: it5-efficient-small-el32-informal-to-formal
  results:
  - task: 
      type: formality-style-transfer
      name: "Informal-to-formal Style Transfer"
    dataset:
      type: xformal_it
      name: "XFORMAL (Italian Subset)"
    metrics:
      - type: rouge1
        value: 0.430
        name: "Avg. Test Rouge1"
      - type: rouge2
        value: 0.221
        name: "Avg. Test Rouge2"
      - type: rougeL
        value: 0.408
        name: "Avg. Test RougeL"
      - type: bertscore
        value: 0.630
        name: "Avg. Test BERTScore"
---

# IT5 Cased Small Efficient EL32 for Informal-to-formal Style Transfer 🧐

*Shout-out to [Stefan Schweter](https://github.com/stefan-it) for contributing the pre-trained efficient model!*

This repository contains the checkpoint for the [IT5 Cased Small Efficient EL32](https://huggingface.co/it5/it5-efficient-small-el32) model fine-tuned for informal-to-formal style transfer on the Italian subset of the XFORMAL dataset, as part of the experiments of the paper [IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation](https://arxiv.org/abs/2203.03759) by [Gabriele Sarti](https://gsarti.com) and [Malvina Nissim](https://malvinanissim.github.io).

Efficient IT5 models differ from the standard ones by adopting a different vocabulary that enables cased text generation and an [optimized model architecture](https://arxiv.org/abs/2109.10686) that improves performance while reducing the parameter count. The Small-EL32 variant replaces the original encoder of the T5 Small architecture with a 32-layer deep encoder, showing improved performance over the base model.

A comprehensive overview of other released materials is provided in the [gsarti/it5](https://github.com/gsarti/it5) repository. Refer to the paper for additional details concerning the reported scores and the evaluation approach.

## Using the model

Model checkpoints are available for use in TensorFlow, PyTorch and JAX. They can be used directly with the `pipeline` API as follows:

```python
from transformers import pipeline

i2f = pipeline("text2text-generation", model='it5/it5-efficient-small-el32-informal-to-formal')
i2f("nn capisco xke tt i ragazzi lo fanno")
>>> [{"generated_text": "non comprendo perché tutti i ragazzi agiscono così"}]
```

or loaded using autoclasses:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("it5/it5-efficient-small-el32-informal-to-formal")
model = AutoModelForSeq2SeqLM.from_pretrained("it5/it5-efficient-small-el32-informal-to-formal")
```
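
Once loaded this way, generation can be run explicitly. The snippet below is a minimal sketch assuming PyTorch tensors; the generation settings (e.g. `max_length`) are illustrative and not taken from the paper:

```python
# Minimal generation sketch: encode an informal sentence and decode the formal rewriting.
# max_length is illustrative, not the value used in the original experiments.
inputs = tokenizer("nn capisco xke tt i ragazzi lo fanno", return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```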

If you use this model in your research, please cite our work as:

```bibtex
@article{sarti-nissim-2022-it5,
    title={{IT5}: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation},
    author={Sarti, Gabriele and Nissim, Malvina},
    journal={ArXiv preprint 2203.03759},
    url={https://arxiv.org/abs/2203.03759},
    year={2022},
    month={mar}
}
```

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the list):
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10.0
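
As an illustration, these values map onto a `Seq2SeqTrainingArguments` configuration roughly as follows. This is a hedged sketch for reference, not the original training script; `output_dir` is a placeholder and any argument not listed in the card keeps its default:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the hyperparameters above expressed as Transformers training arguments.
# output_dir is a placeholder; unlisted arguments keep their library defaults.
training_args = Seq2SeqTrainingArguments(
    output_dir="it5-efficient-small-el32-informal-to-formal",  # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```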

### Framework versions

- Transformers 4.15.0
- Pytorch 1.10.0+cu102
- Datasets 1.17.0
- Tokenizers 0.10.3