gsarti commited on
Commit
e685231
1 Parent(s): ec54656

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +77 -0
README.md ADDED
@@ -0,0 +1,77 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ language:
3
+ - it
4
+ datasets:
5
+ - gsarti/clean_mc4_it
6
+ tags:
7
+ - seq2seq
8
+ - lm-head
9
+ license: apache-2.0
10
+ inference: false
11
+ ---
12
+
13
+ # Italian T5 Small 🇮🇹
14
+
15
+ The [IT5](https://huggingface.co/models?search=it5) model family represents the first effort in pretraining large-scale sequence-to-sequence transformer models for the Italian language, following the approach adopted by the original [T5 model](https://github.com/google-research/text-to-text-transfer-transformer).
16
+
17
+ This model is released as part of the project ["IT5: Large-Scale Text-to-Text Pretraining for Italian Language Understanding and Generation"](https://gsarti.com) (to be released), by [Gabriele Sarti](https://gsarti.com/) with the support of [Huggingface](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104) and with TPU usage sponsored by Google's [TPU Research Cloud](https://sites.research.google/trc/). All the training was conducted on a single TPU3v8-VM machine on Google Cloud. Refer to the Tensorboard tab of the repository for an overview of the training process.
18
+
19
+ *The inference widget is deactivated because the model needs a task-specific seq2seq fine-tuning on a downstream task to be useful in practice. The model [`gsarti/it5-base-nli`](https://huggingface.co/gsarti/it5-base-nli) provides an example of this model fine-tuned on a downstream NLI task.*
20
+
21
+ ## Model variants
22
+
23
+ This repository contains the checkpoints for the `base` version of the model. The model was trained for one epoch (1.05M steps) on the [Thoroughly Cleaned Italian mC4 Corpus](https://huggingface.co/datasets/gsarti/clean_mc4_it) (~41B words, ~275GB) using 🤗 Datasets and the `google/t5-v1_1-small` improved configuration.The training procedure is made available [on Github](https://github.com/gsarti/t5-flax-gcp).
24
+
25
+ The following table summarizes the parameters for all available models
26
+
27
+ | |`it5-small` (this one) |`it5-base` |`it5-large` |`it5-base-oscar` |
28
+ |-----------------------|-----------------------|----------------------|-----------------------|----------------------------------|
29
+ |`dataset` |`gsarti/clean_mc4_it` |`gsarti/clean_mc4_it` |`gsarti/clean_mc4_it` |`oscar/unshuffled_deduplicated_it`|
30
+ |`architecture` |`google/t5-v1_1-small` |`google/t5-v1_1-base` |`google/t5-v1_1-large` |`t5-base` |
31
+ |`learning rate` | 5e-3 | 5e-3 | 5e-3 | 1e-2 |
32
+ |`steps` | 1'050'000 | 1'050'000 | 2'100'000 | 258'000 |
33
+ |`training time` | 36 hours | 101 hours | 370 hours | 98 hours |
34
+ |`ff projection` |`gated-gelu` |`gated-gelu` |`gated-gelu` |`relu` |
35
+ |`tie embeds` |`false` |`false` |`false` |`true` |
36
+ |`optimizer` | adafactor | adafactor | adafactor | adafactor |
37
+ |`max seq. length` | 512 | 512 | 512 | 512 |
38
+ |`per-device batch size`| 16 | 16 | 8 | 16 |
39
+ |`tot. batch size` | 128 | 128 | 64 | 128 |
40
+ |`weigth decay` | 1e-3 | 1e-3 | 1e-2 | 1e-3 |
41
+ |`validation split size`| 15K examples | 15K examples | 15K examples | 15K examples |
42
+
43
+ The high training time of `it5-base-oscar` was due to [a bug](https://github.com/huggingface/transformers/pull/13012) in the training script.
44
+
45
+ For a list of individual model parameters, refer to the `config.json` file in the respective repositories.
46
+
47
+ ## Using the models
48
+
49
+ ```python
50
+ from transformers import T5Tokenizer, T5ForConditionalGeneration
51
+
52
+ tokenizer = T5Tokenizer.from_pretrained("gsarti/it5-small")
53
+ model = T5ForConditionalGeneration.from_pretrained("gsarti/it5-small")
54
+ ```
55
+
56
+ *Note: You will need to fine-tune the model on your downstream seq2seq task to use it. See an example [here](https://huggingface.co/gsarti/it5-base-nli).*
57
+
58
+ Flax and Tensorflow versions of the model are also available:
59
+
60
+ ```python
61
+ from transformers import FlaxT5ForConditionalGeneration, TFT5ForConditionalGeneration
62
+
63
+ model_flax = FlaxT5ForConditionalGeneration.from_pretrained("gsarti/it5-small")
64
+ model_tf = TFT5ForConditionalGeneration.from_pretrained("gsarti/it5-small")
65
+ ```
66
+
67
+ ## Limitations
68
+
69
+ Due to the nature of the web-scraped corpus on which IT5 models were trained, it is likely that their usage could reproduce and amplify pre-existing biases in the data, resulting in potentially harmful content such as racial or gender stereotypes and conspiracist views. For this reason, the study of such biases is explicitly encouraged, and model usage should ideally be restricted to research-oriented and non-user-facing endeavors.
70
+
71
+ ## Model curators
72
+
73
+ For problems or updates on this model, please contact [gabriele.sarti996@gmail.com](mailto:gabriele.sarti996@gmail.com).
74
+
75
+ ## Citation Information
76
+
77
+ *Coming soon!*