stas committed on
Commit
89880f4
1 Parent(s): 58161ad
Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -58,7 +58,7 @@ If you want to use another checkpoint, please replace the path in `AutoTokenizer`
 
 # Training procedure
 
- T0* models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapated T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective.
+ T0* models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapted T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective.
 
 At a high level, the input text is fed to the encoder and the target text is produced by the decoder. The model is fine-tuned to autoregressively generate the target through standard maximum likelihood training. It is never trained to generate the input. We detail our training data in the next section.
 
@@ -119,7 +119,7 @@ We also evaluate T0, T0p and T0pp on the a subset of the [BIG-bench benchmark](h
 
 # Limitations
 
- - The models of the T0* series are quite large (3B or 11B parameters). Loading them and performing inference requires non-trivial computational ressources. When using multiple GPUs, it is possible to use [.parallelize()](https://huggingface.co/transformers/parallelism.html).
+ - The models of the T0* series are quite large (3B or 11B parameters). Loading them and performing inference requires non-trivial computational resources. When using multiple GPUs, it is possible to use [.parallelize()](https://huggingface.co/transformers/parallelism.html).
 - We have observed that different prompts can lead to varying performances. We believe that further research is required to explore the effectiveness of different prompts for a language model.
 - Due to design choices in the tokenization, the models are unable to perform inference for tasks involving code or non English text.
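The patched paragraph in the first hunk describes inference at a high level: the prompted input text goes through the encoder and the target is generated autoregressively by the decoder. A minimal sketch of what that looks like with the `transformers` seq2seq API, assuming the `bigscience/T0pp` checkpoint (swap in any other T0* path, as the surrounding README section notes for `AutoTokenizer`):

```python
# Minimal inference sketch for a T0* checkpoint (assumes the
# `bigscience/T0pp` model id and the standard transformers seq2seq API).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "bigscience/T0pp"  # replace with the checkpoint path you want to use

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# The prompted input text is fed to the encoder...
inputs = tokenizer(
    "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy",
    return_tensors="pt",
)

# ...and the target text is generated autoregressively by the decoder.
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```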
 
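The corrected Limitations bullet in the second hunk links to `.parallelize()` for multi-GPU inference. A rough sketch of how that experimental call is typically used, assuming the `bigscience/T0pp` checkpoint and at least two CUDA devices; the optional `device_map` argument maps GPU ids to lists of layer indices:

```python
# Multi-GPU inference sketch using the experimental .parallelize() API
# (assumes a `bigscience/T0pp` checkpoint and at least two CUDA devices).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "bigscience/T0pp"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Spread the model's blocks across the available GPUs; a custom split can be
# passed as device_map={gpu_id: [block indices], ...}.
model.parallelize()

# Inputs go on the first device of the parallelized model.
inputs = tokenizer(
    "Is this review positive or negative? Review: best purchase ever",
    return_tensors="pt",
).to("cuda:0")

outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```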