VictorSanh committed on
Commit 19d8823
1 Parent(s): 881d3a0

T0_* -> T0*

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -5,7 +5,7 @@ license: apache-2.0
 
  # Model Description
 
- T0_* is a series of encoder-decoder models trained on a large set of different tasks specified in natural language prompts. We convert numerous English supervised datasets into prompts, each with multiple templates using varying formulations. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. To obtain T0_*, we fine-tune a pretrained language model on this multitask mixture covering many different NLP tasks.
+ T0* is a series of encoder-decoder models trained on a large set of different tasks specified in natural language prompts. We convert numerous English supervised datasets into prompts, each with multiple templates using varying formulations. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. To obtain T0*, we fine-tune a pretrained language model on this multitask mixture covering many different NLP tasks.
 
  # Intended uses
 
@@ -40,7 +40,7 @@ If you want to use another checkpoint, please replace the path in `AutoTokenizer
 
  # Training procedure
 
- T0_* models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on 34B tokens from [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapated T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective.
+ T0* models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on 34B tokens from [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapated T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective.
 
  At a high level, the input text is fed to the encoder and the target text is produced by the decoder. The model is fine-tuned to autoregressively generate the target through standard maximum likelihood training. It is never trained to generate the input. We detail our training data in the next section.
 
@@ -99,7 +99,7 @@ We also evaluate T0_11B, T0p_11B and T0pp_11B on the a subset of the [BIG-bench
 
  # Limitations
 
- - The models of the T0_* series are quite large (3B or 11B parameters). Loading them and performing inference requires non-trivial computational ressources. When using multiple GPUs, it is possible to use [.parallelize()](https://huggingface.co/transformers/parallelism.html).
+ - The models of the T0* series are quite large (3B or 11B parameters). Loading them and performing inference requires non-trivial computational ressources. When using multiple GPUs, it is possible to use [.parallelize()](https://huggingface.co/transformers/parallelism.html).
  - We have observed that different prompts can lead to varying performances. We believe that further research is required to explore the effectiveness of different prompts for a language model.
  - Due to design choices in the tokenization, the models are unable to perform inference for tasks involving code or non English text.
 
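The context of the `@@ -40` hunk points at the card's usage snippet built around `AutoTokenizer`. For readers of this commit, here is a minimal zero-shot inference sketch in that spirit; the checkpoint id `bigscience/T0pp` and the example prompt are illustrative assumptions, not content of this diff.

```python
# Minimal zero-shot inference sketch for a T0* checkpoint (assumed id below).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "bigscience/T0pp"  # assumption: replace with the T0* checkpoint you actually use
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# The prompt goes to the encoder; the decoder autoregressively generates the target text.
prompt = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```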
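The Limitations bullet mentions `.parallelize()` for multi-GPU use. Below is a hedged sketch of that pattern; the checkpoint id and device handling are assumptions, and newer transformers releases may steer you toward `device_map`-based loading instead.

```python
# Sketch: splitting a large T0* checkpoint across the visible GPUs with .parallelize().
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "bigscience/T0pp"  # assumption: any 3B/11B T0* checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Spread the encoder/decoder blocks over all available GPUs (optionally pass an explicit device_map).
model.parallelize()

# With the default device map, the first blocks sit on cuda:0, so inputs go there.
inputs = tokenizer("Review: great skillet. Is this review positive or negative?", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

model.deparallelize()  # move the model back to CPU when finished
```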