gustavecortal committed on
Commit
f54af4b
1 Parent(s): d31b5ad

Create README.md

Files changed (1)
  1. README.md +41 -0
README.md ADDED
---
language: en
license: mit
tags:
- en
datasets:
- bigscience/P3
---

### BigScience's T0 3B quantized to 8-bit weights

This is a version of [BigScience's T0](https://huggingface.co/bigscience/T0_3B) (3 billion parameters) modified so that you can run **and fine-tune the model on Colab or an equivalent desktop GPU (e.g. a single 1080 Ti)**. Inspired by [GPT-J 8bit](https://huggingface.co/hivemind/gpt-j-6B-8bit).

Here's how to run it: [![colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/1ft6wQU0BhqG5PRlwgaZJv2VukKKjU4Es)

The model can be loaded with the `T5ForConditionalGeneration` class:
```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("gustavecortal/T0_3B-8bit")
```

Note that before calling `from_pretrained`, you have to monkey-patch `T5ForConditionalGeneration` so that the model is converted to int8 when it is instantiated:
```python
import transformers

# convert_to_int8 is defined in the Colab notebook linked above;
# it converts the model's weights to 8-bit.
class T5ForConditionalGeneration(transformers.models.t5.modeling_t5.T5ForConditionalGeneration):
    def __init__(self, config):
        super().__init__(config)
        convert_to_int8(self)

transformers.models.t5.modeling_t5.T5ForConditionalGeneration = T5ForConditionalGeneration
```
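
For reference, here is a minimal end-to-end sketch of zero-shot generation under the setup above. It assumes the monkey-patched class and `convert_to_int8` from the Colab notebook are in scope; the tokenizer is taken from the original `bigscience/T0_3B` checkpoint (quantization does not change it), and the prompt is only an illustrative example.

```python
import torch
from transformers import AutoTokenizer

# Assumes the monkey-patched T5ForConditionalGeneration defined above is in scope,
# along with convert_to_int8 from the Colab notebook.
tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")  # tokenizer is unchanged by quantization
model = T5ForConditionalGeneration.from_pretrained("gustavecortal/T0_3B-8bit")
if torch.cuda.is_available():
    model = model.cuda()

# Illustrative zero-shot prompt: the task is specified in plain English.
prompt = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```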

## Model Description

T0* shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks, while being 16x smaller. It is a series of encoder-decoder models trained on a large set of different tasks specified in natural language prompts. We convert numerous English supervised datasets into prompts, each with multiple templates using varying formulations. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. To obtain T0*, we fine-tune a pretrained language model on this multitask mixture covering many different NLP tasks.
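
To make the prompt format concrete, here are a few hypothetical zero-shot prompts (written for illustration, not drawn from the P3 templates) passed to the model exactly like the example above, reusing the `tokenizer` and `model` already loaded:

```python
# Hypothetical prompts showing how tasks are specified in natural language.
prompts = [
    "Summarize: BigScience trained T0 on a large multitask mixture of prompted datasets.",
    "Question: Which city is the capital of France? Answer:",
    "Does the premise 'A man is cooking dinner' entail the hypothesis 'A person is preparing food'? Yes or no?",
]
for p in prompts:
    inputs = tokenizer(p, return_tensors="pt").to(model.device)
    answer = model.generate(**inputs, max_new_tokens=20)
    print(p, "->", tokenizer.decode(answer[0], skip_special_tokens=True))
```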

## Links

* [Cedille](https://en.cedille.ai/)
* [Hivemind](https://training-transformers-together.github.io/)
* [Gustave Cortal](https://twitter.com/gustavecortal)