gustavecortal committed
Commit f54af4b
1 Parent(s): d31b5ad
Create README.md
README.md ADDED
@@ -0,0 +1,41 @@
---
language: fr
license: mit
tags:
- en
datasets:
- bigscience/P3
---

### Quantized BigScience's T0 3B with 8-bit weights

This is a version of [BigScience's T0](https://huggingface.co/bigscience/T0_3B) with 3 billion parameters, modified so that you can generate text **and fine-tune the model in Colab or on an equivalent desktop GPU (e.g. a single 1080 Ti)**. Inspired by [GPT-J 8bit](https://huggingface.co/hivemind/gpt-j-6B-8bit).
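
To see why 8-bit weights matter on a card like the 1080 Ti (11 GB of memory), a back-of-the-envelope estimate helps. The sketch below counts weight storage only and assumes a nominal 3 billion parameters. It ignores activations, optimizer state, and any quantization metadata, so real memory use will be higher.

```python
# Rough weight-memory estimate for a model with ~3 billion parameters.
# Assumption: a nominal 3e9 parameters; the exact count is not stated in this README.
NUM_PARAMS = 3_000_000_000
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_gigabytes(num_params: int, bytes_per_param: int) -> float:
    """Storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


estimates = {
    dtype: weight_gigabytes(NUM_PARAMS, size)
    for dtype, size in BYTES_PER_PARAM.items()
}
for dtype, gigabytes in estimates.items():
    print(f"{dtype}: {gigabytes:.0f} GB")
```

Under these assumptions, 32-bit weights alone would not fit in 11 GB, while 8-bit weights take roughly a quarter of that space and leave headroom for everything else.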

Here's how to run it: [![colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/1ft6wQU0BhqG5PRlwgaZJv2VukKKjU4Es)

This model can be loaded with the `T5ForConditionalGeneration` class:
```python
from transformers import T5ForConditionalGeneration
model = T5ForConditionalGeneration.from_pretrained("gustavecortal/T0_3B-8bit")
```

Before loading, you have to monkey-patch T5 so that the model is converted to 8-bit as soon as it is constructed:
```python
import transformers

# convert_to_int8 is not defined in this README (presumably it comes from the
# linked Colab notebook); it must be in scope before a model is instantiated.
class T5ForConditionalGeneration(transformers.models.t5.modeling_t5.T5ForConditionalGeneration):
    def __init__(self, config):
        super().__init__(config)
        convert_to_int8(self)

transformers.models.t5.modeling_t5.T5ForConditionalGeneration = T5ForConditionalGeneration
```
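
`convert_to_int8` itself is not shown in this README. To give a feel for what storing weights in 8 bits involves, here is a minimal sketch of absmax quantization in plain Python. It illustrates the general idea only and is not the routine this model uses; the names `quantize_absmax` and `dequantize` are made up for the example.

```python
from typing import List, Tuple


def quantize_absmax(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.02, 1.0]
quantized, scale = quantize_absmax(weights)
restored = dequantize(quantized, scale)

# Up to floating-point rounding, each restored weight lies within half a
# quantization step (scale / 2) of the original value.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Each weight then needs one byte instead of four, at the cost of a bounded rounding error. Practical schemes typically use one scale per block of weights rather than one per tensor, which keeps a single outlier from degrading the precision of all the others.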

## Model Description

T0* shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks, while being 16x smaller. It is a series of encoder-decoder models trained on a large set of different tasks specified in natural language prompts. We convert numerous English supervised datasets into prompts, each with multiple templates using varying formulations. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. To obtain T0*, we fine-tune a pretrained language model on this multitask mixture covering many different NLP tasks.
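
Turning one supervised example into several prompts can be sketched as follows. The templates and the example record are invented for illustration and are not taken from P3; the sketch only shows the mechanics of applying multiple formulations to the same labeled example.

```python
from typing import Dict, List, Tuple

# Hypothetical templates for a natural language inference example.
# Each pairs an input formulation with the way the target is rendered.
TEMPLATES: List[Tuple[str, str]] = [
    ('{premise} Does this imply that "{hypothesis}"? Yes or no?', "{label}"),
    (
        "Premise: {premise}\nHypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis?",
        "{label}",
    ),
]


def to_prompts(example: Dict[str, str]) -> List[Dict[str, str]]:
    """Render one labeled example with every template."""
    return [
        {"input": source.format(**example), "target": target.format(**example)}
        for source, target in TEMPLATES
    ]


example = {
    "premise": "A dog is running in the park.",
    "hypothesis": "An animal is outside.",
    "label": "yes",
}
prompts = to_prompts(example)
for prompt in prompts:
    print(prompt["input"], "->", prompt["target"])
```

Both prompts carry the same label but phrase the task differently, which is the property the multitask mixture relies on.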

## Links

* [Cedille](https://en.cedille.ai/)
* [Hivemind](https://training-transformers-together.github.io/)
* [Gustave Cortal](https://twitter.com/gustavecortal)