gustavecortal/T0_3B-8bit


gustavecortal committed on Jan 15, 2022

Commit f54af4b • 1 Parent(s): d31b5ad

Create README.md

Files changed (1)

README.md +41 -0

README.md ADDED

	@@ -0,0 +1,41 @@

+---
+language: fr
+license: mit
+tags:
+- en
+datasets:
+- bigscience/P3
+---
+### Quantized BigScience's T0 3B with 8-bit weights
+This is a version of [BigScience's T0](https://huggingface.co/bigscience/T0_3B) with 3 billion parameters, modified so that you can generate text with **and fine-tune the model in Colab or on an equivalent desktop GPU (e.g. a single 1080Ti)**. Inspired by [GPT-J 8bit](https://huggingface.co/hivemind/gpt-j-6B-8bit).
+Here's how to run it: [![colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/1ft6wQU0BhqG5PRlwgaZJv2VukKKjU4Es)
+This model can be loaded with the `T5ForConditionalGeneration` class:
+```python
+from transformers import T5ForConditionalGeneration
+model = T5ForConditionalGeneration.from_pretrained("gustavecortal/T0_3B-8bit")
+```
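+Before loading, you have to monkey-patch T5 so that the model is converted to 8-bit when it is constructed:
+```python
+import transformers
+
+# convert_to_int8 is not defined in this snippet; it must already be defined
+# (presumably by the linked Colab notebook) before this code runs.
+class T5ForConditionalGeneration(transformers.models.t5.modeling_t5.T5ForConditionalGeneration):
+    def __init__(self, config):
+        super().__init__(config)
+        convert_to_int8(self)  # convert the freshly built model to 8-bit
+# Replace the library's class so that later lookups use the patched version.
+transformers.models.t5.modeling_t5.T5ForConditionalGeneration = T5ForConditionalGeneration
+```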
+## Model Description
+T0* shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks, while being 16x smaller. It is a series of encoder-decoder models trained on a large set of different tasks specified in natural language prompts. We convert numerous English supervised datasets into prompts, each with multiple templates using varying formulations. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. To obtain T0*, we fine-tune a pretrained language model on this multitask mixture covering many different NLP tasks.
+## Links
+* [Cedille](https://en.cedille.ai/)
+* [Hivemind](https://training-transformers-together.github.io/)
+* [Gustave Cortal](https://twitter.com/gustavecortal)
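
The monkey-patching step in the README above can be illustrated without any dependencies. The following is a minimal sketch of the pattern only, not the author's implementation: `fake_modeling`, `BaseModel`, `PatchedModel`, and `convert` are hypothetical stand-ins for `transformers.models.t5.modeling_t5`, the original and patched `T5ForConditionalGeneration`, and `convert_to_int8`. It shows why the patch has to be installed before loading: the conversion hook only runs for models constructed through the patched class.

```python
import types

# Hypothetical stand-in for a library module such as
# transformers.models.t5.modeling_t5 (not the real module).
fake_modeling = types.SimpleNamespace()


class BaseModel:
    """Stand-in for the library's original model class."""

    def __init__(self, config):
        self.config = config
        self.converted = False

    @classmethod
    def from_pretrained(cls, name):
        # A real loader would construct the model and then load a checkpoint;
        # here we only construct it.
        return cls(config={"name": name})


fake_modeling.Model = BaseModel


def convert(model):
    """Hypothetical conversion hook, playing the role of convert_to_int8."""
    model.converted = True


class PatchedModel(fake_modeling.Model):
    def __init__(self, config):
        super().__init__(config)
        convert(self)  # runs every time the patched class is constructed


# Install the patch: lookups through the module now return the subclass.
fake_modeling.Model = PatchedModel

model = fake_modeling.Model.from_pretrained("demo")
print(type(model).__name__, model.converted)
```

Because `from_pretrained` is a classmethod that calls `cls(...)`, invoking it on the patched class runs the overridden `__init__`, whereas a reference to the original class taken before the patch would skip the conversion.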