---
language: fr
license: mit
tags:
- en
datasets:
- bigscience/P3
---

### Quantized BigScience's T0 3B with 8-bit weights

This is a version of [BigScience's T0](https://huggingface.co/bigscience/T0_3B) with 3 billion parameters, modified so that you can generate text **and fine-tune the model in Colab or on an equivalent desktop GPU (e.g. a single 1080Ti)**. Inspired by [GPT-J 8bit](https://huggingface.co/hivemind/gpt-j-6B-8bit).

Here's how to run it: [![colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/1lMja-CPc0vm5_-gXNXAWU-9c0nom7vZ9)

This model can be loaded with the `T5ForConditionalGeneration` class:

```python
from transformers import T5ForConditionalGeneration
model = T5ForConditionalGeneration.from_pretrained("gustavecortal/T0_3B-8bit")
```

Before loading, you have to monkey-patch T5 so that the model is converted to 8-bit as soon as it is constructed:

```python
import transformers

# `convert_to_int8` is not part of `transformers`; it must be defined before this patch runs.
class T5ForConditionalGeneration(transformers.models.t5.modeling_t5.T5ForConditionalGeneration):
    def __init__(self, config):
        super().__init__(config)
        convert_to_int8(self)  # convert the freshly built model to 8-bit weights

# Make `transformers` use the patched class.
transformers.models.t5.modeling_t5.T5ForConditionalGeneration = T5ForConditionalGeneration
```

## Model Description

T0* shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks while being 16x smaller. It is a series of encoder-decoder models trained on a large set of different tasks specified in natural language prompts. We convert numerous English supervised datasets into prompts, each with multiple templates using varying formulations. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. To obtain T0*, we fine-tune a pretrained language model on this multitask mixture covering many different NLP tasks.

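The conversion of a supervised dataset into prompts with several templates can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `TEMPLATES` strings, the `LABELS` mapping, and the `to_prompted_examples` helper are hypothetical and are not taken from the P3 dataset or from the T0 training code.

```python
# Hypothetical templates for a sentiment-classification example.
# They only illustrate "multiple templates using varying formulations".
TEMPLATES = [
    "Review: {text}\nIs this review positive or negative?",
    "{text}\nWhat is the sentiment of the text above?",
    'Does the following review express a positive or negative opinion? "{text}"',
]

# Hypothetical mapping from integer class labels to target strings.
LABELS = {0: "negative", 1: "positive"}


def to_prompted_examples(example, templates=TEMPLATES):
    """Turn one supervised example into one (input, target) pair per template."""
    target = LABELS[example["label"]]
    return [(template.format(text=example["text"]), target) for template in templates]


example = {"text": "The battery died after two days.", "label": 0}
for prompt, target in to_prompted_examples(example):
    print(repr(prompt), "->", target)
```

Each supervised example yields as many text-to-text pairs as there are templates, so the same task reaches the model under several phrasings.
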
## Links

* [BigScience](https://bigscience.huggingface.co/)
* [Hivemind](https://training-transformers-together.github.io/)
* [Gustave Cortal](https://twitter.com/gustavecortal)

```bibtex
@misc{sanh2021multitask,
      title={Multitask Prompted Training Enables Zero-Shot Task Generalization},
      author={Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Teven Le Scao and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal Nayak and Debajyoti Datta and Jonathan Chang and Mike Tian-Jian Jiang and Han Wang and Matteo Manica and Sheng Shen and Zheng Xin Yong and Harshit Pandey and Rachel Bawden and Thomas Wang and Trishala Neeraj and Jos Rozen and Abheesht Sharma and Andrea Santilli and Thibault Fevry and Jason Alan Fries and Ryan Teehan and Stella Biderman and Leo Gao and Tali Bers and Thomas Wolf and Alexander M. Rush},
      year={2021},
      eprint={2110.08207},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```