---
datasets:
- bigscience/xP3
license: bigscience-bloom-rail-1.0
language:
- ak
- ar
- as
- bm
- bn
- ca
- code
- en
- es
- eu
- fon
- fr
- gu
- hi
- id
- ig
- ki
- kn
- lg
- ln
- ml
- mr
- ne
- nso
- ny
- or
- pa
- pt
- rn
- rw
- sn
- st
- sw
- ta
- te
- tn
- ts
- tum
- tw
- ur
- vi
- wo
- xh
- yo
- zh
- zu
programming_language:
- C
- C++
- C#
- Go
- Java
- JavaScript
- Lua
- PHP
- Python
- Ruby
- Rust
- Scala
- TypeScript
pipeline_tag: text-generation
widget:
- text: >-
一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。Would you rate the
previous review as positive, neutral or negative?
example_title: zh-en sentiment
- text: 一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评?
example_title: zh-zh sentiment
- text: Suggest at least five related search terms to "Mạng neural nhân tạo".
example_title: vi-en query
- text: >-
Proposez au moins cinq mots clés concernant «Réseau de neurones
artificiels».
example_title: fr-fr query
- text: >-
Explain in a sentence in Telugu what is backpropagation in neural
networks.
example_title: te-en qa
- text: Why is the sky blue?
example_title: en-en qa
- text: >-
Write a fairy tale about a troll saving a princess from a dangerous
dragon. The fairy tale is a masterpiece that has achieved praise worldwide
and its moral is "Heroes Come in All Shapes and Sizes". Story (in
Spanish):
example_title: es-en fable
- text: >-
Write a fable about wood elves living in a forest that is suddenly invaded
by ogres. The fable is a masterpiece that has achieved praise worldwide
and its moral is "Violence is the last refuge of the incompetent". Fable
(in Hindi):
example_title: hi-en fable
---
# Table of Contents
- Model Summary
- Use
- Bias, Risks, and Limitations
- Training Details
- Evaluation
- Environmental Impact
- Citation
- How To Get Started With the Model
# Model Summary
We present BLOOMZ & mT0, a family of models capable of following human instructions in dozens of languages zero-shot. We finetune BLOOM & mT5 pretrained multilingual language models on our crosslingual task mixture (xP3) and find our resulting models capable of crosslingual generalization to unseen tasks & languages.
- Repository: bigscience-workshop/xmtf
- Paper: [TODO]
- Point of Contact: Niklas Muennighoff
- BLOOMZ & mT0 Model Family:
| Name | Explanation |
|------|-------------|
| bloomz-560m | 560M parameter multitask finetuned version of bloom-560m on xP3 |
| bloomz-1b1 | 1.1B parameter multitask finetuned version of bloom-1b1 on xP3 |
| bloomz-1b7 | 1.7B parameter multitask finetuned version of bloom-1b7 on xP3 |
| bloomz-3b | 3B parameter multitask finetuned version of bloom-3b on xP3 |
| bloomz-7b1 | 7.1B parameter multitask finetuned version of bloom-7b1 on xP3 |
| bloomz | 176B parameter multitask finetuned version of bloom on xP3 |
| bloomz-7b1-mt | 7.1B parameter multitask finetuned version of bloom-7b1 on xP3 & xP3mt. Better than bloomz-7b1 when prompting in non-English |
| bloomz-mt | 176B parameter multitask finetuned version of bloom on xP3 & xP3mt. Better than bloomz when prompting in non-English |
| bloomz-7b1-p3 | 7.1B parameter multitask finetuned version of bloom-7b1 on P3. Released for research purposes; performance is inferior to bloomz-7b1 |
| bloomz-p3 | 176B parameter multitask finetuned version of bloom on P3. Released for research purposes; performance is inferior to bloomz |
| mt0-small | 300M parameter multitask finetuned version of mt5-small on xP3 |
| mt0-base | 580M parameter multitask finetuned version of mt5-base on xP3 |
| mt0-large | 1.2B parameter multitask finetuned version of mt5-large on xP3 |
| mt0-xl | 3.7B parameter multitask finetuned version of mt5-xl on xP3 |
| mt0-xxl | 13B parameter multitask finetuned version of mt5-xxl on xP3 |
| mt0-xxl-mt | 13B parameter multitask finetuned version of mt5-xxl on xP3 & xP3mt. Better than mt0-xxl when prompting in non-English |
| mt0-xxl-p3 | 13B parameter multitask finetuned version of mt5-xxl on P3. Released for research purposes; performance is inferior to mt0-xxl |
# Use

## Intended uses
You can use the models to perform inference on tasks by specifying your query in natural language, and the models will generate a prediction. For instance, you can ask "Translate this to Chinese: Je t'aime.", and the model will hopefully generate "我爱你".
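For example, a minimal sketch using the high-level `pipeline` API from `transformers` (the checkpoint and prompt here are illustrative; any checkpoint from the table above should work):

```python
from transformers import pipeline

# The pipeline wraps tokenization, generation, and decoding in one call.
generator = pipeline("text-generation", model="bigscience/bloomz-560m")

# The task is specified in plain natural language; the model continues the text.
print(generator("Translate to Chinese: Je t'aime.")[0]["generated_text"])
```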
## How to use
Here is how to use the model in PyTorch:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigscience/bloomz-560m"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer.encode("Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

To use another checkpoint, replace the path passed to `AutoTokenizer` and `AutoModelForCausalLM`.
Note: the 176B models were trained with bfloat16, while the smaller models were trained with fp16. We recommend running inference in the same precision, or in fp32.
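As a sketch, loading a checkpoint in its training precision might look like this (the `torch_dtype` argument is standard in `transformers`; `device_map="auto"` additionally requires the `accelerate` package, and the checkpoint and prompt are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigscience/bloomz-7b1"  # an fp16-trained checkpoint

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Load the weights in the precision they were trained in
# (use torch.bfloat16 for the 176B checkpoints).
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer.encode("Translate to English: Je t'aime.", return_tensors="pt").to(model.device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```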
# Limitations
- The larger models require substantial computational resources for inference
- Performance can vary considerably depending on how the prompt is phrased
# BibTeX entry and citation info
TODO