---
datasets:
- bigscience/xP3
license: bigscience-bloom-rail-1.0
language:
- ak
- ar
- as
- bm
- bn
- ca
- code
- en
- es
- eu
- fon
- fr
- gu
- hi
- id
- ig
- ki
- kn
- lg
- ln
- ml
- mr
- ne
- nso
- ny
- or
- pa
- pt
- rn
- rw
- sn
- st
- sw
- ta
- te
- tn
- ts
- tum
- tw
- ur
- vi
- wo
- xh
- yo
- zh
- zu
programming_language:
- C
- C++
- C#
- Go
- Java
- JavaScript
- Lua
- PHP
- Python
- Ruby
- Rust
- Scala
- TypeScript
pipeline_tag: text-generation
widget:
- text: >-
一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。Would you rate the
previous review as positive, neutral or negative?
example_title: zh-en sentiment
- text: 一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评?
example_title: zh-zh sentiment
- text: Suggest at least five related search terms to "Mạng neural nhân tạo".
example_title: vi-en query
- text: >-
Proposez au moins cinq mots clés concernant «Réseau de neurones
artificiels».
example_title: fr-fr query
- text: >-
Explain in a sentence in Telugu what is backpropagation in neural
networks.
example_title: te-en qa
- text: Why is the sky blue?
example_title: en-en qa
- text: >-
Write a fairy tale about a troll saving a princess from a dangerous
dragon. The fairy tale is a masterpiece that has achieved praise worldwide
and its moral is "Heroes Come in All Shapes and Sizes". Story (in
Spanish):
example_title: es-en fable
- text: >-
Write a fable about wood elves living in a forest that is suddenly invaded
by ogres. The fable is a masterpiece that has achieved praise worldwide
and its moral is "Violence is the last refuge of the incompetent". Fable
(in Hindi):
example_title: hi-en fable
---
# Table of Contents
- Model Summary
- Use
- Bias, Risks, and Limitations
- Training Details
- Evaluation
- Environmental Impact
- Citation
- How To Get Started With the Model
# Model Summary
We present BLOOMZ & mT0, a family of models capable of following human instructions in dozens of languages zero-shot. We finetune BLOOM & mT5 pretrained multilingual language models on our crosslingual task mixture (xP3) and find our resulting models capable of crosslingual generalization to unseen tasks & languages.
- Repository: bigscience-workshop/xmtf
- Paper: [TODO]
- Point of Contact: Niklas Muennighoff
- BLOOMZ & mT0 Model Family:
| Name | Explanation |
|------|-------------|
| bloomz-560m | 560M parameter multitask finetuned version of bloom-560m on xP3 |
| bloomz-1b1 | 1.1B parameter multitask finetuned version of bloom-1b1 on xP3 |
| bloomz-1b7 | 1.7B parameter multitask finetuned version of bloom-1b7 on xP3 |
| bloomz-3b | 3B parameter multitask finetuned version of bloom-3b on xP3 |
| bloomz-7b1 | 7.1B parameter multitask finetuned version of bloom-7b1 on xP3 |
| bloomz | 176B parameter multitask finetuned version of bloom on xP3 |
| bloomz-7b1-mt | 7.1B parameter multitask finetuned version of bloom-7b1 on xP3 & xP3mt. Better than bloomz-7b1 when prompting in non-English |
| bloomz-mt | 176B parameter multitask finetuned version of bloom on xP3 & xP3mt. Better than bloomz when prompting in non-English |
| bloomz-7b1-p3 | 7.1B parameter multitask finetuned version of bloom-7b1 on P3. Released for research purposes; performance is inferior to bloomz-7b1 |
| bloomz-p3 | 176B parameter multitask finetuned version of bloom on P3. Released for research purposes; performance is inferior to bloomz |
| mt0-small | 300M parameter multitask finetuned version of mt5-small on xP3 |
| mt0-base | 580M parameter multitask finetuned version of mt5-base on xP3 |
| mt0-large | 1.2B parameter multitask finetuned version of mt5-large on xP3 |
| mt0-xl | 3.7B parameter multitask finetuned version of mt5-xl on xP3 |
| mt0-xxl | 13B parameter multitask finetuned version of mt5-xxl on xP3 |
| mt0-xxl-mt | 13B parameter multitask finetuned version of mt5-xxl on xP3 & xP3mt. Better than mt0-xxl when prompting in non-English |
| mt0-xxl-p3 | 13B parameter multitask finetuned version of mt5-xxl on P3. Released for research purposes; performance is inferior to mt0-xxl |
# Use

## Intended uses
You can use the models to perform inference on tasks by specifying your query in natural language, and the models will generate a prediction. For instance, you can ask "Translate this to Chinese: Je t'aime.", and the model will hopefully generate "我爱你".
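For example, a minimal sketch using the high-level `pipeline` API from `transformers` (the checkpoint and prompt here are illustrative; any checkpoint from the table above should work):

```python
from transformers import pipeline

# The pipeline wraps tokenization, generation, and decoding in one call.
generator = pipeline("text-generation", model="bigscience/bloomz-560m")

# The task is specified in plain natural language; the model continues the text.
print(generator("Translate to Chinese: Je t'aime.")[0]["generated_text"])
```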
## How to use
Here is how to use the model in PyTorch:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigscience/bloomz-560m"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer.encode("Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

To use another checkpoint, replace the path passed to `AutoTokenizer` and `AutoModelForCausalLM`.
Note: the 176B models were trained with bfloat16, while the smaller models were trained with fp16. We recommend running inference in the same precision, or in fp32.
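As a sketch, loading a checkpoint in its training precision might look like this (the `torch_dtype` argument is standard in `transformers`; `device_map="auto"` additionally requires the `accelerate` package, and the checkpoint and prompt are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigscience/bloomz-7b1"  # an fp16-trained checkpoint

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Load the weights in the precision they were trained in
# (use torch.bfloat16 for the 176B checkpoints).
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer.encode("Translate to English: Je t'aime.", return_tensors="pt").to(model.device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```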
# Limitations
- The larger models require substantial computational resources for inference
- Performance can vary considerably depending on how the prompt is phrased
# BibTeX entry and citation info
TODO