--- datasets: - bigscience/xP3mt license: bigscience-bloom-rail-1.0 language: - ak - ar - as - bm - bn - ca - code - en - es - eu - fon - fr - gu - hi - id - ig - ki - kn - lg - ln - ml - mr - ne - nso - ny - or - pa - pt - rn - rw - sn - st - sw - ta - te - tn - ts - tum - tw - ur - vi - wo - xh - yo - zh - zu programming_language: - C - C++ - C# - Go - Java - JavaScript - Lua - PHP - Python - Ruby - Rust - Scala - TypeScript pipeline_tag: text-generation widget: - text: "一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。Would you rate the previous review as positive, neutral or negative?" example_title: "zh-en sentiment" - text: "一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评?" example_title: "zh-zh sentiment" - text: "Suggest at least five related search terms to \"Mạng neural nhân tạo\"." example_title: "vi-en query" - text: "Proposez au moins cinq mots clés concernant «Réseau de neurones artificiels»." example_title: "fr-fr query" - text: "Explain in a sentence in Telugu what is backpropagation in neural networks." example_title: "te-en qa" - text: "Why is the sky blue?" example_title: "en-en qa" - text: "Write a fairy tale about a troll saving a princess from a dangerous dragon. The fairy tale is a masterpiece that has achieved praise worldwide and its moral is \"Heroes Come in All Shapes and Sizes\". Story (in Spanish):" example_title: "es-en fable" - text: "Write a fable about wood elves living in a forest that is suddenly invaded by ogres. The fable is a masterpiece that has achieved praise worldwide and its moral is \"Violence is the last refuge of the incompetent\". Fable (in Hindi):" example_title: "hi-en fable" model-index: - name: bloomz-mt results: - task: type: Coreference resolution dataset: type: winogrande name: Winogrande XL (xl) config: xl split: validation revision: a80f460359d1e9a67c006011c94de42a8759430c metrics: - type: Accuracy value: 59.98 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (en) config: en split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 69.33 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (fr) config: fr split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 68.67 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (jp) config: jp split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 58.39 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (pt) config: pt split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 64.64 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (ru) config: ru split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 60.32 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (zh) config: zh split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 76.39 - task: type: Natural language inference dataset: type: anli name: ANLI (r1) config: r1 split: validation revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094 metrics: - type: Accuracy value: 49.7 - task: type: Natural language inference dataset: type: anli name: ANLI (r2) config: r2 split: validation revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094 metrics: - type: Accuracy value: 45.0 - task: type: Natural language inference dataset: type: anli name: ANLI (r3) config: r3 split: validation revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094 metrics: - type: Accuracy value: 45.58 - task: type: Natural language inference dataset: type: super_glue name: SuperGLUE (cb) config: cb split: validation revision: 9e12063561e7e6c79099feb6d5a493142584e9e2 metrics: - type: Accuracy value: 87.5 - task: type: Natural language inference dataset: type: super_glue name: SuperGLUE (rte) config: rte split: validation revision: 9e12063561e7e6c79099feb6d5a493142584e9e2 metrics: - type: Accuracy value: 85.92 - task: type: Natural language inference dataset: type: xnli name: XNLI (ar) config: ar split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 58.03 - task: type: Natural language inference dataset: type: xnli name: XNLI (bg) config: bg split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 46.75 - task: type: Natural language inference dataset: type: xnli name: XNLI (de) config: de split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 53.69 - task: type: Natural language inference dataset: type: xnli name: XNLI (el) config: el split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 46.55 - task: type: Natural language inference dataset: type: xnli name: XNLI (en) config: en split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 61.81 - task: type: Natural language inference dataset: type: xnli name: XNLI (es) config: es split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 59.12 - task: type: Natural language inference dataset: type: xnli name: XNLI (fr) config: fr split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 59.12 - task: type: Natural language inference dataset: type: xnli name: XNLI (hi) config: hi split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 52.53 - task: type: Natural language inference dataset: type: xnli name: XNLI (ru) config: ru split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 52.85 - task: type: Natural language inference dataset: type: xnli name: XNLI (sw) config: sw split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 50.36 - task: type: Natural language inference dataset: type: xnli name: XNLI (th) config: th split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 43.98 - task: type: Natural language inference dataset: type: xnli name: XNLI (tr) config: tr split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 43.78 - task: type: Natural language inference dataset: type: xnli name: XNLI (ur) config: ur split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 51.24 - task: type: Natural language inference dataset: type: xnli name: XNLI (vi) config: vi split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 55.82 - task: type: Natural language inference dataset: type: xnli name: XNLI (zh) config: zh split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 55.5 - task: type: Program synthesis dataset: type: openai_humaneval name: HumanEval config: None split: test revision: e8dc562f5de170c54b5481011dd9f4fa04845771 metrics: - type: Pass@1 value: 13.55 - type: Pass@10 value: 26.26 - type: Pass@100 value: 47.01 - task: type: Sentence completion dataset: type: story_cloze name: StoryCloze (2016) config: "2016" split: validation revision: e724c6f8cdf7c7a2fb229d862226e15b023ee4db metrics: - type: Accuracy value: 96.69 - task: type: Sentence completion dataset: type: super_glue name: SuperGLUE (copa) config: copa split: validation revision: 9e12063561e7e6c79099feb6d5a493142584e9e2 metrics: - type: Accuracy value: 91.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (et) config: et split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 54.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (ht) config: ht split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 55.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (id) config: id split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 87.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (it) config: it split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 74.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (qu) config: qu split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 56.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (sw) config: sw split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 67.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (ta) config: ta split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 70.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (th) config: th split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 53.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (tr) config: tr split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 54.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (vi) config: vi split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 91.0 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (zh) config: zh split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 86.0 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (ar) config: ar split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 94.04 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (es) config: es split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 95.5 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (eu) config: eu split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 86.83 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (hi) config: hi split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 89.15 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (id) config: id split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 92.79 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (my) config: my split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 52.35 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (ru) config: ru split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 79.09 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (sw) config: sw split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 81.14 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (te) config: te split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 82.4 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (zh) config: zh split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 92.85 --- ![xmtf](https://github.com/bigscience-workshop/xmtf/blob/master/xmtf_banner.png?raw=true) # Table of Contents 1. [Model Summary](#model-summary) 2. [Use](#use) 3. [Limitations](#limitations) 4. [Training](#training) 5. [Evaluation](#evaluation) 7. [Citation](#citation) # Model Summary > We present BLOOMZ & mT0, a family of models capable of following human instructions in dozens of languages zero-shot. We finetune BLOOM & mT5 pretrained multilingual language models on our crosslingual task mixture (xP3) and find the resulting models capable of crosslingual generalization to unseen tasks & languages. - **Repository:** [bigscience-workshop/xmtf](https://github.com/bigscience-workshop/xmtf) - **Paper:** [Crosslingual Generalization through Multitask Finetuning](https://arxiv.org/abs/2211.01786) - **Point of Contact:** [Niklas Muennighoff](mailto:niklas@hf.co) - **Languages:** Refer to [bloom](https://huggingface.co/bigscience/bloom) for pretraining & [xP3](https://huggingface.co/datasets/bigscience/xP3) for finetuning language proportions. It understands both pretraining & finetuning languages. - **BLOOMZ & mT0 Model Family:**
Multitask finetuned on xP3. Recommended for prompting in English. | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Parameters | 300M | 580M | 1.2B | 3.7B | 13B | 560M | 1.1B | 1.7B | 3B | 7.1B | 176B |
Finetuned Model | mt0-small | mt0-base | mt0-large | mt0-xl | mt0-xxl | bloomz-560m | bloomz-1b1 | bloomz-1b7 | bloomz-3b | bloomz-7b1 | bloomz |
Multitask finetuned on xP3mt. Recommended for prompting in non-English. | |||||||||||
Finetuned Model | mt0-xxl-mt | bloomz-7b1-mt | bloomz-mt | Multitask finetuned on P3. Released for research purposes only. Strictly inferior to above models! | |||||||
Finetuned Model | mt0-xxl-p3 | bloomz-7b1-p3 | bloomz-p3 | Original pretrained checkpoints. Not recommended. | |||||||
Pretrained Model | mt5-small | mt5-base | mt5-large | mt5-xl | mt5-xxl | bloom-560m | bloom-1b1 | bloom-1b7 | bloom-3b | bloom-7b1 | bloom |