--- datasets: - Muennighoff/P3 - mc4 license: apache-2.0 language: - af - am - ar - az - be - bg - bn - ca - ceb - co - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fil - fr - fy - ga - gd - gl - gu - ha - haw - hi - hmn - ht - hu - hy - ig - is - it - iw - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lb - lo - lt - lv - mg - mi - mk - ml - mn - mr - ms - mt - my - ne - nl - 'no' - ny - pa - pl - ps - pt - ro - ru - sd - si - sk - sl - sm - sn - so - sq - sr - st - su - sv - sw - ta - te - tg - th - tr - uk - und - ur - uz - vi - xh - yi - yo - zh - zu pipeline_tag: text2text-generation widget: - text: >- 一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。Would you rate the previous review as positive, neutral or negative? example_title: zh-en sentiment - text: 一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评? example_title: zh-zh sentiment - text: Suggest at least five related search terms to "Mạng neural nhân tạo". example_title: vi-en query - text: >- Proposez au moins cinq mots clés concernant «Réseau de neurones artificiels». example_title: fr-fr query - text: Explain in a sentence in Telugu what is backpropagation in neural networks. example_title: te-en qa - text: Why is the sky blue? example_title: en-en qa - text: >- Write a fairy tale about a troll saving a princess from a dangerous dragon. The fairy tale is a masterpiece that has achieved praise worldwide and its moral is "Heroes Come in All Shapes and Sizes". Story (in Spanish): example_title: es-en fable - text: >- Write a fable about wood elves living in a forest that is suddenly invaded by ogres. The fable is a masterpiece that has achieved praise worldwide and its moral is "Violence is the last refuge of the incompetent". Fable (in Hindi): example_title: hi-en fable model-index: - name: mt0-xxl-p3 results: - task: type: Coreference resolution dataset: type: winogrande name: Winogrande XL (xl) config: xl split: validation revision: a80f460359d1e9a67c006011c94de42a8759430c metrics: - type: Accuracy value: 58.56 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (en) config: en split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 70.71 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (fr) config: fr split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 61.45 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (jp) config: jp split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 66.11 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (pt) config: pt split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 64.26 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (ru) config: ru split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 62.22 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (zh) config: zh split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 65.67 - task: type: Natural language inference dataset: type: anli name: ANLI (r1) config: r1 split: validation revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094 metrics: - type: Accuracy value: 47.3 - task: type: Natural language inference dataset: type: anli name: ANLI (r2) config: r2 split: validation revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094 metrics: - type: Accuracy value: 40.2 - task: type: Natural language inference dataset: type: anli name: ANLI (r3) config: r3 split: validation revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094 metrics: - type: Accuracy value: 44.17 - task: type: Natural language inference dataset: type: super_glue name: SuperGLUE (cb) config: cb split: validation revision: 9e12063561e7e6c79099feb6d5a493142584e9e2 metrics: - type: Accuracy value: 76.79 - task: type: Natural language inference dataset: type: super_glue name: SuperGLUE (rte) config: rte split: validation revision: 9e12063561e7e6c79099feb6d5a493142584e9e2 metrics: - type: Accuracy value: 83.03 - task: type: Natural language inference dataset: type: xnli name: XNLI (ar) config: ar split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 56.06 - task: type: Natural language inference dataset: type: xnli name: XNLI (bg) config: bg split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 58.23 - task: type: Natural language inference dataset: type: xnli name: XNLI (de) config: de split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 58.59 - task: type: Natural language inference dataset: type: xnli name: XNLI (el) config: el split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 57.35 - task: type: Natural language inference dataset: type: xnli name: XNLI (en) config: en split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 61 - task: type: Natural language inference dataset: type: xnli name: XNLI (es) config: es split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 59.4 - task: type: Natural language inference dataset: type: xnli name: XNLI (fr) config: fr split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 58.47 - task: type: Natural language inference dataset: type: xnli name: XNLI (hi) config: hi split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 55.46 - task: type: Natural language inference dataset: type: xnli name: XNLI (ru) config: ru split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 56.55 - task: type: Natural language inference dataset: type: xnli name: XNLI (sw) config: sw split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 53.73 - task: type: Natural language inference dataset: type: xnli name: XNLI (th) config: th split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 56.02 - task: type: Natural language inference dataset: type: xnli name: XNLI (tr) config: tr split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 55.38 - task: type: Natural language inference dataset: type: xnli name: XNLI (ur) config: ur split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 54.02 - task: type: Natural language inference dataset: type: xnli name: XNLI (vi) config: vi split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 57.11 - task: type: Natural language inference dataset: type: xnli name: XNLI (zh) config: zh split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 55.22 - task: type: Sentence completion dataset: type: story_cloze name: StoryCloze (2016) config: '2016' split: validation revision: e724c6f8cdf7c7a2fb229d862226e15b023ee4db metrics: - type: Accuracy value: 96.63 - task: type: Sentence completion dataset: type: super_glue name: SuperGLUE (copa) config: copa split: validation revision: 9e12063561e7e6c79099feb6d5a493142584e9e2 metrics: - type: Accuracy value: 91 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (et) config: et split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 79 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (ht) config: ht split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 80 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (id) config: id split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 87 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (it) config: it split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 90 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (qu) config: qu split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 56 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (sw) config: sw split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 75 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (ta) config: ta split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 84 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (th) config: th split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 77 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (tr) config: tr split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 76 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (vi) config: vi split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 84 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (zh) config: zh split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 79 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (ar) config: ar split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 89.15 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (es) config: es split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 93.98 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (eu) config: eu split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 91.66 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (hi) config: hi split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 89.41 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (id) config: id split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 92.65 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (my) config: my split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 88.35 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (ru) config: ru split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 84.91 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (sw) config: sw split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 89.61 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (te) config: te split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 87.29 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (zh) config: zh split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 93.58 --- ![xmtf](https://github.com/bigscience-workshop/xmtf/blob/master/xmtf_banner.png?raw=true) # Table of Contents 1. [Model Summary](#model-summary) 2. [Use](#use) 3. [Limitations](#limitations) 4. [Training](#training) 5. [Evaluation](#evaluation) 7. [Citation](#citation) # Model Summary > We present BLOOMZ & mT0, a family of models capable of following human instructions in dozens of languages zero-shot. We finetune BLOOM & mT5 pretrained multilingual language models on our crosslingual task mixture (xP3) and find our resulting models capable of crosslingual generalization to unseen tasks & languages. - **Repository:** [bigscience-workshop/xmtf](https://github.com/bigscience-workshop/xmtf) - **Paper:** [Crosslingual Generalization through Multitask Finetuning](https://arxiv.org/abs/2211.01786) - **Point of Contact:** [Niklas Muennighoff](mailto:niklas@hf.co) - **Languages:** Refer to [mc4](https://huggingface.co/datasets/mc4) for pretraining & [xP3](https://huggingface.co/bigscience/xP3) for finetuning language proportions. It understands both pretraining & finetuning languages. - **BLOOMZ & mT0 Model Family:**
Multitask finetuned on xP3. Recommended for prompting in English. | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Parameters | 300M | 580M | 1.2B | 3.7B | 13B | 560M | 1.1B | 1.7B | 3B | 7.1B | 176B |
Finetuned Model | mt0-small | mt0-base | mt0-large | mt0-xl | mt0-xxl | bloomz-560m | bloomz-1b1 | bloomz-1b7 | bloomz-3b | bloomz-7b1 | bloomz |
Multitask finetuned on xP3mt. Recommended for prompting in non-English. | |||||||||||
Finetuned Model | mt0-xxl-mt | bloomz-7b1-mt | bloomz-mt | Multitask finetuned on P3. Released for research purposes only. Strictly inferior to above models! | |||||||
Finetuned Model | mt0-xxl-p3 | bloomz-7b1-p3 | bloomz-p3 | Original pretrained checkpoints. Not recommended. | |||||||
Pretrained Model | mt5-small | mt5-base | mt5-large | mt5-xl | mt5-xxl | bloom-560m | bloom-1b1 | bloom-1b7 | bloom-3b | bloom-7b1 | bloom |