bigscience/bloomz · Difference between MT0 and BLOOMZ

Yes, that's correct.

There are output and performance differences between the models. You may want to take a look at the paper "Crosslingual Generalization through Multitask Finetuning", whose graphs and tables compare the performance of mT0 and BLOOMZ models. Generally:

mT0 models are stronger on most benchmarks when we compare models with roughly the same number of parameters.
mT0 models are biased towards shorter answers; you can check the examples in the appendix of the paper for that if you want 😇
Performance by language will vary. E.g. mT0 models are much stronger on many languages BLOOMZ has not extensively seen, such as Russian or Japanese.

In addition, various specs differ, such as precision and some architecture components.
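
One architecture difference that matters in practice: mT0 checkpoints are encoder-decoder models (finetuned from mT5), while BLOOMZ checkpoints are decoder-only (finetuned from BLOOM), so they load through different `transformers` Auto classes. Here is a minimal sketch, assuming `transformers` and `torch` are installed. The helper names `loader_class_name` and `generate` are hypothetical, made up for illustration, and picking the class from the checkpoint name prefix is just a convenience assumption, not an official API.

```python
def loader_class_name(model_id: str) -> str:
    """Return the name of the transformers Auto class for a checkpoint.

    Hypothetical helper: it only looks at the checkpoint name prefix.
    """
    name = model_id.split("/")[-1].lower()
    if name.startswith("mt0"):
        # mT0 is encoder-decoder (mT5-based)
        return "AutoModelForSeq2SeqLM"
    if name.startswith("bloomz"):
        # BLOOMZ is decoder-only (BLOOM-based)
        return "AutoModelForCausalLM"
    raise ValueError(f"Unrecognized checkpoint name: {model_id}")


def generate(model_id: str, prompt: str, max_new_tokens: int = 20) -> str:
    """Load a checkpoint with the matching Auto class and generate text."""
    # Imported here so the helper above works without transformers installed.
    import transformers

    tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
    model_cls = getattr(transformers, loader_class_name(model_id))
    model = model_cls.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    prompt = "Translate to English: Je t'aime."
    # Small checkpoints, chosen only to keep the download manageable.
    for model_id in ("bigscience/mt0-small", "bigscience/bloomz-560m"):
        print(model_id, "->", generate(model_id, prompt))
```

Note that a decoder-only model returns the prompt followed by the continuation, whereas an encoder-decoder model returns only the generated answer, so the two outputs will not look alike even for the same prompt.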

Let us know if you notice any other differences!