Unusually High Performance

#12
by fathom - opened

Is it just me, or is this 560M-parameter version of BLOOM qualitatively out-performing the 1, 3, and 7 billion parameter versions when it comes to instructional prompts? Has anyone else noticed this?

BigScience Workshop org

@fathom Interesting; could you give a few examples where you noticed that?

BigScience Workshop org

Yes, interested as well.

Makes sense: it’s possible the smaller model had to learn the actual skills involved in producing text because it didn’t have enough memory (parameters, in this case) to memorize it. Still curious, keep us updated.