Unusually High Performance
#12
by fathom - opened
Is it just me, or is this 560m-parameter version of BLOOM qualitatively outperforming the 1, 3, and 7 billion versions when it comes to instructional prompts? Has anyone else noticed this?
@fathom Interesting; could you share a few examples where you noticed this?
Yes, interested as well.
Makes sense; it's possible the smaller model had to learn the actual skills involved in producing text because it didn't have enough capacity (parameters, in this case) to simply memorize its training data. Still curious, though; keep us updated.
christopher changed discussion status to closed