bigscience/bloomz · Zero-shot comparison with InstructGPT-3?

Nov 9, 2022

Really impressed with this model and the open-source effort here. I can see that the paper reports zero-shot metrics on held-out tasks. Is a comparison with OpenAI's InstructGPT-3 available anywhere? It would be a game changer for the AI community if this model performs comparably.

Muennighoff

BigScience Workshop org Nov 9, 2022

The only dataset I found that both InstructGPT and BLOOMZ are evaluated on is RTE. BLOOMZ appears to do better zero-shot on RTE (BLOOMZ scores above 80, while InstructGPT is around 70; see the tables below). That said, a single data point may not be very meaningful.

InstructGPT paper, Table 14:

BLOOMZ paper, Table 7 (BLOOMZ and related models; RTE highlighted in blue):

nishanthcmesh

Nov 10, 2022

Looks like that's the only data point there is, then. Thanks for going through the tables and finding it!

cakiki changed discussion status to closed Nov 10, 2022