Zero-shot comparison with InstructGPT-3?
Really impressed with this model and the open-source effort here. I can see that the paper has some zero-shot metrics on held-out tasks. Is there a comparison to OpenAI's InstructGPT-3 available anywhere? It would be a game changer for the AI community if this model performs comparably to it.
The only dataset I found on which both InstructGPT and BLOOMZ are evaluated is RTE. BLOOMZ looks better zero-shot there (BLOOMZ: >80, InstructGPT: ~70; see screenshots). That said, a single data point may not be very meaningful.
Table 14 InstructGPT:
Table 7 BLOOMZ and family; RTE highlighted in blue:
Looks like that's the only data point there is then. Thanks for going through the tables and finding that out!