Leaderboard is of very limited use without more 0-shot, instruction-prompted datasets

#27
by JulesGM - opened

Most use of LLMs nowadays is zero-shot with prompting, yet there is just one fairly specific dataset evaluating this.

I think it would be important to add more zero-shot, instruction-prompted datasets, as this is how the models will be used a large fraction of the time.

Hi! We tried to select a good range of evaluation tasks based on what is used in the literature to compare models :)
We might add more 0-shot evaluations in the future!

clefourrier changed discussion status to closed
