How is lm-evaluation-harness run for chat and instruct models?

by winddude - opened

How are you running lm-evaluation-harness for chat and instruct models?

Open LLM Leaderboard org

Hi! Exactly the same as for the other models, in order to get something as reproducible as possible :)

clefourrier changed discussion status to closed

Hi, what about the special tags (e.g., [INST]) that are used during finetuning? These are not added to the prompts in lm-evaluation-harness. If the instructions do not follow the format used during finetuning, it may not be a fair comparison.
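
To make the mismatch concrete, here is a minimal sketch, assuming a Llama-2-style template in which the user turn is wrapped in [INST] ... [/INST]. `wrap_inst` is a hypothetical helper written for illustration; it is not part of lm-evaluation-harness, and other models use different tags.

```python
def wrap_inst(prompt: str) -> str:
    """Wrap a raw prompt in Llama-2-style instruction tags.

    Assumption: the model was finetuned on turns of the form
    "[INST] <user text> [/INST]". Other chat models use other templates.
    """
    return f"[INST] {prompt.strip()} [/INST]"


# A raw prompt, sent as-is with no special tags.
raw_prompt = "Question: What is the capital of France?\nAnswer:"

# The same prompt in the format the model may have seen during finetuning.
tagged_prompt = wrap_inst(raw_prompt)

print(raw_prompt)
print(tagged_prompt)
```

The two strings carry the same question, but only the second matches the assumed finetuning format, which is why scores obtained with the raw prompt may understate an instruct model.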

Open LLM Leaderboard org

Using system prompts for evaluations has been discussed. It's something we'll add during Q1, but it is not done at the moment.
It's a known limitation of the leaderboard.
