How is lm-evaluation-harness run for chat and instruct models?

#49
by winddude - opened

How are you running lm-evaluation-harness for chat and instruct models?

Open LLM Leaderboard org

Hi! Exactly the same as for the other models, in order to get something as reproducible as possible :)

clefourrier changed discussion status to closed

Hi, what about the special tags (e.g., [INST]) that are used during fine-tuning? lm-evaluation-harness does not add them to its prompts. If the evaluation prompts do not follow the format used during fine-tuning, the comparison may not be fair.
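To make the mismatch concrete, here is a minimal sketch of the two prompt shapes. The function names and the "Question/Answer" template are hypothetical and are not taken from lm-evaluation-harness; the [INST] ... [/INST] wrapping is assumed to follow the Llama-2-chat style, and other instruct models use different tags.

```python
# Hypothetical illustration only: not the harness's actual prompt code.

def plain_prompt(question: str) -> str:
    """Prompt as a base model would see it, with no chat tags."""
    return f"Question: {question}\nAnswer:"


def inst_prompt(question: str) -> str:
    """Same prompt wrapped in [INST] tags (assumed Llama-2-chat style)."""
    return f"[INST] {plain_prompt(question)} [/INST]"


if __name__ == "__main__":
    q = "What is the capital of France?"
    print(repr(plain_prompt(q)))  # what an untagged evaluation sends
    print(repr(inst_prompt(q)))   # what the model saw during fine-tuning
```

A model fine-tuned only on the second shape may behave differently when given the first, which is the concern raised here.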

Open LLM Leaderboard org

Hi!
Using system prompts for evaluations has been discussed; it's something we'll add during Q1, but it is not done at the moment.
It's a known limitation of the leaderboard.
