How is lm-evaluation-harness run for chat and instruct models?
#49 by winddude - opened
How are you running lm-evaluation-harness for chat and instruct models?
Hi! Exactly the same as for the other models, in order to get something as reproducible as possible :)
clefourrier changed discussion status to closed
Hi, what about the special tags (e.g. [INST]) that are used during finetuning? These are not added to the prompts in lm-evaluation-harness. If the instructions do not follow the format used during finetuning, it may not be a fair comparison.
Hi!
Using system prompts for evaluations has been discussed. It's something we'll add during Q1, but it is not done at the moment.
It's a known limitation of the leaderboard.
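
To make the formatting mismatch raised above concrete, here is a minimal sketch of how a raw benchmark prompt differs from the same prompt wrapped in instruction tags. It assumes a Llama-2-style chat format (`[INST] ... [/INST]` with an optional `<<SYS>>` block); other finetuned models use different tags. The helper `wrap_inst` and the sample prompt are hypothetical and are not part of lm-evaluation-harness or the leaderboard code.

```python
from typing import Optional


def wrap_inst(prompt: str, system: Optional[str] = None) -> str:
    """Wrap a raw prompt in Llama-2-style instruction tags.

    Assumption: the model was finetuned on '[INST] ... [/INST]' prompts.
    This is an illustrative helper, not an lm-evaluation-harness function.
    """
    if system:
        # Optional system prompt, placed inside the instruction block.
        prompt = f"<<SYS>>\n{system}\n<</SYS>>\n\n{prompt}"
    return f"[INST] {prompt} [/INST]"


# What the model sees when evaluated "exactly the same as the other models".
raw_prompt = "Question: What is 2 + 2?\nAnswer:"

# What the model would see if its finetuning format were applied.
tagged_prompt = wrap_inst(raw_prompt)

print(raw_prompt)
print(tagged_prompt)
```

A model finetuned only on the tagged form may score differently on the raw form, which is the fairness concern described in the thread.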