Model benchmarks degraded after re-evaluation
Hey, after re-evaluating the model with use_chat_template set, the performance degrades a lot.

This model: Etherll/Qwen2.5-7B-della-test

Can we undo this?
Hi @Etherll ,
I see that the model Etherll/Qwen2.5-7B-della-test includes a chat_template, so it's expected to be evaluated with use_chat_template = True for proper alignment with its intended use. However, I understand your concern regarding the performance drop. We plan to update the request file naming conventions and introduce the ability to distinguish between chat-template-based and non-chat-template-based evaluations of the same model. Currently, the system only keeps the most recent evaluation, regardless of whether it used a chat template, which can lead to situations like this one.
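For reference, here is a minimal sketch (not the leaderboard's evaluation code, just an illustration using the transformers library) of how you can check whether a model ships a chat template and see how prompts change when it is applied, which is usually what explains the score difference between the two settings:

```python
from transformers import AutoTokenizer

# Load the tokenizer for the model in question.
tokenizer = AutoTokenizer.from_pretrained("Etherll/Qwen2.5-7B-della-test")

# A model "includes a chat_template" when this attribute is set on its
# tokenizer (it comes from tokenizer_config.json in the repo).
print("has chat template:", tokenizer.chat_template is not None)

messages = [{"role": "user", "content": "What is the capital of France?"}]

# With use_chat_template = True, prompts are wrapped in the model's chat
# format (role markers, special tokens, generation prompt) before evaluation.
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(chat_prompt)

# Without the chat template, the raw text would be fed to the model instead,
# which is why the two settings can produce noticeably different scores.
print(messages[0]["content"])
```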
Thanks for bringing this up – we’ll keep you updated on the progress!
I'm closing this issue now. Feel free to ping me here if you have any questions on this topic, or open a new discussion!