WizardLM 30B model seems to rank way lower than it probably should.

by felixz - opened

I wonder if something has gone wrong with the evaluation there?
Also, why is the float16 evaluation missing?


It's marked 8-bit. They probably submitted it like that by accident, and then the eval went haywire because it's an f16 delta.

Open LLM Leaderboard org

Yep, we don't manually correct request files, but this can lead to some models being improperly evaluated (for example, a bunch of delta models were submitted as "original" weights a month ago and hence performed very badly).
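
For context, a delta release typically stores only the difference from the base model's weights, so the usable model is base plus delta. Below is a minimal sketch of why evaluating a delta as if it were the full weights goes wrong. The parameter name and values are made up, and `apply_delta` is a hypothetical helper, not part of the leaderboard or any library:

```python
# Hypothetical illustration: names and numbers are invented, not real model weights.

def apply_delta(base, delta):
    """Return merged weights, computed element-wise as base[name] + delta[name]."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")
    merged = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [b + d for b, d in zip(base_values, delta_values)]
    return merged


base = {"layer.weight": [0.5, -1.0, 2.0]}    # assumed base-model weights
delta = {"layer.weight": [0.25, 0.5, -1.0]}  # assumed published delta

merged = apply_delta(base, delta)

# Evaluating `delta` directly means running the model with these small
# difference values instead of the merged weights, which is not the model
# the authors trained.
print(merged["layer.weight"])
```

Submitting the delta as "original" weights skips the merge step entirely, which is consistent with the very bad scores described above.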

clefourrier changed discussion status to closed

Sounds like validation should not allow this. Will this model be fixed? I'd like to see this model properly evaluated.

felixz changed discussion status to open
Open LLM Leaderboard org

@felixz We can't know how every model should be evaluated; we assume that users will submit the correct version of their models. Any kind of at-scale validation would be insanely time-consuming and not feasible.
For this specific model, feel free to resubmit it with the proper setup!

clefourrier changed discussion status to closed
