Spaces: open-llm-leaderboard/open_llm_leaderboard
WizardLM 30B model seems to rank way lower than it probably should.

#266

by felixz - opened Sep 7, 2023

Discussion

felixz

Sep 7, 2023

I wonder if something has gone wrong with evaluation there?
Also, why is the float16 evaluation missing?

xzuyn

Sep 7, 2023

It's marked 8-bit. They probably submitted it like that by accident, and then the eval went haywire because it's an f16 delta.

clefourrier

Open LLM Leaderboard org Sep 8, 2023

Yep, we don't manually correct request files, which can lead to some models being improperly evaluated. For example, a bunch of delta models were submitted as "original" weights a month ago and hence performed very badly.

clefourrier changed discussion status to closed Sep 8, 2023

felixz

Sep 8, 2023

Sounds like validation should not allow this. Will this model be fixed? I'd like to see this model properly evaluated.

felixz changed discussion status to open Sep 8, 2023

clefourrier

Open LLM Leaderboard org Sep 8, 2023

@felixz We can't know for all models how they should be evaluated; we assume that users will submit the correct version of their models. Any kind of at-scale validation would be insanely time-consuming and not feasible.
For this specific model, feel free to resubmit it with the proper setup!

clefourrier changed discussion status to closed Sep 8, 2023
