Same model appears multiple times on the leaderboard with different scores

#180
by felixz - opened

Same model name and precision, but a different hash. Does it make sense to show the evaluations of older, lower-performing versions of a model with the same name and precision?

Yeah, I understand. I've seen the FAQ now.
Still, from a user's point of view it's not ideal. Maybe filter out the extra rows by default, but let people see everything if they check a box.

Open LLM Leaderboard org

Hi @felixz! Thank you for opening this issue!
Sadly, I can't know whether a given hash is the correct one for a model, or which one is :/ I see a couple of options that are quite easy to implement:

  • displaying the model with the best results
  • displaying the model with the latest hash
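
Both options can be sketched as a single de-duplication pass that keeps one row per (model, precision) pair and differs only in how it ranks the candidates. This is a minimal illustration, not the leaderboard's actual code: the `EvalRow` record, its field names, the `dedupe` helper, and the toy scores below are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from operator import attrgetter


@dataclass
class EvalRow:
    # Hypothetical record for one leaderboard evaluation; field names are assumptions.
    model: str
    precision: str
    revision: str            # commit hash that was evaluated
    submitted_at: datetime
    average: float           # average benchmark score


RANKERS = {
    "latest": attrgetter("submitted_at"),  # newest submission wins
    "best": attrgetter("average"),         # highest average score wins
}


def dedupe(rows: list[EvalRow], strategy: str = "latest") -> list[EvalRow]:
    """Keep one row per (model, precision) pair, chosen by `strategy`."""
    if strategy not in RANKERS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    rank = RANKERS[strategy]

    kept: dict[tuple[str, str], EvalRow] = {}
    for row in rows:
        key = (row.model, row.precision)  # different precisions stay separate
        if key not in kept or rank(row) > rank(kept[key]):
            kept[key] = row
    return list(kept.values())


# Toy data (made-up values) for one model evaluated at two revisions in float16.
rows = [
    EvalRow("org/model-13b", "float16", "aaa111", datetime(2023, 8, 1), 61.2),
    EvalRow("org/model-13b", "float16", "bbb222", datetime(2023, 8, 20), 60.9),
    EvalRow("org/model-13b", "8bit", "ccc333", datetime(2023, 8, 5), 59.8),
]
print([r.revision for r in dedupe(rows, "latest")])  # ['bbb222', 'ccc333']
print([r.revision for r in dedupe(rows, "best")])    # ['aaa111', 'ccc333']
```

The "best" strategy can hide a newer revision that scores slightly lower, while "latest" always reflects what the model repo currently points to.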

I'll think about the best way to do this over the next few days.

@clefourrier
Something else to consider:
I noticed that if someone submits the same model multiple times under different commit IDs, it gets evaluated multiple times, as designed. But if the only thing that changed between the commits is the README file, with no changes to the weight files, the evaluation results are, as expected, identical down to the two decimal places. It probably doesn't make sense to show multiple rows in that case. If the commits did alter the weight files, showing both results makes sense, but a column with the commit date/time would also be helpful in that case.
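
One way to sketch this suggestion: fingerprint each revision by its weight files only, so commits that touch just the README collapse into a single row, while the commit date is kept for display. This assumes a per-file content hash is available for every file in a revision; `Revision`, `weights_fingerprint`, `collapse_identical_weights`, and the list of weight-file suffixes are hypothetical names chosen for illustration, not part of the leaderboard.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime

# Assumed set of weight-file suffixes; adjust to the formats actually in use.
WEIGHT_SUFFIXES = (".safetensors", ".bin", ".pt", ".gguf")


@dataclass
class Revision:
    # Hypothetical view of one submitted commit; field names are assumptions.
    model: str
    precision: str
    commit: str
    committed_at: datetime   # would back a "commit date" column
    files: dict[str, str]    # filename -> content hash of that file


def weights_fingerprint(files: dict[str, str]) -> str:
    """Hash only the weight files, so README-only commits share a fingerprint."""
    h = hashlib.sha256()
    for name in sorted(files):
        if name.endswith(WEIGHT_SUFFIXES):
            h.update(f"{name}:{files[name]}\n".encode())
    return h.hexdigest()


def collapse_identical_weights(revs: list[Revision]) -> list[Revision]:
    """Keep one row per (model, precision, weights), preferring the newest commit."""
    kept: dict[tuple[str, str, str], Revision] = {}
    for rev in revs:
        key = (rev.model, rev.precision, weights_fingerprint(rev.files))
        if key not in kept or rev.committed_at > kept[key].committed_at:
            kept[key] = rev
    return list(kept.values())


# Toy example: c2 only edits the README, c3 changes the weights.
revs = [
    Revision("org/model-13b", "float16", "c1", datetime(2023, 8, 1),
             {"README.md": "r1", "model.safetensors": "w1"}),
    Revision("org/model-13b", "float16", "c2", datetime(2023, 8, 3),
             {"README.md": "r2", "model.safetensors": "w1"}),
    Revision("org/model-13b", "float16", "c3", datetime(2023, 8, 10),
             {"README.md": "r2", "model.safetensors": "w2"}),
]
print([r.commit for r in collapse_identical_weights(revs)])  # ['c2', 'c3']
```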

@clefourrier Any news on this? This also seems to be an issue for the "Open-Orca/OpenOrca-Platypus2-13B" model on the leaderboard.

Open LLM Leaderboard org

@Wubbbi Since it's not blocking, I'm leaving it as is for now: people can look for the commit closest to the model version that interests them.

Open LLM Leaderboard org

Hi!
We should now display only the latest submitted version of a model for each precision. (We still keep different precision levels separate, as that has been a highly requested feature.)

clefourrier changed discussion status to closed
Open LLM Leaderboard org

Feel free to tell us if you observe problems!
