Why did top-ranked Guanaco 65B disappear from the leaderboard?

#108
by wolfram - opened

timdettmers/guanaco-65b-merged was one of the top-ranked models on the leaderboard for a long time, but now it's missing! What happened?

timdettmers/guanaco-33b-merged is still there. The 65B was even better, so I wonder why that disappeared suddenly.

I see it's now back on the list, with a new (low!) average score of 32, which puts it below its 33B, 13B, and 7B versions. That can't be right!

I've used its quantized version TheBloke/guanaco-65B-GGML and it's the best model I've ever used locally, and I've evaluated all the major releases for months now. Either there's something very wrong with this particular version (timdettmers/guanaco-65b-merged) or with the leaderboard/testing procedures.

deleted

Things probably changed when they updated the MMLU evaluation on the leaderboard. Similar to how llama-65b received a big bump in its score.

Open LLM Leaderboard org

@wolfram Thanks for your feedback! You are right, it's weird that this model is ranking so low. I am running it again to try to pinpoint a potential issue.

@SaylorTwift That's great to hear. Thanks for investigating and hope you can fix any issues you may find.

Open LLM Leaderboard org

@SaylorTwift likely linked to the llama-based tk fix we talked about

@SaylorTwift Looks like your rerun scored the same - timdettmers/guanaco-65b-merged is now listed twice, and both show almost the same low score - which can't possibly be right for such a popular model that used to be on top of the leaderboard.

I'm using TheBloke's quantized version, so maybe it's a problem with the HF version you tested? It would be very helpful to know what's wrong here, because right now other models could be affected as well, making the whole leaderboard ranking unreliable.

Open LLM Leaderboard org

We talked with @timdettmers, and they are not sure the merged version maintains performance.

@wolfram we suggest resubmitting Guanaco models you are interested in as adapter weights with the correct base model and precision.

@clefourrier Thanks for getting back with this information.

TheBloke/guanaco-65B-HF has been tested and ranks in the top ten, so that's OK for me.

I'll close this issue as it's now resolved.

wolfram changed discussion status to closed
