WizardLM-8x22B Evaluation failed

#823
by llama-anon - opened
Open LLM Leaderboard org
•
edited Jul 5

Hi @llama-anon ,

Thanks for providing the request file!
Currently, our cluster is quite full, so we're only evaluating models that can run on a single node. This approach helps us evaluate more models concurrently. However, if there's enough interest from the community, we're open to manually evaluating models that require more than one node, like the one you've submitted.

Another interested user here. Wizard 8x22 is my current go-to open source model; I use it for pretty much everything.

Definitely interested in seeing how Wizardlm-2 8x22b stacks up. It seems vastly better than the other fine-tunes of 8x22b, including Mistral's own. I think the only reason it hasn't gotten more attention is that it was never put on LMSYS Arena. It's been in the top ten most used models on OpenRouter for a while now, and I think it would have a solid chance of topping the leaderboard.

Would be good to get this added. It's been out quite some time, but people really rave about it.

Very capable finetune by Microsoft, would love to see it added (and potentially the 7b variant)!

Strongest FOSS model/finetune, except for coding. Crazy this isn't on the leaderboard. Yes, it needs a lot of VRAM, but it would be really good to showcase the best of open source IMO.

WizardLM-2-8x22b is one of the most powerful open-source language models. It would be really great to see how it performs compared to other open-source large language models on the Open-LLM-Leaderboard.

Voting WizardLM-2-8x22b

Would love to see Wizard ranked! It'd be really good to see how it compares to other wizard and non-wizard models.

Would also love to see it ranked. Was a fantastic model when I tried online hostings of it. Still worthwhile on the lobotomized local usage my setup can get out of it, which was a pleasant surprise.

I definitely have an interest in seeing the Wizard benchmarks. This topic has come up a few times on LocalLlama, but none of us have really known how to get it up here and just assumed it wouldn't happen.

I think you'd make a few people pretty happy if you were able to squeeze this one in.

Wiz 8x22 5bpw is still my daily driver. Its writing, contextual awareness, and fringe knowledge are still unmatched IMO. Would love to see how it stacks up against the other top dogs.

Voting WizardLM-2-8x22b

Add my vote!

Voting WizardLM-2-8x22b

Vote from me as well!

I would like to see it too.

Very interested to see it benchmarked as well. +1

Another vote from me

I'd rather we run the highest-quality models first to get the baseline going, then proceed to quantity, since the goal is to reach the top score as soon as possible so we stop the plateau.

get wizardlm in

Open LLM Leaderboard org

Hi everyone,

Thanks for your messages and activity! Let's start WizardLM-2-8x22B evaluation! 🚀

Great news! I'm really curious how it stacks up. I'm also glad the feedback was heard.

Open LLM Leaderboard org
•
edited Jul 15

Thanks everyone for your activity and patience! WizardLM-2-8x22B is now 8th on the Leaderboard with an average score of 32.61!

alozowski changed discussion status to closed

Great - thank you for the evaluation!