Interesting stats

#25
by BBLL3456 - opened

It is interesting to see that a 30B model currently beats Llama-65B, and that Alpaca 13B performs worse than Llama 13B.

Keep in mind how close the numbers are, though. A difference of a few points still leaves the models roughly on par.

It's also funny to see 13B models doing so well compared to the 30B and 65B ones (also, oof, Galactica with the 120B), but there might be other benchmarks that could tell a fuller story.

Sure wish this leaderboard could handle 3- and 4-bit quantization, though. Seems like a glaring oversight considering the march of current tech.

Edit: there are also no RAM usage statistics, no tokens/second, etc. Honestly, I'm now not sure what this leaderboard is even meant to be comparing besides 'stuff that Llama is good at'.
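For what it's worth, tokens/second and peak memory are cheap to collect. Here's a rough sketch of the kind of measurement I mean; the model id is just a placeholder, and it assumes a CUDA GPU plus the transformers and accelerate libraries:

```python
# Rough sketch: measure generation throughput and peak VRAM for one model.
# The model id is a placeholder; swap in whatever you want to test.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tok("The quick brown fox", return_tensors="pt").to(model.device)
torch.cuda.reset_peak_memory_stats()

start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start

new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```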

Those are some great points. This has become so popular that surely an update must be on the cards; the stats you describe would be a great start.

Sure wish this leaderboard could handle 3- and 4-bit quantization, though. Seems like a glaring oversight considering the march of current tech.

I just tried submitting some 4-bit quantized models, and they have been accepted for evaluation... so Hugging Face does listen to our grunts, haha.
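For anyone unsure what 4-bit quantized means on the loading side, here's a minimal sketch using the bitsandbytes path in transformers; the model id is just a placeholder, and it assumes bitsandbytes and accelerate are installed:

```python
# Minimal sketch: load a causal LM with 4-bit weights via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit
    bnb_4bit_compute_dtype=torch.float16,  # but run compute in fp16
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # placeholder model
    quantization_config=bnb_config,
    device_map="auto",
)
print(f"{model.get_memory_footprint() / 2**20:.0f} MiB")  # roughly 1/4 of fp16
```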

It'll accept them. That's why the leaderboard was stuck in the first place through day two: once it reached a 4-bit model, the queue went full derp.
No idea if they've fixed it.

It would be interesting to see SuperHot and Bluemoon, since they're among the few models featuring a larger context size. It would give us an idea of the impact of a larger context size on coherency, especially for SuperHot, because SuperCot is already there to compare against. I feel like the stuff in the queue is more of the same.

These have been submitted; looking forward to seeing their performance. I am also curious about Guanaco: so far, the results I have been getting from Guanaco 30B are the best (though I haven't tried Falcon yet).

Take this leaderboard with a grain of salt. Somehow my 19M OPT chatsalad, finetuned on a single 20 MB corpus, has beaten half a dozen other models.

Is it published on HF?
Could you send a link?

It is interesting to see that a 30B model currently beats Llama-65B, and that Alpaca 13B performs worse than Llama 13B.

Falcon isn't nearly as good as Llama 65B, in my opinion.

It's in my profile

Hugging Face H4 org

Hi! All the scores have been updated with the correct MMLU results, following the discussions here!

clefourrier changed discussion status to closed
