Feature request: Run 100B+ models automatically

#434
by ChuckMcSneed - opened

Goliath-120B (https://huggingface.co/alpindale/goliath-120b) was submitted for evaluation almost a month ago, and there are still no results. According to @clefourrier, there is not enough memory to run it. Please fix it.

Hugging Face H4 org

Hi @ChuckMcSneed ,
As mentioned in the other discussion, our backend currently cannot evaluate models this big (they don't fit on one A100 node). We will add this feature to our roadmap.
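For rough context, here's a back-of-the-envelope estimate of the footprint (assuming fp16/bf16 weights at 2 bytes per parameter; the overhead factor and node configuration are assumptions, not the leaderboard's actual setup):

```python
# Rough memory estimate for a 120B-parameter model (assumptions: fp16/bf16
# weights at 2 bytes/param, ~20% extra for KV cache and framework overhead).
params = 120e9
weight_gib = params * 2 / 1024**3      # ~224 GiB for the weights alone
total_gib = weight_gib * 1.2           # ~268 GiB with rough overhead

print(f"weights: {weight_gib:.0f} GiB, estimated total: {total_gib:.0f} GiB")
print(f"A100-80GB GPUs needed: {total_gib / 80:.1f}")  # ~3.4 -> at least 4
```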

Thank you for opening this issue! We'll keep track of it!

clefourrier changed discussion title from Goliath-120B evaluation to Feature request: Run 100B+ models automatically

I noticed falcon-180b got removed from the leaderboard while these finetunes are still on it:
OpenBuddy/openbuddy-falcon-180b-v13-preview0
OpenBuddy/openbuddy-falcon-180b-v12-preview0
Do they need to be retested?

Hugging Face H4 org

Hi! Falcon-180B is still on the leaderboard if you enable the "Show gated/deleted/private models" toggle.

We would also be very happy to see this feature added. We submitted DiscoResearch/DiscoLM-120b some time ago and didn't know why it was stuck at "pending".

Maybe you could add a manual job to run >70B models on demand on a 4×A100 instance?
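If it helps, here's a minimal sketch of what such a job could look like, using transformers/accelerate to shard the model across all visible GPUs (this is just an illustration, not the leaderboard's actual harness, and the smoke-test prompt is a placeholder):

```python
# Sketch of an on-demand multi-GPU eval job (not the leaderboard's actual
# harness): shard a >100B model across all visible GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/DiscoLM-120b"  # model mentioned in this thread

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~2 bytes/param -> ~240 GB of weights
    device_map="auto",          # accelerate splits layers across the GPUs
)

# Placeholder smoke test; a real job would run the eval harness instead.
inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```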

Thank you for your work on the leaderboard!

Can you eval bnb 4-bit quantizations of large models? It'd be beneficial to have some kind of indication. I was thinking about QuIP# 2-bit quantizing a Goliath model so I can fit it on my GPU locally, but that will take about 1.5 weeks to compute. If it's not better, I don't feel like doing it.
So if I quantize Goliath to 4-bit with bnb and submit it for eval, will it work?
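For reference, loading a large model in bnb 4-bit looks roughly like this (a sketch assuming the transformers BitsAndBytesConfig API; whether the leaderboard backend accepts such a submission is exactly my question):

```python
# Sketch: load a large model with bitsandbytes 4-bit (NF4) quantization,
# bringing weights down to roughly 0.5 bytes/param (~60 GB for 120B params).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "alpindale/goliath-120b",      # model from this thread
    quantization_config=bnb_config,
    device_map="auto",
)
```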

I've submitted TheBloke/DiscoLM-120b-GPTQ now. Hope that works.

@clefourrier How is this progressing? Are you still trying to implement it, or is it just too expensive?

@clefourrier Why was MegaDolphin-120b successfully tested while all the other 120B models failed?

Hugging Face H4 org

Hi @ChuckMcSneed ,
Can you link the request file? I suspect it was submitted quantized - which could barely fit on our GPUs.
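For anyone looking it up, request files can be listed programmatically with something like the sketch below (the requests repo id and its file layout are assumptions here, not confirmed in this thread):

```python
# Sketch: find a model's eval request file(s). The repo id and layout are
# assumptions about how the leaderboard stores submissions.
from huggingface_hub import HfApi

api = HfApi()
files = api.list_repo_files("open-llm-leaderboard/requests", repo_type="dataset")
print([f for f in files if "MegaDolphin" in f])
```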

Hugging Face H4 org

Interesting!
It might be because we went from using A100s to H100s, which don't seem to manage memory in exactly the same way; that could have allowed a slightly bigger model to fit (but just barely).
Another idea: @SaylorTwift, did you launch MegaDolphin manually?

If not, we could try relaunching some of the bigger models (like goliath) and see what happens
