No good way to identify the number of activated parameters causes Mixtral evaluation failures

#680
by 0-hero - opened

Hey @clefourrier, I noticed all the 8x22B fine-tunes failed:
HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1 @lewtun ?
migtissera/Tess-2.0-Mixtral-8x22B @migtissera
0-hero/Matter-0.2-8x22B mine
and maybe a few more I missed

Hugging Face H4 org

Hi all!
As you can see from the job ids (-1), the jobs were not launched. This is because our backend assumes the models have 140B activated parameters (too big for the cluster, hence skipped), rather than 140B total parameters with considerably fewer activated. I'm not sure there is an easy way for us to tell the difference automatically at the moment, but we'll gladly update our backend and re-submit your models once we can get this information.
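For a Mixtral-style config you can roughly estimate the activated parameter count like this, but not every MoE exposes these fields, which is exactly the problem. This is just a back-of-the-envelope sketch, not what our backend runs:

```python
# Rough estimate of total vs. activated expert parameters for a Mixtral-style MoE.
# Field names (num_local_experts, num_experts_per_tok, ...) are specific to
# Mixtral-like configs and won't exist for every MoE architecture.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1")

# One SwiGLU expert = gate + up + down projections.
expert_params = 3 * cfg.hidden_size * cfg.intermediate_size
moe_total = cfg.num_hidden_layers * cfg.num_local_experts * expert_params
moe_active = cfg.num_hidden_layers * cfg.num_experts_per_tok * expert_params

# Attention, embeddings and routers are always active; only the expert stack differs.
print(f"expert params (all experts):    {moe_total / 1e9:.0f}B")
print(f"expert params (activated only): {moe_active / 1e9:.0f}B")
```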

clefourrier changed discussion title from 8x22B's failing to No good way to identify the number of activated parameters causes Mixtral evaluation failures

Hey, is this fixed now or still waiting?

Hugging Face H4 org

Hi everyone!

Thanks to @SaylorTwift, we can now submit MoE models bigger than 140B for evaluation, so I resubmitted this one for @MaziyarPanahi

Please provide me with the request files for similar models to resubmit

Fantastic! Thanks @alozowski and @SaylorTwift

Hugging Face H4 org

Resubmitted both migtissera/Tess-2.0-Mixtral-8x22B and 0-hero/Matter-0.2-8x22B 👍

In that case I'm closing this discussion; if there are any problems with model evaluations, please open a new one for each model

alozowski changed discussion status to closed

Says FAILED

I think all 3 failed again

Yes, I created a separate discussion for my models. Two of the failed models were 8B, so something else might have happened.

alozowski changed discussion status to open
Hugging Face H4 org

Hi everyone!

Hmm, I see, all these models have indeed failed. Let me investigate.

Hey, any update here on the Tess model? Do you want me to open a separate ticket to track it? This is the model: https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/migtissera/Tess-2.0-Mixtral-8x22B_eval_request_False_float16_Original.json

There seems to be something going on with the LB eval cluster, at least for some large models. Even my Llama-3-70B submission has been running for the last 2 days. https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.1_eval_request_False_bfloat16_Original.json
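If anyone else wants to peek at the raw request file instead of the UI, here's a quick sketch. The "status" and "job_id" field names are my assumption from the JSON files linked above:

```python
# Download and inspect an eval request file from the leaderboard's requests dataset.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="open-llm-leaderboard/requests",
    repo_type="dataset",
    filename="MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.1_eval_request_False_bfloat16_Original.json",
)
with open(path) as f:
    request = json.load(f)

# Field names assumed from the linked request files.
print(request.get("status"), request.get("job_id"))
```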

Hugging Face H4 org

Hi!

We're still looking into ways to launch MoE models correctly on our backend, and we also had network failures on our cluster last week. We'll keep you posted as soon as we have updates.

@MaziyarPanahi, what you are reporting is normal and unrelated to the current issue :) When the research cluster is full, evaluation jobs are cancelled and rescheduled, but we keep the status as "running" to keep it simple for end users. It's likely your model was "running, cancelled, rescheduled, running, ..."

Hi @clefourrier

Thanks for the update regarding MoE models, appreciate it.

but we keep the status as "running" to keep it simple for end users. It's likely your model was "running, cancelled, rescheduled, running, ..."

I didn't know that, it makes sense now. Thank you :)

Hugging Face H4 org

Doing it right now, tell me if it works.

To this day, the only 8x22B models on the Leaderboard are from MistralAI; I don't believe we have ever had a successful eval on any 8x22B fine-tune. @clefourrier, is the issue resolved, with the only remaining limitation being finding free resources? Or do we still not know whether MoE models of this size might get rejected?

Hugging Face H4 org

Those we launched manually when they came out, because they were important for the community.
Good question, I think @SaylorTwift took a look at the backend side so I'll let him answer.
(The main problem we had was, as indicated in the title, identifying the number of activated params in MoEs.)
