Failed reason
Hi!
I submitted 2 models yesterday and they both failed, but I can't see the error message.
Is there something I can do?
@SaylorTwift Could you please rerun the pipeline? I can't resubmit them. :(
Hi! There was a connection error when loading your models, so I've added them back to pending.
These 3 models failed:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Mihaiii/Pallas-0.3_eval_request_False_float16_Original.json
Could you please rerun?
Thanks!
Hi! Thanks for the complete issue!
Checked the logs; they had all been cancelled because of pre-emption. Added them back to pending :)
@clefourrier Thanks for adding them back to pending, but all of them failed again.
Is it something I did wrong? I don't think I did anything different compared to models that successfully ran.
Hi @Mihaiii,
I'm very sorry about the inconvenience! We're changing our backend from one cluster to another and had a bunch of environment failures. I passed them back to pending again; I hope we will have fixed everything after tomorrow.
cc @SaylorTwift
Sorry to disturb, but they failed again:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Mihaiii/Pallas-0.3_eval_request_False_float16_Original.json
And Metis-0.4 appears with FINISHED status, but it's not on the leaderboard (I checked with flagged/deleted models shown):
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Mihaiii/Metis-0.4_eval_request_False_bfloat16_Original.json
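For anyone inspecting their submissions by hand, the linked request files follow a recognizable naming pattern. A minimal sketch that parses it, assuming the fields are model name, a private flag, precision, and weight type, as inferred from the two example filenames above (this helper is not part of any official tooling):

```python
import re

# Parse a leaderboard request filename such as
# "Pallas-0.3_eval_request_False_float16_Original.json".
# Field names (private, precision, weight_type) are assumptions
# inferred from the example filenames, not an official spec.
REQUEST_RE = re.compile(
    r"^(?P<model>.+)_eval_request_"
    r"(?P<private>True|False)_"
    r"(?P<precision>[^_]+)_"
    r"(?P<weight_type>[^_]+)\.json$"
)

def parse_request_filename(filename: str) -> dict:
    """Return the submission parameters encoded in a request filename."""
    match = REQUEST_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized request filename: {filename}")
    return match.groupdict()

print(parse_request_filename(
    "Pallas-0.3_eval_request_False_float16_Original.json"
))
# → {'model': 'Pallas-0.3', 'private': 'False',
#    'precision': 'float16', 'weight_type': 'Original'}
```

The eval status itself (PENDING, FAILED, FINISHED) is stored inside the JSON file, so checking the linked file is the quickest way to confirm a request's current state.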
Hi,
FYI, the new cluster is having serious connectivity problems. We are putting all evals on hold until it's fixed, and we'll relaunch all FAILED evals from the past 2 days.
I resubmitted them with bfloat16 precision instead of float16 and they were evaluated. That works for me, so I'm closing this thread.