What does it mean for a model to be in the Running evaluation queue?

#480
by Mihaiii - opened

I have 2 models (34b) that have had "running" status for ~6 days now: https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Mihaiii/Pallas-0.4_eval_request_False_float16_Original.json

https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Mihaiii/Pallas-0.3_eval_request_False_float16_Original.json

I'm a little confused regarding what the "running evaluation queue" means:

  1. Do all the models in the running queue run in parallel? If so, why is it a queue and not a list?
  2. Is any sorting applied between models when deciding priority? The JSON structure above also has a "likes" key: why is it relevant how many likes a model has?
  3. The exact same models failed before at load, but they were already in the running queue, so my conclusion was that the run hadn't started yet, since loading is the first step. What's the difference between the "running evaluation queue" and the "pending evaluation queue"?
Hugging Face H4 org

Hi! A model is put in the running queue when its evaluation script starts. That includes loading the model and the datasets, running the eval, and uploading the results.
All the models run in parallel; however, some jobs can be cancelled because a more important job took priority. Your model has been running for a few days because it's waiting for available compute. It will eventually complete, so please be patient :)
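
To make the behaviour described above more concrete, here is a toy sketch of how a job can keep its "RUNNING" status while it waits for compute. It is only an illustration under stated assumptions and not the leaderboard's actual scheduler: the `Job` class, the `priority` and `on_compute` fields, the `slots` parameter, and the model names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int             # hypothetical: higher means more important
    status: str = "PENDING"   # becomes "RUNNING" once the eval script starts
    on_compute: bool = False  # hypothetical: True only while holding a compute slot


def schedule(jobs, slots):
    """Give the available compute slots to the highest-priority started jobs.

    Jobs that lose their slot keep the "RUNNING" status; they are simply
    waiting for compute again.
    """
    started = [j for j in jobs if j.status == "RUNNING"]
    ranked = sorted(started, key=lambda j: j.priority, reverse=True)
    for i, job in enumerate(ranked):
        job.on_compute = i < slots
    return [j.name for j in ranked if j.on_compute]


# Two started jobs compete for a single compute slot.
jobs = [Job("model-a", priority=1), Job("model-b", priority=5)]
for job in jobs:
    job.status = "RUNNING"

active = schedule(jobs, slots=1)
```

In this sketch `model-a` still reports "RUNNING" even though `model-b` holds the only slot, which matches a request showing "running" for days without making progress.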

SaylorTwift changed discussion status to closed
