openlifescienceai/open_medical_llm_leaderboard · Model evaluation and submission stuck of LB.

Jun 6

Hi, the evaluation queue of the leaderboard has been stuck for a few days. Can you guys check it out and get it back up? Thank you.

CombinHorizon

Jun 19

edited Jul 5

It has been stuck since 2024-05-31
(~35 days as of 2024-07-05).

Previously it would run pretty quickly; now progress is frozen (the same numbers of finished/pending models for days).

CombinHorizon

Jul 5

Question: how much longer or how much more resources (vRAM or compute) does it take for float32 precision (vs float16 or bfloat16) to run, given a certain model size?
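For the memory half of this question, a rough back-of-the-envelope sketch may help. It only counts the weights: float32 uses 4 bytes per parameter, while float16 and bfloat16 use 2, so the weights alone take about twice the memory in float32. The 9B parameter count is an assumption used for illustration, and the helper name `weight_memory_gb` is hypothetical. Activations, KV cache, and framework overhead are not included, and the runtime difference depends on the hardware, so it is not estimated here.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumed example size: a nominal 9-billion-parameter model.
for dtype in BYTES_PER_PARAM:
    print(f"9B model in {dtype}: ~{weight_memory_gb(9e9, dtype):.0f} GB for weights")
```

So whatever the backend is, a float32 run needs roughly double the weight memory of the same model in float16 or bfloat16.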

CombinHorizon

Jul 11

edited Jul 11

Could it be that too many float32 models are running at the same time, and that is why it is frozen like this?
Are there any logs about the current progress of the running models, e.g. which task/sub-test each one is on, whether progress is moving forward, and any indication of ETA?

CombinHorizon

Jul 18

By stuck, we mean that the leaderboard is stuck at a count of only 231 finished models, with no new ones being added to the results
(see logs for timeline).

CombinHorizon

Jul 24

edited Jul 24

see
https://huggingface.co/datasets/openlifescienceai/requests/discussions/7
https://huggingface.co/datasets/openlifescienceai/requests/discussions/6
https://huggingface.co/datasets/openlifescienceai/requests/discussions/5
https://huggingface.co/datasets/openlifescienceai/requests/discussions/4
https://huggingface.co/datasets/openlifescienceai/requests/discussions/3
https://huggingface.co/datasets/openlifescienceai/requests/discussions/2

Maybe delete these running float32 models? That would probably unclog the leaderboard:
https://huggingface.co/datasets/openlifescienceai/requests/blob/main/cognitivecomputations/dolphin-2.9.1-yi-1.5-9b_eval_request_False_float32_Original.json
https://huggingface.co/datasets/openlifescienceai/requests/blob/main/wenbopan/Faro-Yi-9B-DPO_eval_request_False_float32_Original.json
https://huggingface.co/datasets/openlifescienceai/requests/blob/main/wenbopan/Faro-Yi-9B_eval_request_False_float32_Original.json
https://huggingface.co/datasets/openlifescienceai/requests/blob/main/01-ai/Yi-1.5-9B-Chat-16K_eval_request_False_float32_Original.json
https://huggingface.co/datasets/openlifescienceai/requests/blob/main/01-ai/Yi-1.5-9B-Chat_eval_request_False_float32_Original.json
https://huggingface.co/datasets/openlifescienceai/requests/blob/main/vicgalle/Configurable-Yi-1.5-9B-Chat_eval_request_False_float32_Original.json
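The request files linked above appear to follow the pattern `<org>/<model>_eval_request_<flag>_<precision>_<weight type>.json`. As a minimal sketch, assuming that pattern holds for every request file, the float32 requests could be picked out from a list of paths like this. The function name `parse_request_path` and the field names are hypothetical, and the meaning of the `True`/`False` flag is not stated in this thread, so it is kept as an opaque value.

```python
import re
from typing import Optional

# Pattern inferred from the request URLs in this thread (an assumption):
#   <model>_eval_request_<flag>_<precision>_<weight type>.json
FILENAME_RE = re.compile(
    r"(?P<model>.+)_eval_request_(?P<flag>True|False)_"
    r"(?P<precision>[^_]+)_(?P<weight_type>[^_]+)\.json"
)


def parse_request_path(path: str) -> Optional[dict]:
    """Split a request file path or URL into org, model, flag, precision, weight type."""
    parts = path.rstrip("/").split("/")
    if len(parts) < 2:
        return None
    org, filename = parts[-2], parts[-1]
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        return None
    return {"org": org, **match.groupdict()}


paths = [
    "01-ai/Yi-1.5-9B-Chat_eval_request_False_float32_Original.json",
    "SrikanthChellappa/Collaiborator-MEDLLM-Llama-3-8B-v2-7_eval_request_False_bfloat16_Original.json",
]
# Keep only the requests whose precision field is float32.
float32_requests = [
    info for info in map(parse_request_path, paths)
    if info is not None and info["precision"] == "float32"
]
for info in float32_requests:
    print(f'{info["org"]}/{info["model"]}')
```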

CombinHorizon

Jul 24

edited Aug 23

@aaditya @aryopg , any updates?

I get that float32 is cool when it is feasible, but is the difference enough? On the Hugging Face leaderboard, the difference between float16 and bfloat16 is often only a few tenths of a percentage point, which is something to keep in mind.

Could the number of concurrently running float32 models be limited or de-prioritized, without restarting the runs from scratch, to prevent clogging? Could there be some info/logging about the current progress (which sub-test or task each run is on), so that we know the ETA and how well it is moving, if at all, and can gauge whether a run is worth continuing? Over this period of time, aren't newer and better models coming out? What is a good way to weigh this?

https://huggingface.co/datasets/openlifescienceai/requests/blob/main/01-ai/Yi-1.5-9B-Chat-16K_eval_request_False_float32_Original.json
https://huggingface.co/datasets/openlifescienceai/requests/blob/main/01-ai/Yi-1.5-9B-Chat_eval_request_False_float32_Original.json

These two are closer to the core model, so it would make some sense to prioritize them, maybe more than the others, perhaps?

robinsmits

Aug 18

@clefourrier Is HuggingFace aware that the submission of models for evaluation on this leaderboard seems to be stuck?
The queue status is still the same as when I looked last Friday.

Reading the discussions here, this has already been an issue for a longer time.

Would it be possible to take a look at these issues?

aaditya

Open Life Science AI org Aug 21

Hi @robinsmits , sorry for the inconvenience. We’re currently upgrading the GPUs in the backend, along with making several other improvements. The delays in the queue are mostly due to GPU allocation and processing speed. We appreciate your patience as we work through these issues.

robinsmits

Aug 22

@aaditya Ok thanks for clarifying :-)

tosaddler

Aug 22

If it's a matter of computational resources, I can work with you to get the evaluations run. I have access to both Biowulf and another internal HPC with GPU nodes.

CombinHorizon

Sep 21

Thanks for the attention to this, but has there been any progress on this lately?

PranavHarshan

Oct 1

The leaderboard still seems to be clogged.

robinsmits

Oct 2

@aaditya When can we expect this Leaderboard to be operational again?

Some models that were already reported on July 24th are still in the queue.

I can't possibly imagine that it would take more than 2 months to add a few GPUs? If I'm wrong, then wouldn't it be an idea to ask HuggingFace for assistance?

CombinHorizon

11 days ago

Additionally, the model SrikanthChellappa/Collaiborator-MEDLLM-Llama-3-8B-v2-7 in the running list has been deleted/removed from the Hub. Can it be removed from the running list?

request file:
https://huggingface.co/datasets/openlifescienceai/requests/blob/main/SrikanthChellappa/Collaiborator-MEDLLM-Llama-3-8B-v2-7_eval_request_False_bfloat16_Original.json

@aaditya @aryopg @clefourrier

Also, the leaderboard has been stuck for months. Is float32 worth it?
Or should some triage/management be added, so that there is a limit on the number of float32 models that may run at the same time?
I've seen float32 runs (for mid-size or larger models on another leaderboard) freeze the processing of non-float32 models as well when they run at the same time.
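To make the triage idea concrete, here is a minimal sketch of a queue picker that caps how many float32 evaluations run at once. It is not the leaderboard's actual backend: `EvalRequest`, `pick_next`, and `max_float32` are hypothetical names, and the model names in the usage lines are placeholders. Requests that are skipped stay in the pending list, so nothing has to be restarted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalRequest:
    model: str
    precision: str  # e.g. "float32", "float16", "bfloat16"


def pick_next(
    pending: List[EvalRequest],
    running: List[EvalRequest],
    max_float32: int = 1,
) -> Optional[EvalRequest]:
    """Return the first pending request that keeps running float32 jobs within the cap."""
    running_fp32 = sum(1 for r in running if r.precision == "float32")
    for request in pending:
        if request.precision == "float32" and running_fp32 >= max_float32:
            continue  # float32 cap reached: leave it queued and try the next one
        return request
    return None  # nothing eligible right now


running = [EvalRequest("org/model-a", "float32")]
pending = [EvalRequest("org/model-b", "float32"), EvalRequest("org/model-c", "bfloat16")]
print(pick_next(pending, running))
```

With one float32 job already running and a cap of 1, the picker skips the queued float32 request and returns the bfloat16 one, so lower-precision models keep moving.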