Model evaluation failed

#494
by adamo1139 - opened

Hello :)

Evaluation of Yi-34B-200K-DARE-merge-v5 by @brucethemoose failed.

https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/brucethemoose/Yi-34B-200K-DARE-merge-v5_eval_request_False_bfloat16_Original.json

Brucethemoose loaded this model with transformers using 4-bit quantization and a reduced max_position_embeddings (due to VRAM limitations), and it worked fine. Based on other similar recently opened discussions, it seems like there are issues with the cluster running the evaluations. If that is what caused this failure, could you please add the model to the queue again once the compute cluster is operational?
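For reference, here is a minimal sketch of how such a load can be done with transformers: clamping max_position_embeddings via the config and loading in 4-bit. The 32768 clamp value and the generation settings are assumptions, not what brucethemoose actually used.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "brucethemoose/Yi-34B-200K-DARE-merge-v5"

# Fetch the model config and clamp the advertised 200K context window
# to something that fits in limited VRAM (32768 is an assumed value).
config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = min(config.max_position_embeddings, 32768)

# 4-bit quantization via bitsandbytes, computing in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    quantization_config=bnb_config,
    device_map="auto",
)
```

A similar clamp on the evaluation side (reading the config, capping max_position_embeddings before allocation) would be one way to implement the context-size check suggested below.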

Yeah, thanks for posting this. I saw Tess and one of my old merges fail this way as well.

As adamo suggested, I think the leaderboard needs a check for context size? Basically, if it's enormous, clamp it to something reasonable like 32K to avoid CUDA OOMs on the test bench.

I haven't seen it fail for any 200K model, but I don't follow most of them closely. My best guess is that your model failed evaluation due to a cluster-wide connectivity or processing issue.

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/489

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/492

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/493

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/485

Quoting Clémentine in #485

Side note - our eval cluster changed and we are in full debugging mode (connectivity issues) so it might take a couple days for us to come back to you.

Hugging Face H4 org

Hi! The connectivity issues on the cluster have been fixed, and your model should be on the leaderboard :)
Don't hesitate to re-open the issue if your model fails again.

SaylorTwift changed discussion status to closed
