GPTQ and Mixtral models will need to be relaunched
I don't know why or what happened, but all of those evaluations failed. See:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/cognitivecomputations/dolphin-2.5-mixtral-8x7b_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/cognitivecomputations/dolphin-2.6-mixtral-8x7b_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/cognitivecomputations/dolphin-2.7-mixtral-8x7b_eval_request_False_bfloat16_Original.json
https://old.reddit.com/r/LocalLLaMA/comments/18s61fb/pressuretested_the_most_popular_opensource_llms/
They also tested dolphin-2.6-mixtral there, so I don't know what is causing it to fail here; a re-run of dolphin-2.7-mixtral-8x7b still failed.
@CombinHorizon I want to see the Dolphin Mixtrals evaluated too, but apparently they don't use safetensors, hence they can't be evaluated.
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/517
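For anyone who wants to pre-check a repo before submitting, the safetensors requirement can be verified programmatically. This is only a sketch: the helper name and the sample file listings below are illustrative, and in practice the real file list could be fetched with `huggingface_hub.list_repo_files("org/model")`.

```python
# Sketch: decide whether a model repo ships safetensors weights.
# The file lists below are made up for illustration; a real check
# would fetch them with huggingface_hub.list_repo_files("org/model").

def has_safetensors(repo_files):
    """Return True if any file in the listing is a .safetensors shard."""
    return any(name.endswith(".safetensors") for name in repo_files)

# Hypothetical listings:
pickle_only = ["config.json", "pytorch_model-00001-of-00002.bin"]
safetensors_repo = ["config.json", "model-00001-of-00019.safetensors"]

print(has_safetensors(pickle_only))       # False -> would be rejected
print(has_safetensors(safetensors_repo))  # True
```

A repo that only ships `.bin` (pickle) weights would fail this check, which matches the behavior described in the linked discussion.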
(these have also failed)
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/TheBloke/dolphin-2.5-mixtral-8x7b-GPTQ_eval_request_False_GPTQ_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/TheBloke/dolphin-2.6-mixtral-8x7b-GPTQ_eval_request_False_GPTQ_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/TheBloke/dolphin-2.7-mixtral-8x7b-GPTQ_eval_request_False_GPTQ_Original.json
@CombinHorizon Thanks for submitting them for evaluation. I just rechecked: GGUF and GPTQ are weights-only quantization (WOQ) formats, so they shouldn't have failed for security reasons.
@CombinHorizon I did those long context pressure tests, referring to the screenshot from my Reddit post. But that isn't related to this leaderboard; it was done with different eval code.
Hi everyone!
Thank you @Phil337 for the link to the Dolphin Mixtrals evaluation discussion! I guess it's the same problem here. Besides, I should say we're currently solving a technical problem so that we can evaluate GPTQ versions. I'll reschedule these GPTQ versions for evaluation once we fix the problem, hopefully by the end of the week.
Hi! We have 2 issues here:
- the Mixtral evaluations were SIGTERMed by our cluster, most likely a TP/DP (tensor/data parallelism) problem; we need to change something in our backend, and it's going to take some time.
- the GPTQ evaluation failures, however, come from a mismatch between some of our requirements; we'll relaunch those as soon as it's updated.