Dolphin models disappearing / not evaluated

#381
by ehartford - opened

dolphin-2.2-70b and dolphin-2_2-yi-34b are not working

Open LLM Leaderboard org

Hi!
Could you please follow the steps in the FAQ? There is absolutely nothing I can do if you just tell me "model not working", as I can't magically guess what problem you are encountering.

  • Are you having trouble submitting your model? If so, are you getting an error message, and what were the full parameters you used?
  • Was your model submitted properly? If yes, can you link the request file so we can get more info on its current status?

Hello,

He's referring to the failed evaluations of these models: https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/ehartford/dolphin-2_2-yi-34b_eval_request_False_4bit_Original.json
and
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/ehartford/dolphin-2.2-70b_eval_request_False_bfloat16_Original.json
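
For reference, the status recorded in a request file can also be checked programmatically. A minimal sketch using huggingface_hub, assuming the JSON carries a "status" field as these request files typically do:

```python
import json
from huggingface_hub import hf_hub_download

# Fetch one of the eval request files from the leaderboard's requests dataset.
path = hf_hub_download(
    repo_id="open-llm-leaderboard/requests",
    filename="ehartford/dolphin-2.2-70b_eval_request_False_bfloat16_Original.json",
    repo_type="dataset",
)

with open(path) as f:
    request = json.load(f)

# Assumption: the request file carries a "status" field
# (e.g. PENDING / RUNNING / FAILED / FINISHED).
print(request.get("status"))
```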

Thank you!

Open LLM Leaderboard org

Hi @Dampfinchen,
Thank you very much for the details!

Long story short, we have a new system for running our evaluations on the HF cluster, where leaderboard evaluations get cancelled automatically whenever a higher-priority job needs the resources (the leaderboard only runs on the cluster's spare cycles). The jobs are relaunched automatically in the end, but are displayed as failed in the meantime. Your jobs will be relaunched once enough compute is available :)

We're working on improving how this is displayed, as it is very confusing for everyone (us included ^^').

Closing for now, but feel free to reopen if your models are still not evaluated in a week.

clefourrier changed discussion status to closed

So do I need to resubmit ehartford/dolphin-2.2-70b and ehartford/dolphin-2_2-yi-34b?

I'm confused why my models aren't getting evaluated. Did I do something wrong?

ehartford changed discussion status to open
Open LLM Leaderboard org

Hi!
Thank you for your attention to this!
ehartford/dolphin-2.2-70b failed because of a node failure, I'm adding it back to the queue.
However, your other model dolphin-2_2-yi-34b failed because it couldn't load the YiTokenizer - does it require trust_remote_code=True?

Yes, it does require trust_remote_code=True.
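
For context, a repo that ships its own tokenizer or modeling code (like YiTokenizer here) can only be loaded by explicitly opting in to executing that code. A minimal sketch using the standard transformers API, with the repo id from this thread:

```python
from transformers import AutoTokenizer

# Without trust_remote_code=True this raises, because YiTokenizer is defined
# by custom code inside the repository, not by the transformers library.
tokenizer = AutoTokenizer.from_pretrained(
    "ehartford/dolphin-2_2-yi-34b",
    trust_remote_code=True,  # opt in to running the repo's custom code
)
```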

Open LLM Leaderboard org

Ha! As indicated in the FAQ, we don't allow trust_remote_code=True for general model submissions on the leaderboard for safety reasons (it would require us to manually review the code of every new model), so we won't run this model. I'll investigate how you were able to submit it - the leaderboard should prevent the submission of such models.

You find that an appropriate resolution?

In other words - no finetunes of Yi will be evaluated

Open LLM Leaderboard org
edited Nov 20, 2023

In other words - no finetunes of Yi will be evaluated

Indeed. We don't support trust_remote_code=True for safety reasons, so fine-tuned Yi submissions will need to wait for the integration of the model architecture into the latest stable release of transformers (just as people waited after the Falcon release, for example).

Understood, thank you for your help.

clefourrier changed discussion status to closed

@clefourrier

https://huggingface.co/ehartford/dolphin-2.2-70b has been sitting in "Running" for a week. I am sure it's not actually running...

Open LLM Leaderboard org

Hi! If the status did not change in the request file, the job likely got started and cancelled repeatedly by higher-priority jobs on the cluster, since the leaderboard does not take priority over, for example, big model training runs.

Open LLM Leaderboard org

Btw, regarding Yi fine-tunes: since the original authors actually used a llama architecture (and have now converted their weights on the Hub to this format), you just need to do the same to be able to submit, as we support llama architectures! :)

Open LLM Leaderboard org

original authors actually used a llama arch (and converted their weights on the hub to this format now)

Because of that, the Yi architecture will not be added to Transformers, so if you want your model to be evaluated, you have to convert it to the llama format.
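
A minimal sketch of such a conversion, assuming the checkpoint differs from the llama format only in two layer-norm key names (ln1/ln2 vs. input_layernorm/post_attention_layernorm, the rename used by community "llamafied" Yi checkpoints) and in the config metadata; verify the exact key mapping against your own checkpoint before relying on it:

```python
import json
import os

import torch

src = "dolphin-2_2-yi-34b"        # original Yi-format checkpoint (hypothetical local path)
dst = "dolphin-2_2-yi-34b-llama"  # llama-format output directory
os.makedirs(dst, exist_ok=True)

# Assumed Yi -> llama layer-norm key renames; all other keys are kept as-is.
RENAMES = {".ln1.": ".input_layernorm.", ".ln2.": ".post_attention_layernorm."}

# Rewrite every weight shard with the renamed keys.
for fname in os.listdir(src):
    if not fname.endswith(".bin"):
        continue
    shard = torch.load(os.path.join(src, fname), map_location="cpu")
    renamed = {}
    for key, tensor in shard.items():
        for old, new in RENAMES.items():
            key = key.replace(old, new)
        renamed[key] = tensor
    torch.save(renamed, os.path.join(dst, fname))

# Point the config at the llama architecture so AutoModel resolves it natively,
# without trust_remote_code.
with open(os.path.join(src, "config.json")) as f:
    config = json.load(f)
config["architectures"] = ["LlamaForCausalLM"]
config["model_type"] = "llama"
with open(os.path.join(dst, "config.json"), "w") as f:
    json.dump(config, f, indent=2)

# Remaining files (tokenizer.model, generation_config.json, and the shard index
# pytorch_model.bin.index.json, whose weight_map keys need the same renames)
# still have to be copied over and checked by hand.
```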
