Dolphin models disappearing / not evaluated

#381
by ehartford - opened

dolphin-2.2-70b and dolphin-2_2-yi-34b are not working

Open LLM Leaderboard org

Hi!
Could you please follow the steps in the FAQ? There is absolutely nothing I can do if you just tell me "model not working", as I can't magically guess what problem you are encountering.

  • Are you having trouble submitting your model? If so, are you getting an error message, and what were the full parameters you used?
  • Was your model submitted properly? If yes, can you link the request file so we can get more info on its current status?

Hello,

He's referring to the failed evaluations of these models: https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/ehartford/dolphin-2_2-yi-34b_eval_request_False_4bit_Original.json
and
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/ehartford/dolphin-2.2-70b_eval_request_False_bfloat16_Original.json
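
For reference, the status recorded in a request file can also be checked programmatically. A minimal sketch using huggingface_hub, assuming the JSON carries a "status" field as these request files typically do:

```python
import json
from huggingface_hub import hf_hub_download

# Fetch one of the eval request files from the leaderboard's requests dataset.
path = hf_hub_download(
    repo_id="open-llm-leaderboard/requests",
    filename="ehartford/dolphin-2.2-70b_eval_request_False_bfloat16_Original.json",
    repo_type="dataset",
)

with open(path) as f:
    request = json.load(f)

# Assumption: the request file carries a "status" field
# (e.g. PENDING / RUNNING / FAILED / FINISHED).
print(request.get("status"))
```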

Thank you!

Open LLM Leaderboard org

Hi @Dampfinchen,
Thank you very much for the details!

Long story short, we have a new system for running our evaluations on the HF cluster, where leaderboard evaluations get cancelled automatically whenever a higher-priority job needs the resources (the leaderboard only runs on the cluster's spare cycles). The jobs are relaunched automatically in the end, but are displayed as failed in the meantime. Your jobs will be relaunched once enough compute is available :)

We're working on improving how this is displayed, as it is very confusing for everyone (us included ^^').

Closing for now, but feel free to reopen if your models are still not evaluated in a week.

clefourrier changed discussion status to closed

So do I need to resubmit ehartford/dolphin-2.2-70b and ehartford/dolphin-2_2-yi-34b?

I'm confused why my models aren't getting evaluated. Did I do something wrong?

ehartford changed discussion status to open
Open LLM Leaderboard org

Hi!
Thank you for your attention to this!
ehartford/dolphin-2.2-70b failed because of a node failure, I'm adding it back to the queue.
However, your other model dolphin-2_2-yi-34b failed because it couldn't load the YiTokenizer - does it require trust_remote_code=True?

Yes, it does require trust_remote_code=True.
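
For context, a repo that ships its own tokenizer or modeling code (like YiTokenizer here) can only be loaded by explicitly opting in to executing that code. A minimal sketch using the standard transformers API, with the repo id from this thread:

```python
from transformers import AutoTokenizer

# Without trust_remote_code=True this raises, because YiTokenizer is defined
# by custom code inside the repository, not by the transformers library.
tokenizer = AutoTokenizer.from_pretrained(
    "ehartford/dolphin-2_2-yi-34b",
    trust_remote_code=True,  # opt in to running the repo's custom code
)
```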

Open LLM Leaderboard org

Ha! As indicated in the FAQ, we don't allow trust_remote_code=True for general model submissions on the leaderboard for safety reasons (it would require us to manually review the code of every new model), so we won't run this model. I'll investigate how you were able to submit it - the leaderboard should prevent the submission of such models.

You find that an appropriate resolution?

In other words - no finetunes of Yi will be evaluated

Open LLM Leaderboard org
edited Nov 20, 2023

In other words - no finetunes of Yi will be evaluated

Indeed. We don't support trust_remote_code=True for safety reasons, so fine-tuned Yi submissions will need to wait for the integration of the model architecture into the latest stable release of transformers (just as people waited after the Falcon release, for example).

Understood, thank you for your help.

clefourrier changed discussion status to closed

@clefourrier

https://huggingface.co/ehartford/dolphin-2.2-70b has been sitting in "Running" for a week. I am sure it's not actually running...

Open LLM Leaderboard org

Hi! If the status did not change in the request file, the job likely got started and cancelled repeatedly by higher-priority jobs on the cluster, since the leaderboard does not take priority over, for example, big model training runs.

Open LLM Leaderboard org

Btw, regarding Yi fine-tunes: since the original authors actually used a llama architecture (and have now converted their weights on the Hub to this format), you just need to do the same to be able to submit, as we support llama architectures! :)

Open LLM Leaderboard org

original authors actually used a llama arch (and converted their weights on the hub to this format now)

Because of that, the Yi architecture will not be added to Transformers, so if you want your model to be evaluated, you have to convert it to the llama format.
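
A minimal sketch of such a conversion, assuming the checkpoint differs from the llama format only in two layer-norm key names (ln1/ln2 vs. input_layernorm/post_attention_layernorm, the rename used by community "llamafied" Yi checkpoints) and in the config metadata; verify the exact key mapping against your own checkpoint before relying on it:

```python
import json
import os

import torch

src = "dolphin-2_2-yi-34b"        # original Yi-format checkpoint (hypothetical local path)
dst = "dolphin-2_2-yi-34b-llama"  # llama-format output directory
os.makedirs(dst, exist_ok=True)

# Assumed Yi -> llama layer-norm key renames; all other keys are kept as-is.
RENAMES = {".ln1.": ".input_layernorm.", ".ln2.": ".post_attention_layernorm."}

# Rewrite every weight shard with the renamed keys.
for fname in os.listdir(src):
    if not fname.endswith(".bin"):
        continue
    shard = torch.load(os.path.join(src, fname), map_location="cpu")
    renamed = {}
    for key, tensor in shard.items():
        for old, new in RENAMES.items():
            key = key.replace(old, new)
        renamed[key] = tensor
    torch.save(renamed, os.path.join(dst, fname))

# Point the config at the llama architecture so AutoModel resolves it natively,
# without trust_remote_code.
with open(os.path.join(src, "config.json")) as f:
    config = json.load(f)
config["architectures"] = ["LlamaForCausalLM"]
config["model_type"] = "llama"
with open(os.path.join(dst, "config.json"), "w") as f:
    json.dump(config, f, indent=2)

# Remaining files (tokenizer.model, generation_config.json, and the shard index
# pytorch_model.bin.index.json, whose weight_map keys need the same renames)
# still have to be copied over and checked by hand.
```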
