Do Yi models have eval issues?

#393
by migtissera - opened

Hey! My Tess-Medium-200K-v1.0 model (renamed to Tess-M-Creative-v1.0) has been running for 2 days now. Is there an error? It hasn't failed; it's still in the "Running Evaluation" queue.

This is the model: https://huggingface.co/migtissera/Tess-M-Creative-v1.0

Hugging Face H4 org
edited Nov 21, 2023

Hi!
Could you please link to the request file? There are many reasons why this could have happened :)

Hugging Face H4 org

The specific model you linked was cancelled (because a more important job was launched on the cluster) and automatically requeued - we're apparently still having a small issue with our display.

Hugging Face H4 org

Hi! Your model actually failed because of a network error on our side. I will re-add it to the queue. Thanks for your patience :)

Thank you @SaylorTwift

Hugging Face H4 org
edited Nov 22, 2023

Hi @migtissera ,
You'll find the request files for your models here. If you point to them next time, we'll be able to debug your problems faster, as they contain the job id which allows us to look at the logs :)
The first model failed because the launching system had a problem launching this big a model (we will need to launch it on multiple nodes).
The other two were cancelled for priority reasons, I'll let @SaylorTwift tell you if they were rescheduled (and relaunch them if not) - the cluster is very full at the moment though, so it might take some time before they are evaluated.
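
For readers who want to check a request file before posting, here is a minimal sketch of pulling out the fields that matter for debugging. It assumes the request file is JSON; the field names (`model`, `status`, `job_id`) and the sample values are hypothetical placeholders for illustration, not a confirmed schema, so adjust them to whatever your actual request file contains.

```python
import json

# Hypothetical request-file contents. The field names and the job id value
# are assumptions made up for this example, not a confirmed schema.
EXAMPLE_REQUEST = """
{
  "model": "migtissera/Tess-M-Creative-v1.0",
  "status": "RUNNING",
  "job_id": "12345"
}
"""


def summarize_request(raw: str) -> dict:
    """Extract the fields a maintainer would need to locate the job logs."""
    data = json.loads(raw)
    return {
        "model": data.get("model"),
        # Fall back to "UNKNOWN" when the file has no status field.
        "status": data.get("status", "UNKNOWN"),
        "job_id": data.get("job_id"),
    }


if __name__ == "__main__":
    summary = summarize_request(EXAMPLE_REQUEST)
    print(f"{summary['model']}: status={summary['status']}, job_id={summary['job_id']}")
```

Quoting the job id (or linking the request file itself) in a discussion should be enough for the maintainers to find the matching logs.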

Thanks @clefourrier! I didn't know where to find this. Will do this from the next one onwards :)

Hugging Face H4 org

I'm going to close this issue for now. Feel free to reopen it if needed.

clefourrier changed discussion status to closed
