Runtime error: ZeroGPU has not been initialized

#8
by kadirnar - opened
ZeroGPU Explorers org

@kadirnar You need to decorate the function that uses the GPU with @spaces.GPU.
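
For reference, here's a minimal sketch of what that looks like, modeled on the pattern from the ZeroGPU docs (the greet function and its body are just placeholders):

```python
import gradio as gr
import spaces
import torch

# Load tensors/weights at startup, outside the decorated function; ZeroGPU
# defers the actual CUDA work until a @spaces.GPU-decorated call runs.
zero = torch.zeros(1).cuda()

@spaces.GPU  # a GPU is attached only for the duration of this call
def greet(n):
    return f"device={zero.device}, value={(zero + n).item()}"

gr.Interface(fn=greet, inputs=gr.Number(), outputs=gr.Text()).launch()
```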

ZeroGPU Explorers org

Sorry, it's my fault. I fixed it.

kadirnar changed discussion status to closed
ZeroGPU Explorers org

@hysts
How can I add this command to my requirements.txt file?

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git

I'm getting errors because of the extra parameters.

kadirnar changed discussion status to open
ZeroGPU Explorers org

@kadirnar CUDA is not available at build time or at startup on ZeroGPU. I'm not sure this works for your case, but as a potential workaround, you can try building a wheel for the package in a local environment with CUDA and installing it at startup. Since Spaces currently runs CUDA 11.8, you might want to use nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 to build the wheel.

Below are examples where this approach worked:
https://huggingface.co/spaces/stabilityai/TripoSR/discussions/1
https://huggingface.co/spaces/ashawkey/LGM/discussions/5
https://huggingface.co/spaces/OpenGVLab/VideoMamba/discussions/2
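
In practice those Spaces commit the pre-built wheel to the repo and pip-install it at the top of app.py, before the package is first imported. A rough sketch of that startup step (the wheel path and filename below are assumptions; use whatever pip wheel produced inside the CUDA 11.8 container):

```python
# Hypothetical startup snippet for app.py: install a wheel that was pre-built
# inside the nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 container and committed
# to the Space repo. The filename is a placeholder.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "pip", "install", "wheels/apex-0.1-cp310-cp310-linux_x86_64.whl"],
    check=True,
)

import apex  # noqa: E402 -- import only after the wheel is installed
```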

ZeroGPU Explorers org

> you can try building a wheel for the package in a local environment with CUDA and installing it at startup. Since Spaces currently runs CUDA 11.8, you might want to use nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 to build the wheel.

The ZeroGPU feature only supports Gradio. How can I do this if I have to create a Dockerfile?

ZeroGPU Explorers org

@kadirnar Sorry, I don't understand what you mean. As you said, ZeroGPU only works with Gradio, so you can't use Docker as the Space SDK.

ZeroGPU Explorers org

I couldn't add the Apex library, but the demo works. However, it gives this error:

[screenshot of the error]

ZeroGPU Explorers org

> I couldn't add the Apex library, but the demo works. However, it gives this error.

Not sure, but maybe it's because your function takes longer than the default time limit, which is 60 seconds. Each user gets a few minutes' worth of inference time every few hours, and there is also a time limit on a single run. You can change the latter with the duration parameter of @spaces.GPU, e.g. @spaces.GPU(duration=120).
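
As a concrete sketch (the function name and body are placeholders):

```python
import spaces

@spaces.GPU(duration=120)  # raise this call's limit from the 60 s default to 120 s
def infer(prompt):
    ...  # placeholder for the actual inference code
```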

ZeroGPU Explorers org

BTW, regarding this exchange:

> The ZeroGPU feature only supports Gradio. How can I do this if I have to create a Dockerfile?

> @kadirnar Sorry, I don't understand what you mean. As you said, ZeroGPU only works with Gradio, so you can't use Docker as the Space SDK.

I just realized that you may have misunderstood what I meant by this:

> you can try building a wheel for the package in a local environment with CUDA and installing it at startup. Since Spaces currently runs CUDA 11.8, you might want to use nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 to build the wheel.

What I was trying to say was that you might want to use the nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 Docker image to build the wheel in your local environment, not that the Space itself should use Docker.

ZeroGPU Explorers org

> I couldn't add the Apex library, but the demo works. However, it gives this error.

> Not sure, but maybe it's because your function takes longer than the default time limit, which is 60 seconds. Each user gets a few minutes' worth of inference time every few hours, and there is also a time limit on a single run. You can change the latter with the duration parameter of @spaces.GPU, e.g. @spaces.GPU(duration=120).

I added the duration parameter and set it to 500. I no longer get that error. Here is my new error 😅:

Error:

You have exceeded your GPU quota (279s left vs. 500s requested). Please retry in 3:22:04
ZeroGPU Explorers org

@kadirnar You shouldn't set the duration parameter too large; it should be as close as possible to the maximum time the decorated inference function actually needs. As I said, each user has a limited quota of inference time on ZeroGPU, which I think is 300 seconds as of now. duration declares the maximum expected run time, so setting it to 500 seconds means a single call could consume 500 seconds of quota; since users only have a few minutes' worth, the call is rejected with that error.

ZeroGPU Explorers org

> @kadirnar You shouldn't set the duration parameter too large; it should be as close as possible to the maximum time the decorated inference function actually needs. As I said, each user has a limited quota of inference time on ZeroGPU, which I think is 300 seconds as of now. duration declares the maximum expected run time, so setting it to 500 seconds means a single call could consume 500 seconds of quota; since users only have a few minutes' worth, the call is rejected with that error.

Yes, I reduced the value (to 280). But I still don't understand: I'm the developer of the repo, I need to test parameters and make optimizations, but I keep getting this error. The Community GPU feature was better.

Error:
You have exceeded your GPU quota (165s left vs. 280s requested). Please retry in 0:52:59

ZeroGPU Explorers org

> Yes, I reduced the value (to 280). But I still don't understand: I'm the developer of the repo, I need to test parameters and make optimizations, but I keep getting this error. The Community GPU feature was better.

So your Space requires 280 seconds for a single run? If that's the case, ZeroGPU may not be suitable for your Space.

ZeroGPU Explorers org

@kadirnar
I took a look at your Space and noticed that you are downloading the model inside the inference function: https://huggingface.co/spaces/kadirnar/Open-Sora/blob/0ba99c24cebe3f9c058260cc5b40a2a339fb937b/app.py#L35-L46
Obviously, downloading the model takes some time on the first run, so you should do it outside the function; I also don't think downloading inside the GPU function works with ZeroGPU at all. Furthermore, your code reloads the model on every call, so I recommend loading it once outside the function and running inference directly, without invoking other scripts. That should reduce the inference time.
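
Something along these lines (a rough sketch only; the repo name, filename, and loading code are placeholders, not the actual Open-Sora API):

```python
import spaces
import torch
from huggingface_hub import hf_hub_download

# Download and load once, at startup. ZeroGPU defers the actual CUDA work,
# so moving the model to "cuda" here is fine even though no GPU is attached yet.
ckpt_path = hf_hub_download("your-org/your-model", "model.pt")  # placeholder repo/file
model = torch.load(ckpt_path, map_location="cpu")               # placeholder loading code
model.eval().to("cuda")

@spaces.GPU(duration=120)
def generate(prompt: str):
    # Only the forward pass runs inside the GPU window: no downloads, no reloads.
    with torch.inference_mode():
        return model(prompt)
```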

ZeroGPU Explorers org

Thank you. I fixed it.

kadirnar changed discussion status to closed
