`.then` does not work inside of Zero-GPU

#124
by WillHeld - opened

.then events don't appear to get triggered in Zero GPU environments, similar to the related issue of cancel events not getting triggered: https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/113

In my testing, it seems like there's some race condition induced in the ZeroGPU environment that doesn't exist when running on your own servers.

So, does this mean that .success will work because it does not cause a race condition?
I've never noticed this because I generally use .success for heavy processing.

Like the global variable issue, this is probably due to multi-processing or multi-threading in the spaces library...
More to the point, when a library uses multi-processing or multi-threading, Python tends to be bug-prone.

.success didn't work for me either! I'll create a simple repro for both. It's possible it only happens with generator methods, which have a different wrapper in ZeroGPU than regular functions though!

.success didn't work for me either!

It's barely working in my space. Specifically, I'm using .success to rename the file after generating the image. The function for generating is the yield version, and the function for renaming is the return version.
https://huggingface.co/spaces/John6666/DiffuseCraftMod

It's possible it only happens with generator methods,

I'm pretty sure that using yield makes it more buggy in Zero GPU. I've run into problems in other Spaces that I now think were caused by this, and I had no choice but to switch to return to avoid them. For some functions, it's fine to use either, but there are some functions that need to be a generator...

Edit:
A program works in my environment and not in yours. That rarely happens with VMs.
There may be some other condition that we haven't seen yet.

Yeah - I'm actually wondering whether the problem isn't that .then doesn't work, but that an error is getting swallowed by the forked process, so nothing shows me what's wrong! Closing until I can rule that out.
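
For anyone who wants to rule out a swallowed error in their own Space, here is a minimal sketch of one way to do it: wrap the generator so that any exception raised in its body is logged before it propagates. The decorator name `log_generator_errors`, the logger name, and the `generate` function are all hypothetical and are not part of Gradio or the spaces library. Note that this only catches errors raised inside the function body; it would not catch a failure that happens after a yielded value has left the generator.

```python
import functools
import logging
import traceback

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("zerogpu_debug")


def log_generator_errors(fn):
    """Wrap a generator function so exceptions are logged before propagating."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            # Forward every yielded value from the wrapped generator.
            yield from fn(*args, **kwargs)
        except Exception:
            # Log the full traceback explicitly, then re-raise unchanged.
            logger.error("Error in %s:\n%s", fn.__name__, traceback.format_exc())
            raise

    return wrapper


@log_generator_errors
def generate(prompt):
    # Hypothetical stand-in for a generator-style event handler.
    yield f"working on {prompt}"
    raise RuntimeError("simulated failure in the final step")
```

If the traceback shows up in the log but the chained event still never fires, the failing step is the likely cause rather than the event chaining itself.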

WillHeld changed discussion status to closed

Confirmed that was the case - I was unintentionally returning a CUDA tensor in the final yield. The error just didn't get logged because the process got forked, so neither .then nor .success got triggered.
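
A rough sketch of how to guard against this, assuming the outputs are PyTorch tensors: move anything that lives on the GPU to the CPU before it is yielded or returned. The helper name `to_cpu_output` is hypothetical. It is duck-typed on the `is_cuda` attribute and the `detach()` and `cpu()` methods that `torch.Tensor` provides, so the snippet itself does not need to import torch.

```python
def to_cpu_output(value):
    """Return a copy of `value` with any CUDA tensors moved to the CPU.

    Hypothetical helper: relies on the `is_cuda` attribute and the
    `detach()` / `cpu()` methods that torch.Tensor exposes.
    """
    if getattr(value, "is_cuda", False):
        # Drop the autograd graph and copy the data to host memory.
        return value.detach().cpu()
    if isinstance(value, list):
        return [to_cpu_output(v) for v in value]
    if isinstance(value, tuple):
        return tuple(to_cpu_output(v) for v in value)
    if isinstance(value, dict):
        return {k: to_cpu_output(v) for k, v in value.items()}
    # Plain Python objects (text, numbers, images) pass through untouched.
    return value
```

In a generator, that would mean writing `yield to_cpu_output(result)` for the final yield. Depending on the output component, a further conversion (for example to a NumPy array or an image object) may still be needed.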

Oh I see. So that's how it was.
In my case, the problem didn't occur because I only returned Python text and image objects.
