Do LoRAs conflict between different users?
Hi, I'm a follower of yours. First, I want to thank you for your excellent work on Spaces and model collections; it has helped me a lot in my own work. I'm reaching out because I have a question about the implementation of the LoRA loader. I noticed that the LoRAs, both preset and custom ones, are loaded at runtime outside the ZeroGPU decorator. Does this conflict when there are multiple concurrent users with different LoRAs to be loaded, since the pipeline is shared globally between users? I'm pretty new to Gradio, so it would be very helpful if you could offer me some hints on this. Thanks in advance.
Thank you.
I'll answer your questions, but there's quite a lot I don't understand.
loaded at runtime outside the ZeroGPU decorator
That's to be stingy with the quota.
I think this is ok because it's loaded into VRAM together with the model just before inference.
Overall, HF's Zero GPU Spaces are buggy anyway, so things would be somewhat glitchy either way...
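To make the quota point concrete, here's a rough sketch of the pattern (just an illustration, not the Space's actual code; the model and LoRA repo names are placeholders):

```python
# Sketch only: LoRA weights are attached to the shared pipeline in
# ordinary (non-GPU) code, and only the inference call is wrapped in
# @spaces.GPU, so ZeroGPU quota is spent solely on generation.
import spaces
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.to("cuda")  # under ZeroGPU the actual GPU attach is deferred

def apply_lora(lora_repo: str):
    # Runs outside the decorator, so it does not consume GPU quota.
    pipe.unload_lora_weights()  # drop any previously attached LoRA first
    pipe.load_lora_weights(lora_repo)

@spaces.GPU  # only this call counts against the caller's ZeroGPU quota
def generate(prompt: str):
    return pipe(prompt).images[0]
```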
Does this conflict when there are multiple concurrent users with different LoRAs to be loaded, since the pipeline is shared globally between users?
Once the base model is loaded, users are essentially fighting over the same pipeline...
Well, Zero GPU is a service for sharing a limited number of powerful GPUs at low cost, so it can't be helped.
I think it could happen. However, to be as safe as possible, the model is loaded and the LoRA is reapplied each time just before inference, even though that wastes a lot of work.
Therefore, the situation where LoRAs get mixed up with other people's LoRAs should not happen very often.
But the PEFT library used for LoRA ultimately works by merging weights, so there is a possibility that some cases are not unmerged properly.
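Roughly, the per-request reload looks like this (a simplified sketch with hypothetical helper names, not the actual Space code):

```python
# Sketch: rebuild the pipeline and reapply the requested LoRA before
# every generation, so leftovers from another user's request are
# discarded rather than stacked on top.
import spaces
import torch
from diffusers import DiffusionPipeline

pipe = None

def reload_model_and_lora(repo_id: str, lora_repo: str | None):
    global pipe
    pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    if lora_repo:
        pipe.load_lora_weights(lora_repo)

@spaces.GPU
def generate(prompt: str):
    pipe.to("cuda")  # the pipeline (with its LoRA) only reaches VRAM here
    return pipe(prompt).images[0]

def run(prompt: str, repo_id: str, lora_repo: str | None):
    reload_model_and_lora(repo_id, lora_repo)  # outside @spaces.GPU: no quota
    return generate(prompt)                    # only this call holds a GPU
```

It's wasteful, but it means each request starts from a clean pipeline.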
Thanks for your reply. After reviewing the code again, I noticed you call "change_base_model" on every generation, so the only contention should be on the global pipeline, and the only time window where things might get messed up would be before the pipeline is loaded onto the GPU. But that probably can't happen either, since the pipe is thread-safe, so no matter how multiple requests are handled, there should be no chance that different users could mix up LoRAs or load the wrong base model. Am I right?
As for the PEFT part, I haven't got that far yet; I'm still learning this.
Am I right?
Yes, yes. That's what I mean.
As for PEFT, it is used inside Diffusers, so I figured you wouldn't find it even if you read the code. Roughly speaking, think of fuse_lora as the PEFT part.
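To show what I mean, these are the Diffusers-level calls that delegate to PEFT under the hood (the LoRA repo id is just a placeholder):

```python
# The LoRA lifecycle on a Diffusers pipeline; the merging/unmerging
# underneath is handled by PEFT.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

pipe.load_lora_weights("some-user/some-lora")  # placeholder repo id
pipe.fuse_lora()            # merges the LoRA deltas into the base weights
# ... run inference ...
pipe.unfuse_lora()          # tries to subtract the deltas back out
pipe.unload_lora_weights()  # drops the adapter entirely
```

If unfuse_lora doesn't restore the weights exactly, that's the "not unmerged properly" case I mentioned, and reloading the base model is the only fully reliable reset.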
If you want to explore the relationship between Diffusers and PEFT, the PEFT author's article is interesting. (I forget where it was.)
Thanks, I'll check that out.
I forgot to mention: since Flux is so huge, I added a simple cache to the model-loading section.
I always unload the LoRA, but depending on PEFT's behavior this may malfunction. In some cases I might add an option to turn off the cache.
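The cache is roughly this kind of thing (a simplified sketch, not the exact code):

```python
# Sketch of a simple model-load cache: skip the heavy from_pretrained
# when the requested base model is already loaded, with a flag to
# bypass the cache entirely.
import torch
from diffusers import DiffusionPipeline

pipe = None
loaded_repo = None

def load_base_model(repo_id: str, use_cache: bool = True):
    global pipe, loaded_repo
    if use_cache and pipe is not None and repo_id == loaded_repo:
        return pipe  # Flux is huge, so skip the reload when possible
    pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    loaded_repo = repo_id
    return pipe
```

With the cache on, a LoRA that PEFT fails to unmerge cleanly could survive into the next request, which is why an option to turn it off might be needed.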
The surefire workaround in the current situation is to reload the base model, whichever one it is.
Also, the stand-alone LoRA part taken from the original HF code has only a weak dependency on PEFT, so unloading in that part should work reliably.
Note that this problem does not occur in the SDXL space due to its structure.