Apply for community grant: Academic project (gpu)

#1
by ermu2001 - opened
Disclose.AI org

Video LLM PLLaVA: https://pllava.github.io/

Hi @ermu2001, we have assigned an a10g-large to this Space with a shorter sleep time, as it was the hardware you were using.

As we recently started using ZeroGPU as the default hardware for grants, it would be nice if you could check the compatibility and usage sections of this page and see if you can migrate your Space to ZeroGPU.

I just sent you an invitation to join the ZeroGPU explorers org, and once you join, you should be able to see the "Zero Nvidia A100" option in your Space Settings. You can test ZeroGPU by duplicating this Space privately and assigning ZeroGPU to it yourself. Once you confirm that your Space can run on Zero, you can update this main Space to use Zero and delete the private duplicate used for testing.
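In most cases the migration is just decorating the GPU-heavy function with @spaces.GPU; a minimal sketch of the pattern looks like this (placeholder model and function names, not your actual PLLaVA code):

```python
import gradio as gr
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load weights once at startup; with ZeroGPU, .to("cuda") here is fine because
# a GPU is only actually attached while a @spaces.GPU function is running.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16)
model.to("cuda")

@spaces.GPU  # a GPU is allocated only for the duration of this call
def generate(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

gr.Interface(fn=generate, inputs="text", outputs="text").launch()
```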

Oh, BTW, the latest gr.ChatInterface supports a multimodal=True option, which might be useful for your demo. https://www.gradio.app/docs/gradio/chatinterface
Example: https://huggingface.co/spaces/merve/llava-next
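With multimodal=True, the chat function receives each user turn as a dict with "text" and "files" keys, roughly like this (just a sketch, not your inference code):

```python
import gradio as gr

def respond(message, history):
    # With multimodal=True, `message` is a dict: {"text": str, "files": [file paths]}
    files = message.get("files", [])
    reply = f"You said: {message['text']}"
    if files:
        reply += f" and uploaded {len(files)} file(s)"
    return reply

gr.ChatInterface(fn=respond, multimodal=True).launch()
```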

Disclose.AI org

I got it to work on ZeroGPU at https://huggingface.co/spaces/DiscloseAI/pllava-13b-demo. It was incredibly simple to switch to ZeroGPU, nice framework!

Except that using ZeroGPU seems a bit slow compared to an exclusive GPU 🥹

@ermu2001 Thanks for checking ZeroGPU!

Except that using ZeroGPU seems a bit slow compared to an exclusive GPU 🥹

Isn't it simply because you are running a 13B model in the ZeroGPU Space while running a 7B model in this A10G Space?
https://huggingface.co/spaces/DiscloseAI/pllava-7b-demo/blob/afc99d0c045af6fa81e3c662e72ff2d08b9df88a/app.py#L4
https://huggingface.co/spaces/DiscloseAI/pllava-13b-demo/blob/3660ce46ec214512ddd5f52db635fe8cc957036c/app.py#L4

Or maybe it's because of the overhead of moving the model to the GPU. That said, the model is not offloaded to the CPU for a while after execution, so if many people keep visiting the Space, the overhead should be negligible.
Also, we can set a longer sleep time for ZeroGPU Spaces because multiple ZeroGPU Spaces share the hardware. Usually, the sleep time of a ZeroGPU Space is set to 48 hours, whereas we set it to 1 hour or less for normal grant Spaces. When a Space goes to sleep, users can restart it, but restarting takes much longer than waiting for the model to be loaded onto the GPU on ZeroGPU.
Furthermore, ZeroGPU can use multiple backend GPU servers behind a load balancer, so multiple people can run the Space at the same time, which is not possible with normal GPU grants.
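If you want to check which factor dominates, one quick experiment is to keep the weights on the CPU at startup and time the move-to-GPU step and the inference separately inside the decorated function (a rough sketch with a placeholder model, not the actual PLLaVA code):

```python
import time
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Keep weights on CPU at startup so the transfer cost is measured explicitly.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16)

@spaces.GPU
def timed_generate(prompt):
    t0 = time.time()
    model.to("cuda")                       # weight transfer happens here
    torch.cuda.synchronize()
    t1 = time.time()
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output_ids = model.generate(**inputs, max_new_tokens=32)
    torch.cuda.synchronize()
    t2 = time.time()
    print(f"move-to-GPU: {t1 - t0:.1f}s, inference: {t2 - t1:.1f}s")
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```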
