Spaces: badayvedat/LLaVA

Apply for community grant: Academic project (gpu)

by badayvedat - opened Oct 8, 2023

badayvedat

Owner Oct 8, 2023

edited Oct 9, 2023

Hi,
The model fails to load onto the GPU on an Nvidia A10G small, or on other hardware with less RAM and VRAM. It would be great if we could get an Nvidia A10G large or a larger GPU.

Disclaimer: This project is not mine. Credit for this amazing work goes to @liuhaotian et al.; I just integrated their Gradio demo into this Hugging Face Space.

ysharma

Oct 10, 2023

This is a brilliant demo @badayvedat !
We wanted to let you know that we've assigned a GPU to your space, and your GPU grant application has been approved. Congratulations! Please keep in mind that GPU grants are provided on a temporary basis and may be removed if usage is very low.
To learn more about GPUs in Spaces, please check out https://huggingface.co/docs/hub/spaces-gpus. We look forward to seeing the innovative work you produce with this grant. If you have any questions or concerns, please let us know. Thank you for your interest in our platform!

hysts

Oct 10, 2023

Hi @badayvedat
This Space is currently running the 7B model on A10G large. But could you update your code to load the model in 4-bit (or 8-bit)?
I've only tested this in my AWS environment with a T4, but it's possible to run the 7B model on a T4 if we load it in 8-bit, and even the 13B model can run on a T4 if loaded in 4-bit. This is mentioned in the README.md of the original repo.
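As a rough sanity check on those numbers, here is a back-of-the-envelope estimate of the memory taken by the weights alone. It is only a sketch: the helper name `weight_memory_gb` is made up for illustration, the parameter counts are the nominal 7B and 13B, and the estimate ignores activations, the KV cache, the vision encoder, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    total_bytes = num_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


T4_VRAM_GB = 16  # an NVIDIA T4 has 16 GB of GPU memory

# Nominal parameter counts; weights only, so treat these as lower bounds.
cases = [
    ("7B, fp16", 7, 16),
    ("7B, 8-bit", 7, 8),
    ("13B, fp16", 13, 16),
    ("13B, 4-bit", 13, 4),
]

for name, params_b, bits in cases:
    gb = weight_memory_gb(params_b, bits)
    headroom = T4_VRAM_GB - gb
    print(f"{name}: ~{gb:.1f} GB of weights, {headroom:+.1f} GB left on a T4")
```

Under these assumptions the 7B model in 8-bit and the 13B model in 4-bit both come to about 7 GB of weights, which leaves room on a 16 GB card for everything the estimate leaves out.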

Also, regarding CPU RAM: when I load the 7B model in 8-bit or the 13B model in 4-bit, the maximum CPU RAM usage seems to be less than 15GB in both cases. I think you can pass these kwargs to LlavaLlamaForCausalLM.from_pretrained here.
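For illustration, the kwargs could be assembled along these lines. This is a minimal sketch, not the Space's actual code: `build_load_kwargs` and `model_path` are hypothetical names, and it assumes a `transformers` version whose `from_pretrained` accepts `load_in_8bit` / `load_in_4bit` directly (with `bitsandbytes` and `accelerate` installed). Newer releases prefer passing a `BitsAndBytesConfig` as `quantization_config` instead.

```python
def build_load_kwargs(load_8bit: bool = False, load_4bit: bool = False) -> dict:
    """Build keyword arguments for from_pretrained (hypothetical helper)."""
    if load_8bit and load_4bit:
        raise ValueError("Choose either 8-bit or 4-bit loading, not both.")

    kwargs = {
        "device_map": "auto",       # let accelerate place the weights
        "low_cpu_mem_usage": True,  # avoid a full extra copy in CPU RAM
    }
    if load_8bit:
        kwargs["load_in_8bit"] = True
    elif load_4bit:
        kwargs["load_in_4bit"] = True
    return kwargs


print(build_load_kwargs(load_4bit=True))

# Assumed usage, with LlavaLlamaForCausalLM imported from the LLaVA repo
# and model_path standing in for whichever checkpoint you load:
# model = LlavaLlamaForCausalLM.from_pretrained(
#     model_path, **build_load_kwargs(load_4bit=True)
# )
```

Keeping the kwargs in one small function makes it easy to switch between 8-bit for the 7B model and 4-bit for the 13B model without touching the loading call itself.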

badayvedat

Owner Oct 10, 2023

hey @ysharma @hysts !
with the help of @liuhaotian we've decreased the memory requirements of the Space by using 4-bit inference and removing the model preload, and now it works on a T4 small machine 🎉