Apply for community grant: Academic project (gpu)


We are students from the UCLA NLP group, and this demo is for our paper "Matryoshka Query Transformer for Large Vision-Language Models" [https://arxiv.org/abs/2405.19315]. The demo needs 24 GB of GPU memory to run, so a single L4 (1x L4) would work for us.

Hi @gordonhu , we have assigned a 1x L4 to this Space for now.
BTW, would it be possible to migrate this Space to ZeroGPU, like this LLaVA-NeXT Space? We recently started using ZeroGPU (docs) as the default hardware for grants. Your current code doesn't look compatible with ZeroGPU, but the LLaVA-NeXT Space uses transformers and is, so a migration should be feasible. ZeroGPU would reduce our infra cost in the long run and improve UX (longer sleep time, parallel execution, etc.), so it would be great if you could consider migrating your Space to it.
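For reference, here is a minimal sketch of what a ZeroGPU-compatible inference function could look like, assuming the model can be loaded with transformers (the checkpoint id below is just a placeholder, not your actual weights):

```python
import spaces  # ZeroGPU helper library, preinstalled on ZeroGPU Spaces
import torch
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

# Placeholder checkpoint id; swap in the actual MQT-LLaVA weights.
model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")  # under ZeroGPU, device placement is deferred until a GPU is attached

@spaces.GPU  # a GPU is allocated only while this function runs
def generate(image, prompt):
    inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```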
Also, gr.ChatInterface now supports multimodal=True, so it would be nice if you could use it.
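Something along these lines (a rough sketch; with multimodal=True, the incoming message is a dict with "text" and "files" keys rather than a plain string):

```python
import gradio as gr

def respond(message, history):
    # With multimodal=True, `message` looks like:
    # {"text": "...", "files": ["/path/to/upload.png", ...]}
    text = message["text"]
    files = message["files"]
    return f"Got {text!r} and {len(files)} file(s)."

demo = gr.ChatInterface(fn=respond, multimodal=True)
demo.launch()
```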

Looks like your Space repo contains model weights here, but it would be better to download them at startup using the huggingface_hub library (you can use hf_hub_download or snapshot_download (docs)). Adding model weights directly to your Space repo makes the Docker image larger, which slows down startup; downloading models at startup with huggingface_hub is usually faster.
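For example, something like this at startup (the repo id and filename here are hypothetical; point them at wherever you host the weights):

```python
from huggingface_hub import hf_hub_download, snapshot_download

# Fetch a single weight file (hypothetical repo id / filename):
weights_path = hf_hub_download(
    repo_id="gordonhu/MQT-LLaVA-7b", filename="model.safetensors"
)

# Or mirror the whole model repo into the local cache:
model_dir = snapshot_download(repo_id="gordonhu/MQT-LLaVA-7b")
```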

Thank you so much for your fast reply and the many great suggestions!! We will incorporate LLaVA-NeXT in our next version so that it is compatible with ZeroGPU. As for the model weights, I'll update the Space right now to use hf_hub_download! Thank you!
