llizhx/TinyGPT-V · Apply for community grant: Academic project (gpu and storage)

Owner Jan 3, 2024

In the era of advanced multimodel learning, multimodal large language models (MLLMs) such as GPT-4V have made remarkable strides towards bridging language and visual elements. However, the closed-source nature and considerable computational demand present notable challenges for universal usage and modifications. This is where open-source MLLMs like LLaVA and MiniGPT-4 come in, presenting groundbreaking achievements across tasks. Despite these accomplishments, computational efficiency remains an unresolved issue, as these models, like LLaVA-v1.5-13B, require substantial resources. Addressing these issues, we introduce TinyGPT-V, a new-wave model marrying impressive performance with commonplace computational capacity. It stands out by requiring merely a 24G GPU for training and an 8G GPU or CPU for inference. Built upon Phi-2, TinyGPT-V couples an effective language backbone with pre-trained vision modules from BLIP-2 or CLIP. TinyGPT-V's 2.8B parameters can undergo a unique quantisation process, suitable for local deployment and inference tasks on 8G various devices. Our work fosters further developments for designing cost-effective, efficient, and high-performing MLLMs, expanding their applicability in a broad array of real-world scenarios. Furthermore this paper proposed a new paradigm of Multimodal Large Language Model via small backbones.

llizhx

Owner Jan 3, 2024

I am the author of TinyGPT-V, and I hope more people can try and see the potential of our research result.

hysts

Jan 3, 2024

Hi @llizhx , we have assigned a gpu to this space. Note that GPU Grants are provided temporarily and might be removed after some time if the usage is very low.

To learn more about GPUs in Spaces, please check out https://huggingface.co/docs/hub/spaces-gpus

llizhx

Owner Jan 3, 2024

Thank you very much!