Apply for community grant: Academic project

#2
by haotiz - opened

GLIP (Grounded Language-Image Pre-training) is a generalizable object detection model (we use object detection as the representative localization task). It is language-aware, taking a natural language prompt as instruction, and semantically rich, able to detect millions of visual concepts out of the box. GLIPv2 further extends this ability to instance segmentation and grounded vision-language understanding tasks; see examples in Figure 2. GLIP introduces language into object detection and leverages self-training techniques to pre-train on scalable, semantically rich data: 24M grounded image-caption pairs. This marks a milestone towards generalizable localization models: GLIP enjoys superior zero-shot and few-shot transfer ability similar to that of CLIP/GPT-2/GPT-3.

Github: https://github.com/microsoft/GLIP
Hugging Face Demo: https://huggingface.co/spaces/haotiz/glip-zeroshot-demo
CVPR 2022 Best Paper Finalist (1 out of 33)
Microsoft Research Blog: https://www.microsoft.com/en-us/research/project/project-florence-vl/articles/object-detection-in-the-wild-via-grounded-language-image-pre-training/
Twitter: https://twitter.com/HaotianZhang4AI/status/1569775843681128448

Please consider granting free T4 access to this project. We hope the open-source code and Gradio demo will further spark interest in multimodal intelligence and vision-language research.

Hey, a GPU Grant was just provided. Note that GPU Grants are temporary and may be removed after some time if usage is very low.

To learn more about GPUs in Spaces, please check out https://huggingface.co/docs/hub/spaces-gpus
