
πŸ‘οΈ GLaMM-RegCap-RefCOCOg


πŸ“ Description

GLaMM-RegCap-RefCOCOg is the GLaMM model for region-level captioning, finetuned on RefCOCOg. The "RegCap-RefCOCOg" suffix indicates its specialization in region-level captioning with tuning on the RefCOCOg dataset.

💻 Download

To get started with GLaMM-RegCap-RefCOCOg, follow these steps:

git lfs install
git clone https://huggingface.co/MBZUAI/GLaMM-RegCap-RefCOCOg
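
As an alternative to cloning with git, the same files can probably be fetched from Python. The sketch below assumes the optional `huggingface_hub` package is installed and uses its `snapshot_download` function. The helper name `download_checkpoint` and the default target folder are illustrative choices, not part of the GLaMM codebase.

```python
from pathlib import Path

# Repository name taken from the clone URL above.
REPO_ID = "MBZUAI/GLaMM-RegCap-RefCOCOg"


def download_checkpoint(target_dir: str = "GLaMM-RegCap-RefCOCOg") -> Path:
    """Download every file in the model repository and return the local folder.

    Hypothetical helper for illustration; the target folder name is an assumption.
    """
    # Imported lazily so the module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(repo_id=REPO_ID, local_dir=target_dir)
    return Path(local_path)


if __name__ == "__main__":
    # Requires network access and enough disk space for the checkpoint files.
    print(download_checkpoint())
```

This route needs neither git nor Git LFS, which can be convenient on machines where they are not set up.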

📚 Additional Resources

📜 Citations and Acknowledgments

  @article{hanoona2023GLaMM,
          title={GLaMM: Pixel Grounding Large Multimodal Model},
          author={Rasheed, Hanoona and Maaz, Muhammad and Shaji, Sahal and Shaker, Abdelrahman and Khan, Salman and Cholakkal, Hisham and Anwer, Rao M. and Xing, Eric and Yang, Ming-Hsuan and Khan, Fahad S.},
          journal={ArXiv 2311.03356},
          year={2023}
  }