
# Tokenize Anything via Prompting

Ting Pan<sup>1,2*</sup>,   Lulu Tang<sup>2*</sup>,   Xinlong Wang,   Shiguang Shan<sup>1</sup>

<sup>1</sup>ICT-CAS,   <sup>2</sup>BAAI

<sup>*</sup> Equal Contribution, Project Lead

We present Tokenize Anything via Prompting (TAP), a unified, promptable model that can simultaneously segment, recognize, and caption objects in arbitrary regions, relying only on visual prompts (point, box, and sketch). The model is trained on exhaustive segmentation masks sourced from SA-1B, together with semantic priors from a pre-trained 5-billion-parameter EVA-CLIP.
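To give a flavor of this interface, here is a minimal sketch of tokenizing one region from a single point prompt. Every name in it (`tokenize_region`, `encode_image`, `decode`, `concept_projector`, `text_decoder`) is an illustrative placeholder that mirrors the description above, not the project's actual API; see the GitHub page for real usage.

```python
import numpy as np

def tokenize_region(model, image, point_xy):
    """Sketch: one point prompt -> mask, category scores, caption.

    All model methods below are hypothetical placeholders, not the
    released API.
    """
    # SAM-style point prompt: (x, y) coordinates plus a foreground label.
    points = np.array([[point_xy]], dtype=np.float32)  # shape (1, 1, 2)
    labels = np.array([[1]], dtype=np.int64)           # 1 = foreground

    features = model.encode_image(image)               # ViT image encoder
    mask, sem_token = model.decode(features, points, labels)
    scores = model.concept_projector(sem_token)        # open-vocab recognition
    caption = model.text_decoder(sem_token)            # region-level caption
    return mask, scores, caption
```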

## Installation

See the GitHub page.

## Models

### Model weights

Two versions of the model are available with different image encoders.

| Model | Description | Weights |
| ----- | ----------- | ------- |
| tap_vit_l | ViT-L TAP model | 🤗 HF link |
| tap_vit_b | ViT-B TAP model | 🤗 HF link |
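A loading sketch, assuming the package exposes a `model_registry` keyed by the names in the table above; the checkpoint path is a placeholder, so consult the GitHub page for exact usage:

```python
from tokenize_anything import model_registry

# "tap_vit_l" uses the larger ViT-L image encoder; "tap_vit_b" trades some
# accuracy for a smaller, faster ViT-B encoder.
model = model_registry["tap_vit_l"](checkpoint="models/tap_vit_l.pkl")  # placeholder path
```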

### Concept weights

Note: You can generate these weights following the Concept Guide.

| Concept | Description | Weights |
| ------- | ----------- | ------- |
| Merged-2560 | Merged concepts | 🤗 HF link |
| LVIS-1203 | LVIS concepts | 🤗 HF link |
| COCO-80 | COCO concepts | 🤗 HF link |
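Once generated (or downloaded), the concept weights plug into the model's open-vocabulary classifier. A sketch, assuming a resettable concept projector as described in the Concept Guide; the method name and file path are placeholders:

```python
# Attach the 2,560 merged concepts to the (assumed) concept projector.
model.concept_projector.reset_weights("concepts/merged_2560.pkl")  # placeholder path

# At inference time, the projector scores a region's semantic token against
# each concept embedding; the top-scoring concept is the predicted category.
```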

## License

Apache License 2.0

## Citation

    @article{pan2023tap,
      title={Tokenize Anything via Prompting},
      author={Pan, Ting and Tang, Lulu and Wang, Xinlong and Shan, Shiguang},
      journal={arXiv preprint arXiv:2312.yyyyy},
      year={2023}
    }

## Acknowledgement

We thank the following open-source repositories: SAM, EVA, LLaMA, FlashAttention, Gradio, Detectron2, and CodeWithGPU.