CAIP (Contrastive Action-Image Pre-training)
from transformers import AutoModel, AutoProcessor
from PIL import Image
import torch
REPO = "yuvansharma/caip-vitl256"
# trust_remote_code=True is required: the model and processor are defined by custom code in the repo.
model = AutoModel.from_pretrained(REPO, trust_remote_code=True).eval()
processor = AutoProcessor.from_pretrained(REPO, trust_remote_code=True)
# Any RGB image works here; "example.png" is a placeholder path.
image = Image.open("example.png").convert("RGB")
inputs = processor(images=image, text="pick up the red cup", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
# out.image_pooled    [B, 1024]       text-conditioned pooled image embedding
# out.patch_features  [B, 256, 1024]  patch tokens
# out.text_tokens     [B, 64, 1024]   text token embeddings
# out.text_pooled     [B, 1024]       pooled text embedding
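The output shapes listed above suggest two common downstream uses. This is a minimal sketch under stated assumptions, not part of the CAIP API.

The first use reshapes the 256 patch tokens into a 16 x 16 spatial feature map. This assumes the tokens are stored in row-major (raster) order, as is typical for ViT patch embeddings, but the card does not confirm it.

The second use scores each image/text pair with a plain cosine similarity. This is not necessarily the scoring function CAIP was trained with; SigLIP-style models, for example, also apply a learned scale and bias. Because `image_pooled` is already conditioned on its own text, the sketch only compares matching pairs rather than building a full image-by-text matrix.

The helper names `patches_to_grid` and `pair_cosine` are hypothetical. Random tensors stand in for the real model outputs.

```python
import torch
import torch.nn.functional as F


def patches_to_grid(patch_tokens: torch.Tensor) -> torch.Tensor:
    """Reshape [B, N, D] patch tokens into a [B, D, H, W] feature map.

    Assumes N is a perfect square and tokens are in row-major order
    (typical for ViT patch embeddings; not confirmed for CAIP).
    """
    b, n, d = patch_tokens.shape
    side = int(round(n ** 0.5))
    if side * side != n:
        raise ValueError(f"{n} patch tokens do not form a square grid")
    # [B, N, D] -> [B, D, N] -> [B, D, side, side]; token index n = row * side + col
    return patch_tokens.transpose(1, 2).reshape(b, d, side, side)


def pair_cosine(image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between matching rows of two [B, D] tensors -> [B]."""
    return (F.normalize(image_emb, dim=-1) * F.normalize(text_emb, dim=-1)).sum(dim=-1)


# Dummy tensors shaped like out.patch_features, out.image_pooled and out.text_pooled.
torch.manual_seed(0)
B, D = 2, 1024
patch_features = torch.randn(B, 256, D)
image_pooled = torch.randn(B, D)
text_pooled = torch.randn(B, D)

grid = patches_to_grid(patch_features)          # [2, 1024, 16, 16]
scores = pair_cosine(image_pooled, text_pooled)  # [2]
```

With real outputs, replace the dummy tensors with `out.patch_features`, `out.image_pooled` and `out.text_pooled`. The resulting grid can then be fed to any convolutional or spatial head that expects a `[B, C, H, W]` map.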
For a smaller download (~1.75 GB), load the bf16 weights with revision="bf16".
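Loading the bf16 revision can be sketched as follows, assuming the `bf16` branch contains the same remote code as the default branch.

The `revision` argument only changes which files are downloaded. Depending on the transformers version, weights may be upcast to float32 on load unless a dtype is requested, so the sketch requests `torch.bfloat16` explicitly. The processor's image tensors are typically float32, and it is an assumption that the custom model code does not cast them itself. The sketch therefore also includes a small helper that casts only floating-point inputs to the model's dtype, leaving token ids untouched.

Both function names are hypothetical.

```python
import torch

REPO = "yuvansharma/caip-vitl256"


def load_caip_bf16():
    """Load CAIP from the "bf16" revision and keep the weights in bfloat16."""
    # Imported inside the function so cast_floating_inputs works without transformers.
    from transformers import AutoModel, AutoProcessor

    model = AutoModel.from_pretrained(
        REPO,
        trust_remote_code=True,
        revision="bf16",
        dtype=torch.bfloat16,  # older transformers releases name this argument torch_dtype
    ).eval()
    # Processor loaded from the default branch; only the weights differ on "bf16".
    processor = AutoProcessor.from_pretrained(REPO, trust_remote_code=True)
    return model, processor


def cast_floating_inputs(inputs, dtype):
    """Cast floating-point tensors (e.g. pixel values) to dtype; leave integer ids as-is."""
    return {
        k: v.to(dtype) if torch.is_tensor(v) and torch.is_floating_point(v) else v
        for k, v in inputs.items()
    }
```

Typical usage would be:

- `model, processor = load_caip_bf16()`
- `inputs = cast_floating_inputs(processor(images=image, text=..., return_tensors="pt"), next(model.parameters()).dtype)`
- `out = model(**inputs)` inside `torch.no_grad()`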
Citation
@misc{TODO,
}