# Jina CLIP
Core implementation of Jina CLIP. The model uses:
- the EVA 02 architecture for the vision tower
- the Jina XLM RoBERTa with Flash Attention model as a text tower
## Models that use this implementation

- [jinaai/jina-clip-v1](https://huggingface.co/jinaai/jina-clip-v1)
- [jinaai/jina-clip-v2](https://huggingface.co/jinaai/jina-clip-v2)
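Both models load this implementation through `trust_remote_code`. Below is a minimal usage sketch, assuming the `jinaai/jina-clip-v1` checkpoint and the `encode_text`/`encode_image` helpers documented on the downstream model cards (neither is defined in this file); the image URL is a placeholder.

```python
# Minimal usage sketch; checkpoint name and helper methods are assumptions
# taken from the downstream model cards, not from this implementation card.
import numpy as np
from transformers import AutoModel

# trust_remote_code=True pulls this implementation from the Hub
model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

# Text goes through the Jina XLM RoBERTa tower, images through the EVA 02
# tower; both project into a shared embedding space.
text_emb = model.encode_text(["a photo of a cat"])
image_emb = model.encode_image(["https://example.com/cat.jpg"])  # placeholder URL or local path

# CLIP-style matching: cosine similarity between the two embeddings
similarity = np.dot(text_emb[0], image_emb[0]) / (
    np.linalg.norm(text_emb[0]) * np.linalg.norm(image_emb[0])
)
print(similarity)
```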
## Requirements
To use the Jina CLIP source code, the following packages are required:
- `torch`
- `timm`
- `transformers`
- `einops`
- `xformers` to use x-attention
- `flash-attn` to use flash attention
- `apex` to use fused layer normalization
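The last three packages are optional accelerators. The snippet below is an illustrative check, not part of the original card, for which of them are importable before loading the model; note that the `flash-attn` pip package is imported as `flash_attn`.

```python
# Illustrative check for the optional acceleration packages listed above.
import importlib.util

OPTIONAL_PACKAGES = [
    ("xformers", "x-attention"),
    ("flash_attn", "flash attention"),  # installed via `pip install flash-attn`
    ("apex", "fused layer normalization"),
]

for module_name, feature in OPTIONAL_PACKAGES:
    if importlib.util.find_spec(module_name) is None:
        print(f"{module_name} is not installed; {feature} will be unavailable")
    else:
        print(f"{module_name} found; {feature} can be used")
```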