---
tags:
- transformers
- xlm-roberta
- eva02
- clip
library_name: transformers
license: cc-by-nc-4.0
---
# Jina CLIP
Core implementation of Jina CLIP. The model uses:

* the EVA-02 architecture for the vision tower
* the Jina XLM-RoBERTa with Flash Attention model for the text tower
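As a CLIP-style dual encoder, the two towers map images and texts into a shared embedding space, where relevance is scored by cosine similarity. A minimal sketch of that scoring step, using random stand-in vectors rather than actual tower outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings; in practice these come from the vision and text towers.
image_emb = rng.standard_normal((4, 768))  # batch of 4 image embeddings
text_emb = rng.standard_normal((4, 768))   # batch of 4 text embeddings

# L2-normalize so a plain dot product equals cosine similarity.
image_emb /= np.linalg.norm(image_emb, axis=-1, keepdims=True)
text_emb /= np.linalg.norm(text_emb, axis=-1, keepdims=True)

# Pairwise text-image similarity matrix.
sim = text_emb @ image_emb.T
print(sim.shape)  # (4, 4)
```

The embedding dimension (768) here is illustrative; the real value depends on the model configuration.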
## Models that use this implementation
## Requirements
To use the Jina CLIP source code, the following packages are required:

* `torch`
* `timm`
* `transformers`
* `einops`
* `xformers` to use x-attention
* `flash-attn` to use flash attention
* `apex` to use fused layer normalization
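A quick way to verify that the core requirements are available in the current environment (a small helper sketch, not part of the repository):

```python
from importlib.util import find_spec

# Core requirements; xformers, flash-attn, and apex are optional extras.
required = ["torch", "timm", "transformers", "einops"]
missing = [pkg for pkg in required if find_spec(pkg) is None]

if missing:
    print("missing packages:", ", ".join(missing))
else:
    print("all core requirements available")
```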