hanxiao committed on
Commit e7cdc21
1 Parent(s): 24a67ac

Update README.md

Files changed (1)
  1. README.md +6 -17
README.md CHANGED
@@ -8,35 +8,24 @@ inference: false
  license: apache-2.0
  ---
  <!-- TODO: add evaluation results here -->
- <br><br>
-
  <p align="center">
  <img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
  </p>
 
-
- <p align="center">
- <b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
- </p>
+ # jina-clip-v1
+ Jina CLIP: your CLIP model is also your text retriever!
 
  ## Quick Start
 
- The easiest way to start using `jina-clip-v1` is to use Jina AI's [Embedding API](https://jina.ai/embeddings/).
+ The easiest way to start using `jina-clip-v1` is to use the Jina AI [Embedding API](https://jina.ai/embeddings/).
 
  ## Intended Usage & Model Info
 
- `jina-clip-v1` is an English, monolingual **multimodal (text-image) embedding model**.
+ `jina-clip-v1` is a state-of-the-art English **multimodal (text-image) embedding model**.
 
- Traditional text embedding models, such as [jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en),
- excel in text-to-text retrieval but lack cross-modal retrieval capabilities.
- Conversely, CLIP-like models, such as [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32),
- align image embeddings with text embeddings but underperform in text-to-text retrieval due to their training methodology and context length limitations.
+ Traditional text embedding models, such as [jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en), excel in text-to-text retrieval but fall short in cross-modal retrieval. In contrast, CLIP-like models such as [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) align image and text embeddings effectively but are not optimized for text-to-text retrieval, owing to their training methodology and context-length limitations.
 
- `jina-clip-v1` is an innovative **multimodal embedding model**.
- Its text component achieves comparable performance to `jina-embeddings-v2-base-en` in text-to-text retrieval,
- while the overall model delivers state-of-the-art performance in cross-modal retrieval tasks.
- This makes it an ideal choice for multimodal retrieval-augmented generation (M-RAG) applications,
- allowing for both text-to-text and text-to-image searches with a single model.
+ `jina-clip-v1` bridges this gap by performing well in both domains. Its text component matches the text-to-text retrieval performance of `jina-embeddings-v2-base-en`, while the model as a whole achieves state-of-the-art performance in cross-modal retrieval. This dual capability makes it well suited to multimodal retrieval-augmented generation (M-RAG) applications, enabling both text-to-text and text-to-image search with a single model.
 
 
  ## Data & Parameters
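The updated README says `jina-clip-v1` serves both text-to-text and text-to-image search from a single model. The sketch below shows why that works once texts and images are embedded into the same vector space. A single index can hold both kinds of item, and a query is ranked against all of them. This is a minimal illustration under stated assumptions. The vectors are hypothetical toy values, not real `jina-clip-v1` outputs. Cosine-similarity ranking is an assumed scoring choice that the README does not specify. The helper names `cosine` and `search` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, corpus):
    """Rank (item_id, vector) pairs by cosine similarity to the query, best first."""
    scored = [(item_id, cosine(query_vec, vec)) for item_id, vec in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings standing in for model outputs. Because text
# passages and images share one vector space, a single index holds both.
corpus = [
    ("text:cat-care-guide", [0.9, 0.1, 0.0]),
    ("image:cat.jpg", [0.8, 0.2, 0.1]),
    ("text:stock-report", [0.0, 0.1, 0.9]),
    ("image:chart.png", [0.1, 0.0, 0.95]),
]

# Hypothetical embedding of a text query such as "how to look after a cat".
query = [0.85, 0.15, 0.05]

for item_id, score in search(query, corpus):
    print(f"{item_id}: {score:.3f}")
```

In this toy setup, the cat-related text passage and the cat image both rank above the unrelated items for the same text query. This is the combined text-to-text and text-to-image behavior that the README attributes to a shared embedding space.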