Update README.md
README.md
CHANGED
@@ -8,35 +8,24 @@ inference: false
 license: apache-2.0
 ---
 <!-- TODO: add evaluation results here -->
-<br><br>
-
 <p align="center">
 <img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
 </p>
 
-
-
-<b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
-</p>
+# jina-clip-v1
+Jina CLIP: your CLIP model is also your text retriever!
 
 ## Quick Start
 
-The easiest way to starting using `jina-clip-v1` is to use Jina AI
+The easiest way to start using `jina-clip-v1` is to use the Jina AI [Embedding API](https://jina.ai/embeddings/).
 
 ## Intended Usage & Model Info
 
-`jina-clip-v1` is
+`jina-clip-v1` is a state-of-the-art English **multimodal (text-image) embedding model**.
 
-Traditional text embedding models, such as [jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en),
-excel in text-to-text retrieval but lack cross-modal retrieval capabilities.
-Conversely, CLIP-like models, such as [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32),
-align image embeddings with text embeddings but underperform in text-to-text retrieval due to their training methodology and context length limitations.
+Traditional text embedding models, such as [jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en), excel in text-to-text retrieval but fall short in cross-modal tasks. In contrast, models like [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) effectively align image and text embeddings but are not optimized for text-to-text retrieval due to their training methodologies and context limitations.
 
-`jina-clip-v1`
-Its text component achieves comparable performance to `jina-embeddings-v2-base-en` in text-to-text retrieval,
-while the overall model delivers state-of-the-art performance in cross-modal retrieval tasks.
-This makes it an ideal choice for multimodal retrieval-augmented generation (M-RAG) applications,
-allowing for both text-to-text and text-to-image searches with a single model.
+`jina-clip-v1` bridges this gap by offering robust performance in both domains. Its text component matches the text-to-text retrieval performance of `jina-embeddings-v2-base-en`, while its overall architecture sets a new benchmark for cross-modal retrieval. This dual capability makes it an excellent tool for multimodal retrieval-augmented generation (M-RAG) applications, enabling seamless text-to-text and text-to-image searches within a single model.
 
 
 ## Data & Parameters
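The README text added in this commit says a single model can serve both text-to-text and text-to-image search. The following is a minimal sketch of why that works, assuming you already have embeddings produced by `jina-clip-v1` (for example from the Embedding API). Because text and images are mapped into one shared vector space, a single cosine-similarity ranking can score text and image items against the same query. The vectors below are hypothetical toy values, not real model outputs, and `rank_by_cosine` is an illustrative helper, not part of any Jina library.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_cosine(
    query: Sequence[float], corpus: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (item_id, score) pairs sorted from most to least similar.

    Hypothetical helper: the corpus may mix text and image embeddings,
    because a CLIP-style model places both in one shared space.
    """
    scored = [(item_id, cosine(query, vec)) for item_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings
# (assumption: real jina-clip-v1 embeddings have many more dimensions).
corpus = {
    "text:cat care guide": [0.9, 0.1, 0.0],
    "image:cat_photo.jpg": [0.8, 0.2, 0.1],
    "text:stock market news": [0.0, 0.2, 0.9],
}
query = [1.0, 0.1, 0.0]  # hypothetical embedding of the text query "cats"

for item_id, score in rank_by_cosine(query, corpus):
    print(f"{item_id}: {score:.3f}")
```

The linear scan keeps the sketch short. If the embeddings are L2-normalized, cosine similarity reduces to a plain dot product, and for large collections a vector index would typically replace the scan.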