guenthermi committed on
Commit
64cb362
1 Parent(s): a4ba7b8

update readme

Files changed (2)
  1. README.md +34 -15
  2. de_evaluation_results.png +0 -0
README.md CHANGED
@@ -3,7 +3,6 @@ tags:
3
  - sentence-transformers
4
  - feature-extraction
5
  - sentence-similarity
6
- - mteb
7
  language:
8
  - de
9
  - en
@@ -3109,7 +3108,7 @@ model-index:
3109
  <br><br>
3110
 
3111
  <p align="center">
3112
- <img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
3113
  </p>
3114
 
3115
 
@@ -3117,6 +3116,9 @@ model-index:
3117
  <b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
3118
  </p>
3119
 
 
 
 
3120
 
3121
  ## Intended Usage & Model Info
3122
 
@@ -3135,13 +3137,17 @@ Des Weiteren stellen wir folgende Embedding-Modelle bereit:
3135
 
3136
  - [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
3137
  - [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
3138
- - [`jina-embeddings-v2-base-zh`](): Chinese-English Bilingual embeddings (soon).
3139
- - [`jina-embeddings-v2-base-de`](): German-English Bilingual embeddings (soon) **(you are here)**.
3140
- - [`jina-embeddings-v2-base-es`](): Spanish-English Bilingual embeddings (soon).
 
 
 
3141
 
3142
  ## Data & Parameters
3143
 
3144
- Jina Embeddings V2 [technical report](https://arxiv.org/abs/2310.19923)
 
3145
 
3146
  ## Usage
3147
 
@@ -3204,9 +3210,29 @@ embeddings = model.encode(
3204
  )
3205
  ```
3206
 
3207
- ## Fully-managed Embeddings Service
 
3208
 
3209
- Alternatively, you can use Jina AI's [Embedding platform](https://jina.ai/embeddings/) for fully-managed access to Jina Embeddings models.
 
 
3210
 
3211
  ## Use Jina Embeddings for RAG
3212
 
@@ -3216,13 +3242,6 @@ According to the latest blog post from [LLamaIndex](https://blog.llamaindex.ai/b
3216
 
3217
  <img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">
3218
 
3219
-
3220
- ## Plans
3221
-
3222
- 1. Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese.
3223
- 2. Multimodal embedding models enable Multimodal RAG applications.
3224
- 3. High-performing rerankers.
3225
-
3226
  ## Contact
3227
 
3228
  Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
 
3
  - sentence-transformers
4
  - feature-extraction
5
  - sentence-similarity
 
6
  language:
7
  - de
8
  - en
 
3108
  <br><br>
3109
 
3110
  <p align="center">
3111
+ <img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="Jina AI logo: Jina AI is your Portal to Multimodal AI" width="150px">
3112
  </p>
3113
 
3114
 
 
3116
  <b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
3117
  </p>
3118
 
3119
+ ## Quick Start
3120
+
3121
+ The easiest way to start using `jina-embeddings-v2-base-de` is through Jina AI's [Embedding API](https://jina.ai/embeddings/).
3122
 
3123
  ## Intended Usage & Model Info
3124
 
 
3137
 
3138
  - [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
3139
  - [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
3140
+ - [`jina-embeddings-v2-base-zh`](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh): 161 million parameters, Chinese-English bilingual embeddings.
3141
+ - [`jina-embeddings-v2-base-de`](https://huggingface.co/jinaai/jina-embeddings-v2-base-de): 161 million parameters, German-English bilingual embeddings **(you are here)**.
3142
+ - _[`jina-embeddings-v2-base-es`](): Spanish-English Bilingual embeddings (soon)._
3143
+ - _Bilingual embedding models in other world languages (soon)._
3144
+ - _Multimodal-input embedding model (soon)._
3145
+ - _High-performing reranking model (soon)._
3146
 
3147
  ## Data & Parameters
3148
 
3149
+ We will publish a report with technical details about the training of the bilingual models soon.
3150
+ The training of the English model is described in this [technical report](https://arxiv.org/abs/2310.19923).
3151
 
3152
  ## Usage
3153
 
 
3210
  )
3211
  ```
3212
 
3213
+ If you want to use the model together with the [sentence-transformers package](https://github.com/UKPLab/sentence-transformers/), make sure that you have installed the latest release and set `trust_remote_code=True` as well:
3214
+
3215
+ ```
3216
+ # Install first (run in a shell): pip install -U sentence-transformers
3217
+ from sentence_transformers import SentenceTransformer
3218
+ from numpy.linalg import norm
3219
+
3220
+ cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
3221
+ model = SentenceTransformer('jinaai/jina-embeddings-v2-base-de', trust_remote_code=True)
3222
+ embeddings = model.encode(['How is the weather today?', 'Wie ist das Wetter heute?'])
3223
+ print(cos_sim(embeddings[0], embeddings[1]))
3224
+ ```
3225
+
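The `cos_sim` lambda in the snippet above scores a single pair of 1-D vectors. To compare several sentences at once, a row-normalized matrix product yields all pairwise scores in one step. The following is a minimal sketch, assuming the embeddings are NumPy arrays of shape `(n, d)`, as `model.encode` returns by default. The helper name `pairwise_cos_sim` and the toy vectors are hypothetical stand-ins for illustration and are not part of the model's API or of this commit.

```python
import numpy as np


def pairwise_cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return cosine similarity between every row of `a` and every row of `b`."""
    a = np.atleast_2d(a)
    b = np.atleast_2d(b)
    # Scale each row to unit length so the dot product equals the cosine.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Toy stand-ins for embeddings; real ones would come from model.encode([...]).
queries = np.array([[1.0, 0.0], [0.0, 2.0]])
docs = np.array([[3.0, 0.0], [1.0, 1.0]])

scores = pairwise_cos_sim(queries, docs)  # shape: (len(queries), len(docs))
print(scores)
```

Entry `scores[i, j]` is the similarity between query `i` and document `j`. The sketch does not guard against all-zero vectors, which would cause a division by zero.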
3226
+ ## Alternatives to Using the Transformers Package
3227
+
3228
+ 1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
3229
+ 2. _Private and high-performance deployment_: Pick from our suite of models and deploy them on [AWS SageMaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).
3230
+
3231
+ ## Benchmark Results
3232
 
3233
+ We evaluated our bilingual model on all German and English evaluation tasks available on the [MTEB benchmark](https://huggingface.co/blog/mteb). In addition, we evaluated the model against several other German, English, and multilingual models on additional German evaluation tasks:
3234
+
3235
+ <img src="de_evaluation_results.png" width="780px">
3236
 
3237
  ## Use Jina Embeddings for RAG
3238
 
 
3242
 
3243
  <img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">
3244
 
 
3245
  ## Contact
3246
 
3247
  Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
de_evaluation_results.png ADDED