Update README.md
## Major updates and USPs:

- **Sequence length:** 8192 (16 times more than V2 and other models), thanks to the ALiBi implementation of the Jina team!
- **Matryoshka Embeddings:** The model is trained for embedding sizes from 1024 down to 64, allowing you to store much smaller embeddings with little quality loss.
- **License:** Apache 2.0
- **German only:** This model is German-only, which lets it learn more efficiently thanks to its tokenizer, handle shorter queries better, and generally be more nuanced.
- **Updated knowledge and quality data:** The backbone of this model is gbert-large by deepset. Stage-2 pretraining on German fineweb by occiglot (newest only) ensures up-to-date knowledge.
- **Flexibility:** Trained with flexible sequence lengths and embedding truncation, flexibility is a core feature of the model, while improving on V2 performance.

## Usage:

```python
from sentence_transformers import SentenceTransformer

matryoshka_dim = 1024  # how big your embeddings should be; choose from: 64, 128, 256, 512, 768, 1024
model = SentenceTransformer("aari1995/German_Semantic_V3", trust_remote_code=True, truncate_dim=matryoshka_dim)

# model.truncate_dim = 64  # truncation dimensions can also be changed after loading
```
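Matryoshka truncation itself is mechanical: keep the leading dimensions of the full embedding and re-normalize. A minimal NumPy sketch of that idea, independent of the model (the `truncate_embedding` helper is illustrative, not part of the sentence-transformers API):

```python
import numpy as np

def truncate_embedding(emb: np.ndarray, dim: int) -> np.ndarray:
    """Matryoshka-style truncation: keep the first `dim` components, then re-normalize."""
    truncated = emb[:dim]
    norm = np.linalg.norm(truncated)
    return truncated / norm if norm > 0 else truncated

# Stand-in for a full-size embedding; the real model would produce this via model.encode(...)
rng = np.random.default_rng(0)
full = rng.normal(size=1024)

small = truncate_embedding(full, 64)  # 64-dim embedding, unit length again
```

Because the model was trained with truncation in mind, these leading dimensions carry most of the semantic signal, which is why storing only 64 of 1024 values loses little quality.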