---
language:
- de
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dataset_size:10K<n<100K
- loss:MatryoshkaLoss
- loss:ContrastiveLoss
base_model: aari1995/gbert-large-alibi
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
widget:
- source_sentence: Das Tor ist gelb.
sentences:
- Das Tor ist blau.
- Ein Mann mit seinem Hund am Strand.
- Die Menschen sitzen auf Bänken.
- source_sentence: Das Tor ist blau.
sentences:
- Ein blaues Moped parkt auf dem Bürgersteig.
- Drei Hunde spielen im weißen Schnee.
- Bombenanschläge töten 19 Menschen im Irak
- source_sentence: Ein Mann übt Boxen
sentences:
- Ein Fußballspieler versucht ein Tackling.
- 1 Getötet bei Protest in Bangladesch
- Das Mädchen sang in ein Mikrofon.
- source_sentence: Drei Männer tanzen.
sentences:
- Ein Mann tanzt.
- Ein Mann arbeitet an seinem Laptop.
- Das Mädchen sang in ein Mikrofon.
- source_sentence: Eine Flagge weht.
sentences:
- Die Flagge bewegte sich in der Luft.
- Zwei Personen beobachten das Wasser.
- Zwei Frauen sitzen in einem Cafe.
pipeline_tag: sentence-similarity
model-index:
- name: SentenceTransformer based on aari1995/gbert-large-nli_mix
results:
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test 1024
type: sts-test-1024
metrics:
- type: pearson_cosine
value: 0.8538749625112824
name: Pearson Cosine
- type: spearman_cosine
value: 0.8622934726599119
name: Spearman Cosine
- type: pearson_manhattan
value: 0.8554617861095041
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.8632850500504865
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.8554205957277228
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.8630779166725503
name: Spearman Euclidean
- type: pearson_dot
value: 0.8170146846171837
name: Pearson Dot
- type: spearman_dot
value: 0.8149857685956332
name: Spearman Dot
- type: pearson_max
value: 0.8554617861095041
name: Pearson Max
- type: spearman_max
value: 0.8632850500504865
name: Spearman Max
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test 768
type: sts-test-768
metrics:
- type: pearson_cosine
value: 0.853820621972726
name: Pearson Cosine
- type: spearman_cosine
value: 0.863198271488271
name: Spearman Cosine
- type: pearson_manhattan
value: 0.8558709278385018
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.8637532036004547
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.8558597695346744
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.8634247094122574
name: Spearman Euclidean
- type: pearson_dot
value: 0.8169163431962185
name: Pearson Dot
- type: spearman_dot
value: 0.8156867907361973
name: Spearman Dot
- type: pearson_max
value: 0.8558709278385018
name: Pearson Max
- type: spearman_max
value: 0.8637532036004547
name: Spearman Max
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test 512
type: sts-test-512
metrics:
- type: pearson_cosine
value: 0.8502336569709972
name: Pearson Cosine
- type: spearman_cosine
value: 0.8623838162450902
name: Spearman Cosine
- type: pearson_manhattan
value: 0.8547121881183612
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.8628698143219098
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.8546114371189246
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.8625109910600326
name: Spearman Euclidean
- type: pearson_dot
value: 0.8108392647310044
name: Pearson Dot
- type: spearman_dot
value: 0.8103261097232485
name: Spearman Dot
- type: pearson_max
value: 0.8547121881183612
name: Pearson Max
- type: spearman_max
value: 0.8628698143219098
name: Spearman Max
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test 256
type: sts-test-256
metrics:
- type: pearson_cosine
value: 0.8441242786553879
name: Pearson Cosine
- type: spearman_cosine
value: 0.8582717489671877
name: Spearman Cosine
- type: pearson_manhattan
value: 0.8517415030362573
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.8591688553092182
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.8516965854845419
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.8591770194196562
name: Spearman Euclidean
- type: pearson_dot
value: 0.7901870400809775
name: Pearson Dot
- type: spearman_dot
value: 0.7891397281321177
name: Spearman Dot
- type: pearson_max
value: 0.8517415030362573
name: Pearson Max
- type: spearman_max
value: 0.8591770194196562
name: Spearman Max
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test 128
type: sts-test-128
metrics:
- type: pearson_cosine
value: 0.8369352495821198
name: Pearson Cosine
- type: spearman_cosine
value: 0.8545806562301762
name: Spearman Cosine
- type: pearson_manhattan
value: 0.8474289413580527
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.8546935424655524
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.8478267316251253
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.8550464936365929
name: Spearman Euclidean
- type: pearson_dot
value: 0.7732663297266509
name: Pearson Dot
- type: spearman_dot
value: 0.7720532782903432
name: Spearman Dot
- type: pearson_max
value: 0.8478267316251253
name: Pearson Max
- type: spearman_max
value: 0.8550464936365929
name: Spearman Max
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test 64
type: sts-test-64
metrics:
- type: pearson_cosine
value: 0.8282288301025145
name: Pearson Cosine
- type: spearman_cosine
value: 0.8507215646125454
name: Spearman Cosine
- type: pearson_manhattan
value: 0.8404915813802649
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.8482910175231816
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.8425986040609018
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.8498681513437906
name: Spearman Euclidean
- type: pearson_dot
value: 0.7518854418344252
name: Pearson Dot
- type: spearman_dot
value: 0.7518133373839283
name: Spearman Dot
- type: pearson_max
value: 0.8425986040609018
name: Pearson Max
- type: spearman_max
value: 0.8507215646125454
name: Spearman Max
license: apache-2.0
---
# German Semantic V3

The successor of German_Semantic_STS_V2 is here!

**Major updates and USPs:**

- **Sequence length:** 8192 tokens (16 times more than V2 and most other models), thanks to the ALiBi implementation of the Jina team!
- **Matryoshka embeddings:** The model is trained for embedding sizes from 1024 down to 64, allowing you to store much smaller embeddings with little quality loss.
- **License:** Apache 2.0
- **German only:** This model is German-only, which lets it learn more efficiently and deal better with short queries.
- **Flexibility:** Trained with flexible sequence lengths and embedding truncation, flexibility is a core feature of the model, while still improving on V2's performance.
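At inference time, Matryoshka embeddings are consumed by keeping only the first `k` components of each vector and re-normalizing. A minimal NumPy sketch of that truncation step, using random vectors in place of real model output (the helper name is hypothetical; with `SentenceTransformer` you can simply pass `truncate_dim` instead):

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

# Toy 1024-dimensional vectors standing in for real sentence embeddings
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 1024)).astype(np.float32)

small = truncate_and_normalize(full, 64)
print(small.shape)  # (3, 64)
```

Because the model was trained with MatryoshkaLoss, cosine similarities computed on these truncated, re-normalized vectors degrade only slightly compared to the full 1024 dimensions.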
## Usage

```python
from sentence_transformers import SentenceTransformer

matryoshka_dim = 1024

model = SentenceTransformer("aari1995/German_Semantic_V3", trust_remote_code=True, truncate_dim=matryoshka_dim)

sentences = [
    'Eine Flagge weht.',
    'Die Flagge bewegte sich in der Luft.',
    'Zwei Personen beobachten das Wasser.',
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
```
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** gbert-large (ALiBi applied)
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
- **Languages:** de
### Model Sources

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: JinaBertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
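The `Pooling` module is configured for mean pooling (`pooling_mode_mean_tokens: True`): token embeddings are averaged over real tokens, with padding positions masked out. A minimal NumPy sketch of that operation (shapes and values are illustrative):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding tokens only."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

# Two sequences of 3 tokens; the second has one padding token at the end
tokens = np.ones((2, 3, 4), dtype=np.float32)
tokens[1, 2] = 99.0  # junk at the padding position (masked out below)
mask = np.array([[1, 1, 1], [1, 1, 0]])

pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 4)
```

Because the padding position is masked, both pooled vectors end up as the average of ones only, regardless of the junk value at the padded slot.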
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("aari1995/German_Semantic_V3", trust_remote_code=True)

sentences = [
    'Eine Flagge weht.',
    'Die Flagge bewegte sich in der Luft.',
    'Zwei Personen beobachten das Wasser.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
```
## Evaluation

### Metrics

#### Semantic Similarity (dataset: `sts-test-1024`)

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.8539 |
| spearman_cosine    | 0.8623 |
| pearson_manhattan  | 0.8555 |
| spearman_manhattan | 0.8633 |
| pearson_euclidean  | 0.8554 |
| spearman_euclidean | 0.8631 |
| pearson_dot        | 0.817  |
| spearman_dot       | 0.815  |
| pearson_max        | 0.8555 |
| spearman_max       | 0.8633 |
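The Pearson and Spearman scores measure how well the model's similarity scores correlate with human gold judgments, linearly and by rank respectively. A minimal NumPy sketch with made-up scores (in practice these numbers come from the sentence-transformers evaluator):

```python
import numpy as np

def pearson(x, y):
    """Linear correlation between raw scores."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Rank correlation: Pearson computed on the ranks (no ties in this toy data)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return pearson(rx, ry)

# Hypothetical gold STS scores and model cosine similarities for 5 pairs
gold = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
pred = np.array([0.15, 0.25, 0.55, 0.65, 0.95])

print(round(pearson(pred, gold), 4))   # 0.9853
print(round(spearman(pred, gold), 4))  # 1.0
```

Here the predictions preserve the gold ranking perfectly (Spearman 1.0) while deviating slightly in magnitude (Pearson just under 1).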
#### Semantic Similarity (dataset: `sts-test-768`)

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.8538 |
| spearman_cosine    | 0.8632 |
| pearson_manhattan  | 0.8559 |
| spearman_manhattan | 0.8638 |
| pearson_euclidean  | 0.8559 |
| spearman_euclidean | 0.8634 |
| pearson_dot        | 0.8169 |
| spearman_dot       | 0.8157 |
| pearson_max        | 0.8559 |
| spearman_max       | 0.8638 |
#### Semantic Similarity (dataset: `sts-test-512`)

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.8502 |
| spearman_cosine    | 0.8624 |
| pearson_manhattan  | 0.8547 |
| spearman_manhattan | 0.8629 |
| pearson_euclidean  | 0.8546 |
| spearman_euclidean | 0.8625 |
| pearson_dot        | 0.8108 |
| spearman_dot       | 0.8103 |
| pearson_max        | 0.8547 |
| spearman_max       | 0.8629 |
#### Semantic Similarity (dataset: `sts-test-256`)

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.8441 |
| spearman_cosine    | 0.8583 |
| pearson_manhattan  | 0.8517 |
| spearman_manhattan | 0.8592 |
| pearson_euclidean  | 0.8517 |
| spearman_euclidean | 0.8592 |
| pearson_dot        | 0.7902 |
| spearman_dot       | 0.7891 |
| pearson_max        | 0.8517 |
| spearman_max       | 0.8592 |
#### Semantic Similarity (dataset: `sts-test-128`)

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.8369 |
| spearman_cosine    | 0.8546 |
| pearson_manhattan  | 0.8474 |
| spearman_manhattan | 0.8547 |
| pearson_euclidean  | 0.8478 |
| spearman_euclidean | 0.855  |
| pearson_dot        | 0.7733 |
| spearman_dot       | 0.7721 |
| pearson_max        | 0.8478 |
| spearman_max       | 0.855  |
#### Semantic Similarity (dataset: `sts-test-64`)

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.8282 |
| spearman_cosine    | 0.8507 |
| pearson_manhattan  | 0.8405 |
| spearman_manhattan | 0.8483 |
| pearson_euclidean  | 0.8426 |
| spearman_euclidean | 0.8499 |
| pearson_dot        | 0.7519 |
| spearman_dot       | 0.7518 |
| pearson_max        | 0.8426 |
| spearman_max       | 0.8507 |
## Training Details

- **Loss:** `MatryoshkaLoss` with these parameters:

  ```json
  {
      "loss": "ContrastiveLoss",
      "matryoshka_dims": [1024, 768, 512, 256, 128, 64],
      "matryoshka_weights": [1, 1, 1, 1, 1, 1],
      "n_dims_per_step": -1
  }
  ```
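Conceptually, `MatryoshkaLoss` evaluates the inner `ContrastiveLoss` on each truncated embedding size and combines the results as a weighted sum (all weights are 1 in the config above). A toy sketch of that combination step, with made-up per-dimension loss values:

```python
import numpy as np

def matryoshka_total(per_dim_losses: np.ndarray, weights: np.ndarray) -> float:
    """Total loss = weighted sum of the inner loss at each truncation size."""
    return float(np.dot(per_dim_losses, weights))

# Hypothetical ContrastiveLoss values at dims [1024, 768, 512, 256, 128, 64]
losses = np.array([0.10, 0.11, 0.12, 0.15, 0.20, 0.30])
weights = np.array([1, 1, 1, 1, 1, 1])

print(round(matryoshka_total(losses, weights), 2))  # 0.98
```

Training against this sum is what forces the leading dimensions of each embedding to carry most of the semantic signal, which is why truncation costs so little quality.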
## License / Credits and Special Thanks

- to Jina AI for the model architecture, especially their ALiBi implementation
- to deepset for gbert-large, which is, in my opinion, still the greatest German model
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### ContrastiveLoss

```bibtex
@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
    title={Dimensionality Reduction by Learning an Invariant Mapping},
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}
```