---
language:
- de
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dataset_size:10K<n<100K
---

# German_Semantic_V3

**Features:**
- **Sequence length:** Embed sequences of up to 8192 tokens, thanks to the ALiBi implementation of the Jina team!
- **Matryoshka Embeddings:** The model is trained for embedding sizes from 1024 down to 64, allowing you to store much smaller embeddings with little loss in quality (a post-hoc truncation sketch follows the usage example below).
- **License:** Apache 2.0
- **German only:** This model is German-only, which lets it learn more efficiently and deal better with short queries.
- **Flexibility:** Trained with flexible sequence lengths and embedding truncation, flexibility is a core feature of the model, while still improving on V2's performance.

## Usage:

```python
from sentence_transformers import SentenceTransformer

matryoshka_dim = 1024  # How big your embeddings should be; choose from: 64, 128, 256, 512, 1024

model = SentenceTransformer("aari1995/German_Semantic_V3", trust_remote_code=True, truncate_dim=matryoshka_dim)
# model.truncate_dim = 64  # the truncation dimension can also be changed after loading
# model.max_seq_length = 512  # optionally, lower the maximum sequence length if your hardware is limited

# Run inference
sentences = [
    'Eine Flagge weht.',
    'Die Flagge bewegte sich in der Luft.',
    'Zwei Personen beobachten das Wasser.',
]
embeddings = model.encode(sentences)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
```
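If you already store full 1024-dimensional embeddings, you can also truncate them after encoding instead of reloading the model with a different `truncate_dim`. The snippet below is a minimal sketch (using NumPy; not part of the original card's examples) that cuts stored embeddings down to 256 Matryoshka dimensions and re-normalizes them, which is required for cosine similarity to stay meaningful:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("aari1995/German_Semantic_V3", trust_remote_code=True)

sentences = ['Eine Flagge weht.', 'Die Flagge bewegte sich in der Luft.']
full = model.encode(sentences)  # shape: (2, 1024)

# Keep only the first 256 Matryoshka dimensions, then re-normalize to unit length
dim = 256
truncated = full[:, :dim]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

# Cosine similarity at full size vs. after truncation should stay close
cos_full = float(full[0] @ full[1] / (np.linalg.norm(full[0]) * np.linalg.norm(full[1])))
cos_trunc = float(truncated[0] @ truncated[1])
print(f"cosine @ 1024 dims: {cos_full:.4f} | cosine @ {dim} dims: {cos_trunc:.4f}")
```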
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** gbert-large (with ALiBi applied)
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:** multiple German datasets
- **Languages:** de

### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: JinaBertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aari1995/German_Semantic_V3", trust_remote_code=True)

# Run inference
sentences = [
    'Eine Flagge weht.',
    'Die Flagge bewegte sich in der Luft.',
    'Zwei Personen beobachten das Wasser.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
```

## Evaluation

### Metrics

#### Semantic Similarity

* Dataset: `sts-test-1024`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.8539     |
| **spearman_cosine** | **0.8623** |
| pearson_manhattan   | 0.8555     |
| spearman_manhattan  | 0.8633     |
| pearson_euclidean   | 0.8554     |
| spearman_euclidean  | 0.8631     |
| pearson_dot         | 0.817      |
| spearman_dot        | 0.815      |
| pearson_max         | 0.8555     |
| spearman_max        | 0.8633     |

#### Semantic Similarity

* Dataset: `sts-test-768`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.8538     |
| **spearman_cosine** | **0.8632** |
| pearson_manhattan   | 0.8559     |
| spearman_manhattan  | 0.8638     |
| pearson_euclidean   | 0.8559     |
| spearman_euclidean  | 0.8634     |
| pearson_dot         | 0.8169     |
| spearman_dot        | 0.8157     |
| pearson_max         | 0.8559     |
| spearman_max        | 0.8638     |

#### Semantic Similarity

* Dataset: `sts-test-512`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.8502     |
| **spearman_cosine** | **0.8624** |
| pearson_manhattan   | 0.8547     |
| spearman_manhattan  | 0.8629     |
| pearson_euclidean   | 0.8546     |
| spearman_euclidean  | 0.8625     |
| pearson_dot         | 0.8108     |
| spearman_dot        | 0.8103     |
| pearson_max         | 0.8547     |
| spearman_max        | 0.8629     |

#### Semantic Similarity

* Dataset: `sts-test-256`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.8441     |
| **spearman_cosine** | **0.8583** |
| pearson_manhattan   | 0.8517     |
| spearman_manhattan  | 0.8592     |
| pearson_euclidean   | 0.8517     |
| spearman_euclidean  | 0.8592     |
| pearson_dot         | 0.7902     |
| spearman_dot        | 0.7891     |
| pearson_max         | 0.8517     |
| spearman_max        | 0.8592     |

#### Semantic Similarity

* Dataset: `sts-test-128`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.8369     |
| **spearman_cosine** | **0.8546** |
| pearson_manhattan   | 0.8474     |
| spearman_manhattan  | 0.8547     |
| pearson_euclidean   | 0.8478     |
| spearman_euclidean  | 0.855      |
| pearson_dot         | 0.7733     |
| spearman_dot        | 0.7721     |
| pearson_max         | 0.8478     |
| spearman_max        | 0.855      |

#### Semantic Similarity

* Dataset: `sts-test-64`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.8282     |
| **spearman_cosine** | **0.8507** |
| pearson_manhattan   | 0.8405     |
| spearman_manhattan  | 0.8483     |
| pearson_euclidean   | 0.8426     |
| spearman_euclidean  | 0.8499     |
| pearson_dot         | 0.7519     |
| spearman_dot        | 0.7518     |
| pearson_max         | 0.8426     |
| spearman_max        | 0.8507     |

## Training Details

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:

```json
{
    "loss": "ContrastiveLoss",
    "matryoshka_dims": [1024, 768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
```
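The original training script is not included in this card. Purely as an illustration, the following is a minimal sketch of how a MatryoshkaLoss wrapped around ContrastiveLoss with these parameters is set up in Sentence Transformers v3; the two-pair toy dataset is a hypothetical stand-in for the actual German training data:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

# Hypothetical stand-in for the real training data: sentence pairs with binary
# labels, the input format ContrastiveLoss expects (1 = similar, 0 = dissimilar)
train_dataset = Dataset.from_dict({
    "sentence1": ["Eine Flagge weht.", "Ein Hund bellt."],
    "sentence2": ["Die Flagge bewegte sich in der Luft.", "Eine Katze schläft."],
    "label": [1, 0],
})

model = SentenceTransformer("aari1995/German_Semantic_V3", trust_remote_code=True)

# ContrastiveLoss is the inner loss, applied at every Matryoshka dimension
inner_loss = losses.ContrastiveLoss(model=model)
loss = losses.MatryoshkaLoss(
    model=model,
    loss=inner_loss,
    matryoshka_dims=[1024, 768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1, 1],
    n_dims_per_step=-1,  # train on all listed dimensions at every step
)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```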
## License / Credits and Special Thanks

- to [Jina AI](https://huggingface.co/jinaai) for the model architecture, especially their ALiBi implementation
- to [deepset](https://huggingface.co/deepset) for gbert-large, which is imho still the greatest German model

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### ContrastiveLoss

```bibtex
@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
    title={Dimensionality Reduction by Learning an Invariant Mapping},
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}
```