How to use this model directly from the 🤗/transformers library:

from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("T-Systems-onsite/bert-german-dbmdz-uncased-sentence-stsb")
model = AutoModelWithLMHead.from_pretrained("T-Systems-onsite/bert-german-dbmdz-uncased-sentence-stsb")


How to use

The usage description above, provided by Hugging Face, is wrong! Please use this instead:

First, install the sentence-transformers package.
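A typical install, assuming a standard pip setup:

pip install -U sentence-transformers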

from sentence_transformers import models
from sentence_transformers import SentenceTransformer

# load BERT model from Hugging Face
word_embedding_model = models.Transformer("T-Systems-onsite/bert-german-dbmdz-uncased-sentence-stsb")

# apply mean pooling to get one fixed-sized sentence vector
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True)

# join BERT model and pooling to get the sentence transformer
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
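
A minimal usage sketch, continuing from the model built above (the German example sentences are our own, not from the card):

# encode a few sentences into fixed-size vectors
sentences = ["Das ist ein Beispiel.", "Dies ist ein Beispielsatz."]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768) for this BERT base model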

Model description

This is a German sentence embedding model trained on the German STSbenchmark dataset. It was trained by Philip May and open-sourced by T-Systems-onsite. The base language model is dbmdz/bert-base-german-uncased from the Bayerische Staatsbibliothek.

Intended uses

Sentence-BERT (SBERT) is a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT.

Source: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
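
To make the quoted claim concrete, here is a small sketch of that most-similar-pair search with cosine similarity, reusing the model built above (the sentences are hypothetical):

import numpy as np

sentences = ["Eine Katze sitzt auf der Matte.",
             "Ein Hund läuft im Park.",
             "Auf der Matte sitzt eine Katze."]
emb = model.encode(sentences)

# normalize rows, then get all pairwise cosine similarities as one matrix product
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
sim = emb @ emb.T
np.fill_diagonal(sim, -1.0)  # exclude self-similarity
i, j = np.unravel_index(np.argmax(sim), sim.shape)
print(f"Most similar pair: {sentences[i]!r} / {sentences[j]!r}")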

Training procedure

We did an automatic hyperparameter optimization with Optuna and found the following hyperparameters:

  • batch_size = 5
  • num_epochs = 11
  • lr = 2.637549780860126e-05
  • eps = 5.0696075038683e-06
  • weight_decay = 0.02817210102940054
  • warmup_steps = 27.342745941760147 % of total steps

The final model was trained on the combination of all three datasets: sts_de_dev.csv, sts_de_test.csv and sts_de_train.csv.
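
A hedged sketch of how these hyperparameters could map onto the sentence-transformers training API; the loss, the data loading, and the example pair are assumptions, only the hyperparameter values come from the list above:

from torch.utils.data import DataLoader
from sentence_transformers import InputExample, losses

# hypothetical training pair; the real data comes from the German STS CSVs above
train_examples = [InputExample(texts=["Satz A", "Satz B"], label=0.8)]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=5)
train_loss = losses.CosineSimilarityLoss(model)  # assumed loss for STS regression

num_epochs = 11
total_steps = len(train_dataloader) * num_epochs
warmup_steps = int(0.27342745941760147 * total_steps)  # ~27.34 % of total steps

# model is the SentenceTransformer assembled in "How to use"
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=num_epochs,
    warmup_steps=warmup_steps,
    weight_decay=0.02817210102940054,
    optimizer_params={"lr": 2.637549780860126e-05, "eps": 5.0696075038683e-06},
)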