German BERT large paraphrase cosine

This is a sentence-transformers model. It maps sentences and paragraphs to a 1024-dimensional dense vector space. The model is intended to be used together with SetFit to improve German few-shot text classification. It has a sibling model called deutsche-telekom/gbert-large-paraphrase-euclidean.

This model is based on deepset/gbert-large. Many thanks to deepset!
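
As a rough illustration of the mapping described above, the sketch below loads the model with the sentence-transformers package and encodes a list of sentences. The helper name embed and the choice to L2-normalize the embeddings are assumptions made for this example, not part of the original card; running it requires the sentence-transformers package and downloads the model weights.

```python
MODEL_NAME = "deutsche-telekom/gbert-large-paraphrase-cosine"


def embed(sentences, model_name=MODEL_NAME):
    """Encode a list of sentences into dense vectors (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # normalize_embeddings=True is an assumption of this sketch: it returns
    # unit-length vectors, so a dot product equals the cosine similarity.
    return model.encode(sentences, normalize_embeddings=True)
```

Calling embed(["Das ist ein Beispielsatz.", "Dies ist ein Beispiel."]) should return one row per input sentence, each with 1024 dimensions as stated above.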

Loss Function
We trained the model with MultipleNegativesRankingLoss, using cosine similarity as the similarity function.
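
To make the idea behind this loss concrete, here is a minimal pure-Python sketch, not the sentence-transformers implementation. For each anchor, the paired sentence at the same index is treated as the positive and every other sentence in the batch as a negative; the loss is the cross-entropy over the scaled cosine similarities. The function names and the scale of 20 are assumptions of this sketch.

```python
import math


def cos_sim(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i].

    All other positives in the batch act as in-batch negatives.
    The scale value is an assumption of this sketch.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of the true pair.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)
```

With well-separated pairs the loss is close to zero; if the positives are shuffled so that each anchor is matched to the wrong sentence, the loss grows toward the scale value.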

Training Data
The model is trained on a carefully filtered version of the deutsche-telekom/ger-backtrans-paraphrase dataset. We removed every sentence pair that meets any of the following conditions:

  • min_char_len less than 15
  • jaccard_similarity greater than 0.3
  • de_token_count greater than 30
  • en_de_token_count greater than 30
  • cos_sim less than 0.85
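
The filter rules above can be sketched as a single predicate. This is a minimal illustration, assuming each sentence pair is available as a mapping whose keys match the field names in the list; the function name keep_pair is hypothetical.

```python
def keep_pair(row):
    """Return True if a sentence pair survives all five filters listed above."""
    if row["min_char_len"] < 15:
        return False  # at least one sentence is too short
    if row["jaccard_similarity"] > 0.3:
        return False  # the two sentences overlap too much lexically
    if row["de_token_count"] > 30:
        return False  # German sentence is too long
    if row["en_de_token_count"] > 30:
        return False  # back-translated sentence is too long
    if row["cos_sim"] < 0.85:
        return False  # the two sentences are not similar enough in meaning
    return True
```

Such a predicate could be applied row by row, for example in a list comprehension or as the callback of a dataset filter method.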


Hyperparameters

  • learning_rate: 8.345726930229726e-06
  • num_epochs: 7
  • train_batch_size: 57
  • num_gpu: 1

Evaluation Results

We use the NLU Few-shot Benchmark - English and German dataset to evaluate this model in a German few-shot scenario.

Qualitative results


Copyright (c) 2023 Philip May, Deutsche Telekom AG
Copyright (c) 2022 deepset GmbH

Licensed under the MIT License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file LICENSE in the repository.
