SentenceTransformer

This is a sentence-transformers model trained on the corpus dataset. It maps English and Russian sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Number of Parameters: 149M (F32)
  • Training Dataset:
    • corpus

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
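
The maximum sequence length, embedding dimensionality, and mean-pooling configuration shown above can be verified programmatically. A minimal sketch of inspecting the loaded modules:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("whitemouse84/ModernBERT-base-en-ru-v1")
print(model.max_seq_length)                      # 8192
print(model.get_sentence_embedding_dimension())  # 768
print(model[1].pooling_mode_mean_tokens)         # True: mean pooling over token embeddings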

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("whitemouse84/ModernBERT-base-en-ru-v1")
# Run inference
sentences = [
    'Transparency is absolutely critical to this.',
    # Russian: "Transparency is absolutely critical in this process."
    'Прозрачность - абсолютно критична в этом процессе.',
    # Russian: "We buy it for our children." (deliberately unrelated)
    'Мы покупаем его нашим детям.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
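
Since English and Russian texts are embedded in the same vector space, the model also supports cross-lingual semantic search. A minimal sketch using the library's util.semantic_search helper (the query and corpus below are illustrative):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("whitemouse84/ModernBERT-base-en-ru-v1")

# Illustrative Russian corpus queried with an English sentence
corpus = [
    'Прозрачность - абсолютно критична в этом процессе.',  # "Transparency is absolutely critical in this process."
    'Мы покупаем его нашим детям.',                        # "We buy it for our children."
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode('Transparency is absolutely critical to this.', convert_to_tensor=True)

# Returns, for each query, the top_k corpus entries ranked by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)
print(hits[0])  # e.g. [{'corpus_id': 0, 'score': ...}]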

Evaluation

Metrics

Knowledge Distillation

  • Datasets: small_content and big_content
  • Evaluated with MSEEvaluator
| Metric       | small_content | big_content |
|:-------------|--------------:|------------:|
| negative_mse |       -4.3569 |     -3.5414 |
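
The MSE is reported negated so that higher is better. A minimal sketch of setting up such an evaluator (the teacher checkpoint and sentence lists are placeholders, not the actual evaluation data):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import MSEEvaluator

student = SentenceTransformer("whitemouse84/ModernBERT-base-en-ru-v1")
teacher = SentenceTransformer("all-mpnet-base-v2")  # placeholder 768-dim teacher

src = ['Thank you so much, Chris.']  # placeholder English sentences
trg = ['Спасибо, Крис.']             # placeholder Russian translations

# Compares teacher embeddings of src against student embeddings of trg
evaluator = MSEEvaluator(src, trg, teacher_model=teacher, name='small_content')
print(evaluator(student))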

Translation

| Metric           | small_content | big_content |
|:-----------------|--------------:|------------:|
| src2trg_accuracy |        0.7375 |      0.8285 |
| trg2src_accuracy |        0.665  |      0.668  |
| mean_accuracy    |        0.7013 |      0.7483 |
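
src2trg_accuracy is the fraction of English sentences whose nearest neighbour among the embedded Russian sentences is the correct translation; trg2src_accuracy measures the reverse direction. A minimal sketch with the library's TranslationEvaluator (sentence lists are placeholders):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TranslationEvaluator

model = SentenceTransformer("whitemouse84/ModernBERT-base-en-ru-v1")
src = ['Thank you so much, Chris.']  # placeholder English sentences
trg = ['Спасибо, Крис.']             # placeholder Russian translations

# Checks nearest-neighbour translation retrieval in both directions
evaluator = TranslationEvaluator(src, trg, name='small_content')
print(evaluator(model))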

Encodechka

Task abbreviations follow the Encodechka benchmark: STS (semantic textual similarity), PI (paraphrase identification), NLI (natural language inference), SA (sentiment analysis), TI (toxicity identification), IA (inappropriateness identification), IC (intent classification), ICX (cross-lingual intent classification).

| Model                    |   STS |    PI |   NLI |    SA |    TI |    IA |    IC |   ICX |
|:-------------------------|------:|------:|------:|------:|------:|------:|------:|------:|
| ModernBERT-base-en-ru-v1 | 0.602 | 0.521 | 0.355 | 0.722 | 0.892 | 0.704 | 0.747 | 0.591 |
| ModernBERT-base          | 0.498 | 0.239 | 0.358 | 0.643 | 0.786 | 0.623 | 0.593 | 0.104 |
| EuroBERT-210m            | 0.619 | 0.452 | 0.369 | 0.702 | 0.875 | 0.703 | 0.647 | 0.192 |
| xlm-roberta-base         | 0.552 | 0.439 | 0.362 | 0.752 | 0.940 | 0.768 | 0.695 | 0.520 |

Training Details

Training Dataset

corpus

  • Dataset: corpus
  • Size: 2,000,000 training samples
  • Columns: english, non_english, and label
  • Approximate statistics based on the first 1000 samples:
    |         | english                                            | non_english                                        | label              |
    |:--------|:---------------------------------------------------|:---------------------------------------------------|:-------------------|
    | type    | string                                             | string                                             | list               |
    | details | min: 4 tokens, mean: 29.26 tokens, max: 133 tokens | min: 7 tokens, mean: 71.46 tokens, max: 285 tokens | size: 768 elements |
  • Samples:
    | english | non_english | label |
    |:--------|:------------|:------|
    | Hence it can be said that Voit is a well-satisfied customer, and completely convinced of the potential offered by Voortman machines for his firm. | В конечном итоге можно утверждать, что компания Voit довольна своим выбором, ведь она имела возможность убедиться в качественных характеристиках оборудования Voortman. | [0.1702279895544052, -0.6711388826370239, -0.5062062740325928, 0.14078450202941895, 0.15188495814800262, ...] |
    | We want to feel good, we want to be happy, in fact happiness is our birthright. | Мы хотим чувствовать себя хорошо, хотим быть счастливы. | [0.556108295917511, -0.42819586396217346, -0.25372204184532166, 0.099883534014225, 0.7299532294273376, ...] |
    | In Germany, Arcandor - a major holding company in the mail order, retail and tourism industries that reported €21 billion in 2007 sales - threatens to become the first victim of tighter credit terms. | В Германии Arcandor - ключевая холдинговая компания в сфере посылочной и розничной торговли, а также индустрии туризма, в финансовых отчетах которой за 2007 год значился торговый оборот в размере €21 миллиардов - грозит стать первой жертвой ужесточения условий кредитования. | [-0.27140647172927856, -0.5173773169517517, -0.6571329236030579, 0.21765929460525513, -0.01978394016623497, ...] |
  • Loss: MSELoss
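
The label column stores a 768-dimensional teacher embedding of the English sentence, so training follows the multilingual knowledge-distillation recipe: both the English text and its Russian translation are regressed onto the teacher's embedding with MSELoss. A minimal sketch of this setup (the teacher checkpoint, base checkpoint, and data wiring are assumptions for illustration):

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MSELoss

# Assumed checkpoints: the card only states a ModernBertModel backbone
student = SentenceTransformer("answerdotai/ModernBERT-base")
teacher = SentenceTransformer("all-mpnet-base-v2")  # placeholder 768-dim teacher

english = ['We want to feel good, we want to be happy.']               # placeholder
russian = ['Мы хотим чувствовать себя хорошо, хотим быть счастливы.']  # placeholder

# Each text column is pushed towards the teacher embedding in 'label'
train_dataset = Dataset.from_dict({
    'english': english,
    'non_english': russian,
    'label': teacher.encode(english).tolist(),
})

trainer = SentenceTransformerTrainer(
    model=student,
    train_dataset=train_dataset,
    loss=MSELoss(model=student),
)
trainer.train()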

Evaluation Datasets

small_content

  • Dataset: small_content
  • Size: 2,000 evaluation samples
  • Columns: english, non_english, and label
  • Approximate statistics based on the first 1000 samples:
    |         | english                                            | non_english                                        | label              |
    |:--------|:---------------------------------------------------|:---------------------------------------------------|:-------------------|
    | type    | string                                             | string                                             | list               |
    | details | min: 4 tokens, mean: 24.13 tokens, max: 252 tokens | min: 5 tokens, mean: 53.83 tokens, max: 406 tokens | size: 768 elements |
  • Samples:
    | english | non_english | label |
    |:--------|:------------|:------|
    | Thank you so much, Chris. | Спасибо, Крис. | [1.0408389568328857, 0.3253674805164337, -0.12651680409908295, 0.45153331756591797, 0.4052223563194275, ...] |
    | And it's truly a great honor to have the opportunity to come to this stage twice; I'm extremely grateful. | Это огромная честь, получить возможность выйти на эту сцену дважды. Я неимоверно благодарен. | [0.6990637183189392, -0.4462655782699585, -0.5292129516601562, 0.23709823191165924, 0.32307693362236023, ...] |
    | I have been blown away by this conference, and I want to thank all of you for the many nice comments about what I had to say the other night. | Я в восторге от этой конференции, и я хочу поблагодарить вас всех за благожелательные отзывы о моем позавчерашнем выступлении. | [0.8470447063446045, -0.17461800575256348, -0.7178670167922974, 0.6488378047943115, 0.6101466417312622, ...] |
  • Loss: MSELoss

big_content

  • Dataset: big_content
  • Size: 2,000 evaluation samples
  • Columns: english, non_english, and label
  • Approximate statistics based on the first 1000 samples:
    |         | english                                            | non_english                                         | label              |
    |:--------|:---------------------------------------------------|:----------------------------------------------------|:-------------------|
    | type    | string                                             | string                                              | list               |
    | details | min: 6 tokens, mean: 43.84 tokens, max: 141 tokens | min: 10 tokens, mean: 107.9 tokens, max: 411 tokens | size: 768 elements |
  • Samples:
    | english | non_english | label |
    |:--------|:------------|:------|
    | India has recorded a surge in COVID-19 cases in the past weeks, with over 45,000 new cases detected every day since July 23. | Индия зафиксировала резкий всплеск случаев заражения COVID-19 за последние недели, с 23 июля каждый день выявляется более 45 000 новых случаев. | [-0.12528948485851288, -0.49428656697273254, -0.07556094229221344, 0.8069225549697876, 0.20946118235588074, ...] |
    | A bloom the Red Tide extends approximately 130 miles of coastline from northern Pinellas to southern Lee counties. | Цветение Красного Прилива простирается примерно на 130 миль дволь береговой линии от Пинеллас на севере до округа Ли на юге. | [0.027262285351753235, -0.4401558041572571, -0.3353440463542938, 0.11166133731603622, -0.2294958084821701, ...] |
    | Among those affected by the new rules is Transport Secretary Grant Shapps, who began his holiday in Spain on Saturday. | Среди тех, кого затронули новые правила, оказался министр транспорта Грант Шэппс, у которого в субботу начался отпуск в Испании. | [0.1868007630109787, -0.18781621754169464, -0.48890581727027893, 0.328614205121994, 0.36041054129600525, ...] |
  • Loss: MSELoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
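
With per_device_train_batch_size 4 and gradient_accumulation_steps 16, the effective training batch size is 64 per device. A minimal sketch of these arguments (output_dir is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",             # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,  # effective batch size: 4 * 16 = 64
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    bf16=True,
)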

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Framework Versions

  • Python: 3.13.2
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.6.0+cu126
  • Accelerate: 1.4.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}