SentenceTransformer based on FacebookAI/xlm-roberta-base

This is a sentence-transformers model finetuned from FacebookAI/xlm-roberta-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: FacebookAI/xlm-roberta-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 278M parameters (F32)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
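
The Pooling module above builds the sentence embedding by mean pooling: it averages the token embeddings produced by the transformer, ignoring padding positions. A minimal sketch of that computation, run against the base encoder for illustration:

from transformers import AutoModel, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-base")
encoder = AutoModel.from_pretrained("FacebookAI/xlm-roberta-base")

batch = tokenizer(["Sales", "Revenue"], padding=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)

# Average over tokens, masking out padding via the attention mask.
mask = batch["attention_mask"].unsqueeze(-1).float()        # (batch, seq_len, 1)
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embeddings.shape)  # torch.Size([2, 768])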

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("slimaneMakh/triplet_CloseHlabel_farLabel_andnegativ-1M-5eps-XLMR_29may")
# Run inference
sentences = [
    'Sales',
    'Revenue',
    'Operating profit',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
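
Beyond pairwise similarity, the same embeddings support the semantic search use case mentioned in the introduction. A sketch using the library's util.semantic_search; the candidate labels here are invented for illustration:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("slimaneMakh/triplet_CloseHlabel_farLabel_andnegativ-1M-5eps-XLMR_29may")

# Hypothetical label corpus; replace with your own candidates.
corpus = ["Revenue", "Operating profit", "Cash and cash equivalents", "Deferred tax assets"]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Retrieve the two closest labels to the query by cosine similarity.
query_embedding = model.encode("Sales", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))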

Evaluation

Metrics

Triplet

Metric               Value
cosine_accuracy      0.9988
dot_accuracy         0.0015
manhattan_accuracy   0.9975
euclidean_accuracy   0.9991
max_accuracy         0.9991
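
Each accuracy is the fraction of evaluation triplets for which the anchor lands closer to the positive than to the negative under the named distance (this is what the library's TripletEvaluator reports). A minimal sketch of the cosine variant for a single triplet, with example strings taken from the samples below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("slimaneMakh/triplet_CloseHlabel_farLabel_andnegativ-1M-5eps-XLMR_29may")

anchor = model.encode("Cash and cash equivalents")
positive = model.encode("Cash and cash equivalents")
negative = model.encode("Cars incl prepayments")

# The triplet counts as correct when the anchor is more similar to the
# positive than to the negative; the reported metric averages this check
# over the whole evaluation set.
sim_pos = model.similarity(anchor, positive)  # cosine similarity by default
sim_neg = model.similarity(anchor, negative)
print(bool(sim_pos > sim_neg))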

Training Details

Training Dataset

Unnamed Dataset

  • Size: 660,643 training samples
  • Columns: anchor_label, pos_hlabel, and neg_hlabel
  • Approximate statistics based on the first 1000 samples:
                 anchor_label   pos_hlabel   neg_hlabel
    type         string         string       string
    min tokens   3              3            3
    mean tokens  11.86          9.06         7.99
    max tokens   39             32           25
  • Samples:
    anchor_label: Basic earnings (loss) per share
    pos_hlabel:   Tavakasum kahjum aktsia kohta II
    neg_hlabel:   Kapital z nadwyzki wartosci emisyjnej ponad wartosc nominalna

    anchor_label: Comprehensive income
    pos_hlabel:   Suma dochodow calkowitych
    neg_hlabel:   dont Marques

    anchor_label: Cash and cash equivalents
    pos_hlabel:   Cash and cash equivalents
    neg_hlabel:   Cars incl prepayments
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
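
With TripletDistanceMetric.EUCLIDEAN and triplet_margin 5, the per-triplet objective is max(d(a, p) - d(a, n) + 5, 0): the negative must sit at least 5 units farther from the anchor than the positive before the triplet stops contributing loss. A minimal sketch with random stand-ins for real embeddings:

import torch
import torch.nn.functional as F

def euclidean_triplet_loss(anchor, positive, negative, margin=5.0):
    # Hinge on the gap between the two anchor distances.
    d_pos = F.pairwise_distance(anchor, positive, p=2)
    d_neg = F.pairwise_distance(anchor, negative, p=2)
    return torch.relu(d_pos - d_neg + margin).mean()

# Random 768-dim embeddings standing in for a batch of 16 triplets.
a, p, n = (torch.randn(16, 768) for _ in range(3))
print(euclidean_triplet_loss(a, p, n))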
    

Evaluation Dataset

Unnamed Dataset

  • Size: 283,133 evaluation samples
  • Columns: anchor_label, pos_hlabel, and neg_hlabel
  • Approximate statistics based on the first 1000 samples:
                 anchor_label   pos_hlabel   neg_hlabel
    type         string         string       string
    min tokens   3              3            3
    mean tokens  11.78          9.22         8.12
    max tokens   37             39           29
  • Samples:
    anchor_label: Deferred tax assets
    pos_hlabel:   Deferred tax assets
    neg_hlabel:   Immateriella tillgangar

    anchor_label: Equity
    pos_hlabel:   EGET KAPITAL inklusive periodens resultat
    neg_hlabel:   Materials

    anchor_label: Adjustments for decrease (increase) in other operating receivables
    pos_hlabel:   Okning av ovriga rorelsetillgangar
    neg_hlabel:   Rorelseresultat
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
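
Under Sentence Transformers 3.0 (listed under Framework Versions below), a training run with these non-default hyperparameters could be wired up roughly as follows; the one-row dataset is a placeholder, since the actual training data is unnamed:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import TripletDistanceMetric, TripletLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("FacebookAI/xlm-roberta-base")

# Placeholder triplets with the column names used above.
train_dataset = Dataset.from_dict({
    "anchor_label": ["Comprehensive income"],
    "pos_hlabel": ["Suma dochodow calkowitych"],
    "neg_hlabel": ["dont Marques"],
})

loss = TripletLoss(model, distance_metric=TripletDistanceMetric.EUCLIDEAN, triplet_margin=5)

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    warmup_ratio=0.1,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()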

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step    Training Loss    Validation Loss    max_accuracy
0.0121 500 3.7705 - -
0.0242 1000 1.4084 - -
0.0363 1500 0.7062 - -
0.0484 2000 0.5236 - -
0.0605 2500 0.4348 - -
0.0727 3000 0.3657 - -
0.0848 3500 0.3657 - -
0.0969 4000 0.2952 - -
0.1090 4500 0.3805 - -
0.1211 5000 0.3255 - -
0.1332 5500 0.2621 - -
0.1453 6000 0.2377 - -
0.1574 6500 0.2139 - -
0.1695 7000 0.2085 - -
0.1816 7500 0.1809 - -
0.1937 8000 0.1711 - -
0.2059 8500 0.1608 - -
0.2180 9000 0.1808 - -
0.2301 9500 0.1553 - -
0.2422 10000 0.1417 - -
0.2543 10500 0.1329 - -
0.2664 11000 0.1689 - -
0.2785 11500 0.1292 - -
0.2906 12000 0.1181 - -
0.3027 12500 0.1223 - -
0.3148 13000 0.129 - -
0.3269 13500 0.0911 - -
0.3391 14000 0.113 - -
0.3512 14500 0.0955 - -
0.3633 15000 0.108 - -
0.3754 15500 0.094 - -
0.3875 16000 0.0947 - -
0.3996 16500 0.0748 - -
0.4117 17000 0.0699 - -
0.4238 17500 0.0707 - -
0.4359 18000 0.0768 - -
0.4480 18500 0.0805 - -
0.4601 19000 0.0705 - -
0.4723 19500 0.069 - -
0.4844 20000 0.072 - -
0.4965 20500 0.0669 - -
0.5086 21000 0.066 - -
0.5207 21500 0.0624 - -
0.5328 22000 0.0687 - -
0.5449 22500 0.076 - -
0.5570 23000 0.0563 - -
0.5691 23500 0.0594 - -
0.5812 24000 0.0524 - -
0.5933 24500 0.0528 - -
0.6055 25000 0.0448 - -
0.6176 25500 0.041 - -
0.6297 26000 0.0397 - -
0.6418 26500 0.0489 - -
0.6539 27000 0.0595 - -
0.6660 27500 0.034 - -
0.6781 28000 0.0569 - -
0.6902 28500 0.0467 - -
0.7023 29000 0.0323 - -
0.7144 29500 0.0428 - -
0.7266 30000 0.0344 - -
0.7387 30500 0.029 - -
0.7508 31000 0.0418 - -
0.7629 31500 0.0285 - -
0.7750 32000 0.0425 - -
0.7871 32500 0.0266 - -
0.7992 33000 0.0325 - -
0.8113 33500 0.0215 - -
0.8234 34000 0.0316 - -
0.8355 34500 0.0286 - -
0.8476 35000 0.0285 - -
0.8598 35500 0.0284 - -
0.8719 36000 0.0147 - -
0.8840 36500 0.0217 - -
0.8961 37000 0.0311 - -
0.9082 37500 0.0202 - -
0.9203 38000 0.0236 - -
0.9324 38500 0.0201 - -
0.9445 39000 0.0246 - -
0.9566 39500 0.0177 - -
0.9687 40000 0.0173 - -
0.9808 40500 0.0202 - -
0.9930 41000 0.017 - -
1.0 41291 - 0.0140 0.9991

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.0
  • Transformers: 4.39.3
  • PyTorch: 2.1.2
  • Accelerate: 0.28.0
  • Datasets: 2.18.0
  • Tokenizers: 0.15.2
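
To reproduce this environment, one plausible pinned install (version pins copied from the list above; the exact combination is not independently verified):

pip install sentence-transformers==3.0.0 transformers==4.39.3 torch==2.1.2 accelerate==0.28.0 datasets==2.18.0 tokenizers==0.15.2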

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification}, 
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}