BGE SITGES CAT

This is a sentence-transformers model fine-tuned from BAAI/bge-m3. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Language: Catalan (ca)
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
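
Read top to bottom, the three modules above run the XLM-RoBERTa encoder, keep the hidden state of the first (CLS) token, and scale it to unit length. The following is a minimal sketch of the last two steps, written with the standard library only and using random numbers in place of real encoder output. The helper names `cls_pool_and_normalize` and `dot` are illustrative and not part of the library.

```python
import math
import random

def cls_pool_and_normalize(token_embeddings):
    """Mimic Pooling(pooling_mode_cls_token=True) followed by Normalize().

    token_embeddings: list of sequences, each a list of per-token vectors.
    Returns one unit-length vector per sequence.
    """
    pooled = []
    for sequence in token_embeddings:
        cls_vector = sequence[0]                      # hidden state of the first token
        norm = math.sqrt(sum(x * x for x in cls_vector))
        norm = max(norm, 1e-12)                       # guard against division by zero
        pooled.append([x / norm for x in cls_vector])
    return pooled

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

random.seed(0)
# Stand-in for encoder output: 3 sentences, 7 tokens each, 1024 dims (random, not real).
fake_hidden_states = [
    [[random.gauss(0, 1) for _ in range(1024)] for _ in range(7)]
    for _ in range(3)
]
embeddings = cls_pool_and_normalize(fake_hidden_states)

# After normalization, the dot product of two embeddings is their cosine similarity.
print(len(embeddings), len(embeddings[0]))
print(round(dot(embeddings[0], embeddings[1]), 4))
```

Because the final module normalizes the output, cosine similarity and dot product give the same ranking for this model.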

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("adriansanz/SITGES-BAAI3")
# Run inference
sentences = [
    "Cal revisar la informació i els terminis de la convocatòria específica de cada procés que trobareu a la Seu electrònica de l'Ajuntament de Sitges.",
    "On es pot trobar la informació sobre els terminis de presentació d'al·legacions en un procés de selecció de personal de l'Ajuntament de Sitges?",
    "Quin és el document que es necessita per acreditar l'any de construcció i l'adequació a la legalitat urbanística d'un immoble?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
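
The training logs in this card report retrieval scores at 1024, 768, 512, 256, 128 and 64 dimensions, and the card cites MatryoshkaLoss, which suggests that shortened embeddings remain usable. The sketch below shows the usual way to shorten one: keep a prefix of the vector and rescale it to unit length. It uses random stand-in vectors rather than real model output, and `truncate_and_renormalize` is a hypothetical helper, not a library function.

```python
import math
import random

def truncate_and_renormalize(vector, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = vector[:dim]
    norm = max(math.sqrt(sum(x * x for x in head)), 1e-12)
    return [x / norm for x in head]

def cosine(u, v):
    # Inputs are unit length, so the dot product is the cosine similarity.
    return sum(a * b for a, b in zip(u, v))

random.seed(0)
# Random stand-ins for two 1024-dimensional outputs of model.encode(...).
full_a = [random.gauss(0, 1) for _ in range(1024)]
full_b = [random.gauss(0, 1) for _ in range(1024)]

for dim in (1024, 768, 512, 256, 128, 64):   # sizes that appear in the training logs
    a = truncate_and_renormalize(full_a, dim)
    b = truncate_and_renormalize(full_b, dim)
    print(dim, round(cosine(a, b), 4))
```

Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer(...)`; check the documentation of your installed version before relying on it.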

Evaluation

Metrics

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.1268
cosine_accuracy@3 0.2129
cosine_accuracy@5 0.3086
cosine_accuracy@10 0.4952
cosine_precision@1 0.1268
cosine_precision@3 0.071
cosine_precision@5 0.0617
cosine_precision@10 0.0495
cosine_recall@1 0.1268
cosine_recall@3 0.2129
cosine_recall@5 0.3086
cosine_recall@10 0.4952
cosine_ndcg@10 0.2751
cosine_mrr@10 0.2094
cosine_map@100 0.2368

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.1196
cosine_accuracy@3 0.2057
cosine_accuracy@5 0.311
cosine_accuracy@10 0.4976
cosine_precision@1 0.1196
cosine_precision@3 0.0686
cosine_precision@5 0.0622
cosine_precision@10 0.0498
cosine_recall@1 0.1196
cosine_recall@3 0.2057
cosine_recall@5 0.311
cosine_recall@10 0.4976
cosine_ndcg@10 0.2725
cosine_mrr@10 0.2052
cosine_map@100 0.2322

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.1244
cosine_accuracy@3 0.2153
cosine_accuracy@5 0.3301
cosine_accuracy@10 0.5048
cosine_precision@1 0.1244
cosine_precision@3 0.0718
cosine_precision@5 0.066
cosine_precision@10 0.0505
cosine_recall@1 0.1244
cosine_recall@3 0.2153
cosine_recall@5 0.3301
cosine_recall@10 0.5048
cosine_ndcg@10 0.2802
cosine_mrr@10 0.213
cosine_map@100 0.2391

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.1196
cosine_accuracy@3 0.2321
cosine_accuracy@5 0.3206
cosine_accuracy@10 0.4761
cosine_precision@1 0.1196
cosine_precision@3 0.0774
cosine_precision@5 0.0641
cosine_precision@10 0.0476
cosine_recall@1 0.1196
cosine_recall@3 0.2321
cosine_recall@5 0.3206
cosine_recall@10 0.4761
cosine_ndcg@10 0.269
cosine_mrr@10 0.2064
cosine_map@100 0.2351

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.1196
cosine_accuracy@3 0.2177
cosine_accuracy@5 0.3254
cosine_accuracy@10 0.5
cosine_precision@1 0.1196
cosine_precision@3 0.0726
cosine_precision@5 0.0651
cosine_precision@10 0.05
cosine_recall@1 0.1196
cosine_recall@3 0.2177
cosine_recall@5 0.3254
cosine_recall@10 0.5
cosine_ndcg@10 0.2755
cosine_mrr@10 0.2081
cosine_map@100 0.2341

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.1292
cosine_accuracy@3 0.2129
cosine_accuracy@5 0.3206
cosine_accuracy@10 0.4809
cosine_precision@1 0.1292
cosine_precision@3 0.071
cosine_precision@5 0.0641
cosine_precision@10 0.0481
cosine_recall@1 0.1292
cosine_recall@3 0.2129
cosine_recall@5 0.3206
cosine_recall@10 0.4809
cosine_ndcg@10 0.2705
cosine_mrr@10 0.2075
cosine_map@100 0.234
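
In every table above, accuracy@k equals recall@k and precision@k equals accuracy@k divided by k (for example, 0.2129 / 3 ≈ 0.071). That pattern is what results when each query has exactly one relevant document. The sketch below computes the same family of metrics under that assumption; the ranks are hypothetical and are not taken from the evaluation set.

```python
import math

def retrieval_metrics(ranks, k):
    """Metrics for queries that each have exactly one relevant document.

    ranks: 1-based rank of the relevant document per query, or None if it
    was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,   # at most 1 of the k results is relevant
        "recall": accuracy,          # the single relevant document is found or not
        "mrr": sum(1.0 / r for r, h in zip(ranks, hits) if h) / n,
        "ndcg": sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }

# Hypothetical ranks for four queries (illustrative only).
example_ranks = [1, 3, 12, None]
for k in (1, 3, 10):
    print(k, retrieval_metrics(example_ranks, k))
```

With a single relevant document the ideal DCG is 1, so NDCG reduces to the mean of 1 / log2(rank + 1) over the queries whose relevant document appears in the top k.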

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 6
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
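
Two consequences of these settings are easy to work out by hand. A per-device batch of 16 with 16 accumulation steps means each optimizer step sees 256 examples per device, and a warmup ratio of 0.1 with a cosine schedule means the learning rate ramps up briefly and then decays. The sketch below assumes a single device and 84 total optimizer steps (the last step shown in the training logs), and uses the common linear-warmup plus half-cosine shape; the trainer's exact scheduler may differ in detail.

```python
import math

PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 16
BASE_LR = 2e-5
WARMUP_RATIO = 0.1
TOTAL_STEPS = 84          # assumption: last step shown in the training logs

# Examples contributing to one optimizer step on a single device.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

def learning_rate(step):
    """Linear warmup to BASE_LR, then half-cosine decay towards zero."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch, warmup_steps)
for step in (0, 5, 9, 42, 84):
    print(step, f"{learning_rate(step):.2e}")
```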

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss loss dim_1024_cosine_map@100 dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.3404 5 3.3256 - - - - - - -
0.6809 10 2.2115 - - - - - - -
0.9532 14 - 1.2963 0.2260 0.2148 0.2144 0.2258 0.2069 0.2252
1.0213 15 1.7921 - - - - - - -
1.3617 20 1.2295 - - - - - - -
1.7021 25 0.9048 - - - - - - -
1.9745 29 - 0.8667 0.2311 0.2267 0.2292 0.2279 0.2121 0.2278
2.0426 30 0.7256 - - - - - - -
2.3830 35 0.5252 - - - - - - -
2.7234 40 0.4648 - - - - - - -
2.9957 44 - 0.692 0.2311 0.2243 0.2332 0.2319 0.2211 0.2354
3.0638 45 0.3518 - - - - - - -
3.4043 50 0.321 - - - - - - -
3.7447 55 0.2923 - - - - - - -
3.9489 58 - 0.6514 0.2343 0.2210 0.2293 0.2338 0.2242 0.2331
4.0851 60 0.2522 - - - - - - -
4.4255 65 0.2445 - - - - - - -
4.7660 70 0.2358 - - - - - - -
4.9702 73 - 0.6481 0.2348 0.2239 0.2252 0.2332 0.2167 0.2298
5.1064 75 0.2301 - - - - - - -
5.4468 80 0.2262 - - - - - - -
5.7191 84 - 0.6460 0.2430 0.2308 0.2343 0.2408 0.2212 0.2378
0.3404 5 0.1585 - - - - - - -
0.6809 10 0.1465 - - - - - - -
0.9532 14 - 0.6325 0.2407 0.2255 0.2328 0.2333 0.2266 0.2429
1.0213 15 0.1411 - - - - - - -
1.3617 20 0.079 - - - - - - -
1.7021 25 0.1159 - - - - - - -
1.9745 29 - 0.6772 0.2361 0.2287 0.2252 0.2325 0.2228 0.2387
2.0426 30 0.0838 - - - - - - -
2.3830 35 0.0647 - - - - - - -
2.7234 40 0.0752 - - - - - - -
2.9957 44 - 0.6668 0.2304 0.2354 0.2304 0.2344 0.2155 0.2321
3.0638 45 0.0706 - - - - - - -
3.4043 50 0.0478 - - - - - - -
3.7447 55 0.0768 - - - - - - -
3.9489 58 - 0.6040 0.2318 0.2293 0.2292 0.2305 0.2165 0.2264
4.0851 60 0.0793 - - - - - - -
4.4255 65 0.0559 - - - - - - -
4.7660 70 0.0654 - - - - - - -
4.9702 73 - 0.6105 0.2328 0.2328 0.2313 0.2364 0.2279 0.2320
5.1064 75 0.0734 - - - - - - -
5.4468 80 0.0616 - - - - - - -
5.7191 84 - 0.6107 0.2368 0.2341 0.2351 0.2391 0.2340 0.2322
  • The bold row denotes the saved checkpoint.
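
The `dim_*` columns, together with the losses cited in this card, suggest training with MultipleNegativesRankingLoss wrapped in MatryoshkaLoss. The card does not state the exact configuration, so the following is only a sketch of the idea: a cross-entropy over scaled cosine similarities with in-batch negatives, summed over prefixes of the embedding. The scale of 20 and the equal weighting of dimensions are assumptions, and the toy vectors are made up.

```python
import math

def _unit(v):
    norm = max(math.sqrt(sum(x * x for x in v)), 1e-12)
    return [x / norm for x in v]

def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch serves as a negative."""
    anchors = [_unit(a) for a in anchors]
    positives = [_unit(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * sum(x * y for x, y in zip(a, p)) for p in positives]
        peak = max(logits)                                  # for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims):
    """Sum the ranking loss over prefixes of the embeddings (equal weights)."""
    return sum(
        mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d in dims
    )

# Toy batch: three 4-dimensional pairs, each positive roughly aligned with its anchor.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3], [-1.0, -1.0, 0.3, 0.0]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.2], [-0.8, -0.9, 0.2, 0.1]]
print(matryoshka_loss(anchors, positives, dims=(4, 2)))
```

Because every other positive in a batch acts as a negative, duplicate texts within a batch would create false negatives, which is presumably why the `no_duplicates` batch sampler is listed among the hyperparameters.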

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.3
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 568M params (Safetensors, tensor type F32)