SentenceTransformer based on nomic-ai/nomic-embed-text-v1.5

This is a sentence-transformers model finetuned from nomic-ai/nomic-embed-text-v1.5 on the Mollel/swahili-n_li-triplet-swh-eng dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/nomic-embed-text-v1.5
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • Mollel/swahili-n_li-triplet-swh-eng

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NomicBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
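
The Pooling module above enables only pooling_mode_mean_tokens, so a sentence embedding is the average of the token vectors, with padding positions left out. The following is a minimal pure-Python sketch of that idea, not the library's implementation: `mean_pool` is a hypothetical helper, and the 3-dimensional toy vectors stand in for the model's 768-dimensional token outputs.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is needed per token")
    dim = len(token_vectors[0])
    total = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(vector):
                total[i] += value
    if kept == 0:
        raise ValueError("attention mask selects no tokens")
    return [value / kept for value in total]


# Toy input: three real tokens followed by one padding token (mask 0).
tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padding vector is ignored because its mask entry is 0, which is why masked averaging is preferred over a plain mean when sentences in a batch have different lengths.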

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

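# Download from the 🤗 Hub. NomicBertModel is custom code hosted on the Hub,
# so loading is expected to require trust_remote_code=True.
model = SentenceTransformer(
    "Mollel/MultiLinguSwahili-nomic-embed-text-v1.5-nli-matryoshka",
    trust_remote_code=True,
)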
# Run inference
sentences = [
    'Mwanamume na mwanamke wachanga waliovaa mikoba wanaweka au kuondoa kitu kutoka kwenye mti mweupe wa zamani, huku watu wengine wamesimama au wameketi nyuma.',
    'mwanamume na mwanamke wenye mikoba',
    'tai huruka',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
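
Because the model was trained with MatryoshkaLoss at 768, 512, 256, 128 and 64 dimensions, the leading components of an embedding can be used on their own when storage or speed matters. The sketch below shows the idea on toy 8-dimensional vectors: keep the first components, rescale to unit length, then compare with cosine similarity. `truncate_and_normalize` and `cosine` are hypothetical helpers written for illustration. Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing the model, which should cover the slicing step.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(embedding: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and the embedding size")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-dimensional stand-ins for real 768-dimensional embeddings.
emb_a = [3.0, 4.0, 0.0, 0.0, 1.0, 2.0, 0.5, 0.5]
emb_b = [4.0, 3.0, 0.0, 0.0, 2.0, 1.0, 0.5, 0.5]

short_a = truncate_and_normalize(emb_a, 2)  # approximately [0.6, 0.8]
short_b = truncate_and_normalize(emb_b, 2)  # approximately [0.8, 0.6]
print(len(short_a), round(cosine(short_a, short_b), 4))
```

With the real model, `dim` would be one of the trained sizes (768, 512, 256, 128 or 64). The evaluation tables suggest that quality drops only gradually as the size shrinks.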

Evaluation

Metrics

Semantic Similarity (sts-test-768)

Metric Value
pearson_cosine 0.6945
spearman_cosine 0.6872
pearson_manhattan 0.7086
spearman_manhattan 0.7136
pearson_euclidean 0.7084
spearman_euclidean 0.7128
pearson_dot 0.4819
spearman_dot 0.4659
pearson_max 0.7086
spearman_max 0.7136

Semantic Similarity (sts-test-512)

Metric Value
pearson_cosine 0.6926
spearman_cosine 0.6859
pearson_manhattan 0.7087
spearman_manhattan 0.7128
pearson_euclidean 0.7089
spearman_euclidean 0.7124
pearson_dot 0.4684
spearman_dot 0.4526
pearson_max 0.7089
spearman_max 0.7128

Semantic Similarity (sts-test-256)

Metric Value
pearson_cosine 0.6877
spearman_cosine 0.6815
pearson_manhattan 0.7084
spearman_manhattan 0.7098
pearson_euclidean 0.7094
spearman_euclidean 0.7104
pearson_dot 0.4439
spearman_dot 0.4255
pearson_max 0.7094
spearman_max 0.7104

Semantic Similarity (sts-test-128)

Metric Value
pearson_cosine 0.6709
spearman_cosine 0.667
pearson_manhattan 0.7042
spearman_manhattan 0.7001
pearson_euclidean 0.7055
spearman_euclidean 0.7023
pearson_dot 0.3786
spearman_dot 0.3593
pearson_max 0.7055
spearman_max 0.7023

Semantic Similarity (sts-test-64)

Metric Value
pearson_cosine 0.6534
spearman_cosine 0.6524
pearson_manhattan 0.692
spearman_manhattan 0.6857
pearson_euclidean 0.695
spearman_euclidean 0.6899
pearson_dot 0.335
spearman_dot 0.3097
pearson_max 0.695
spearman_max 0.6899
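
Each metric name pairs a correlation measure with a similarity function: sentence pairs are scored (for example with cosine similarity) and the scores are correlated with the gold labels. The `*_max` rows match the largest value among the four similarity functions in the same table. As a hedged illustration of the Spearman half, the sketch below ranks both lists (ties get the mean rank) and takes the Pearson correlation of the ranks. The helper names and the toy scores are made up for this example; a real evaluation would normally rely on an established statistics library.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: gold similarity labels and model cosine scores for five pairs.
gold = [0.0, 1.5, 2.5, 4.0, 5.0]
predicted = [0.10, 0.35, 0.30, 0.70, 0.90]
rho = spearman(gold, predicted)
print(round(rho, 4))
```

Only the ordering of the scores matters for Spearman, which is why it is a common headline metric for embedding models whose raw similarity values are not calibrated.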

Training Details

Training Dataset

Mollel/swahili-n_li-triplet-swh-eng

  • Dataset: Mollel/swahili-n_li-triplet-swh-eng
  • Size: 1,115,700 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column    type    min       mean          max
    anchor    string  7 tokens  15.18 tokens  80 tokens
    positive  string  5 tokens  18.53 tokens  52 tokens
    negative  string  5 tokens  17.8 tokens   53 tokens
  • Samples:
    anchor | positive | negative
    A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | A person is at a diner, ordering an omelette.
    Mtu aliyepanda farasi anaruka juu ya ndege iliyovunjika. | Mtu yuko nje, juu ya farasi. | Mtu yuko kwenye mkahawa, akiagiza omelette.
    Children smiling and waving at camera | There are children present | The kids are frowning
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
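
The parameters above describe a loss that applies the same ranking objective to several prefixes of each embedding and adds the results with equal weights. The following pure-Python sketch illustrates that structure on toy 4-dimensional vectors; it is not the Sentence Transformers implementation. The function names are hypothetical, and the similarity scale of 20.0 is an assumption about the base loss's default.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ranking_loss(anchors: List[Vector], positives: List[Vector],
                 negatives: List[Vector], scale: float = 20.0) -> float:
    """In-batch ranking loss: each anchor should score its own positive highest."""
    candidates = list(positives) + list(negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


def matryoshka_loss(anchors: List[Vector], positives: List[Vector], negatives: List[Vector],
                    dims: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of the ranking loss over truncated copies of the embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        total += weight * ranking_loss(
            [a[:dim] for a in anchors],
            [p[:dim] for p in positives],
            [n[:dim] for n in negatives],
        )
    return total


# Toy batch of two triplets; dims (4, 2) stand in for (768, 512, 256, 128, 64).
anchors = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
positives = [[0.9, 0.1, 0.4, 0.0], [0.1, 0.9, 0.0, 0.6]]
negatives = [[-1.0, 0.2, 0.0, 0.3], [0.2, -1.0, 0.3, 0.0]]
loss = matryoshka_loss(anchors, positives, negatives, dims=(4, 2), weights=(1, 1))
print(loss)
```

Because every prefix contributes to the objective, the first 64 components have to carry a usable ranking signal on their own, which is what makes truncated embeddings viable at inference time.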
    

Evaluation Dataset

Mollel/swahili-n_li-triplet-swh-eng

  • Dataset: Mollel/swahili-n_li-triplet-swh-eng
  • Size: 13,168 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column    type    min       mean          max
    anchor    string  6 tokens  26.43 tokens  94 tokens
    positive  string  5 tokens  13.37 tokens  65 tokens
    negative  string  5 tokens  14.7 tokens   54 tokens
  • Samples:
    anchor | positive | negative
    Two women are embracing while holding to go packages. | Two woman are holding packages. | The men are fighting outside a deli.
    Wanawake wawili wanakumbatiana huku wakishikilia vifurushi vya kwenda. | Wanawake wawili wanashikilia vifurushi. | Wanaume hao wanapigana nje ya duka la vyakula vitamu.
    Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. | Two kids in numbered jerseys wash their hands. | Two kids in jackets walk to school.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 24
  • per_device_eval_batch_size: 24
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates
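
The hyperparameters above can be related to the step counts in the training logs with a little arithmetic. The sketch below is a rough reconstruction: the device count of 2 is an assumption inferred from the 23,244 logged steps (it is not stated in this card), the no_duplicates sampler could shift the exact batch count, and `linear_lr` is a hypothetical helper that mimics a linear schedule with warmup.

```python
import math

TRAIN_SAMPLES = 1_115_700  # size of the training dataset
PER_DEVICE_BATCH = 24      # per_device_train_batch_size
NUM_DEVICES = 2            # ASSUMPTION: inferred from the 23,244 logged steps
BASE_LR = 2e-05            # learning_rate
WARMUP_RATIO = 0.1         # warmup_ratio

# One epoch: every sample is seen once, split across devices and batches.
total_steps = math.ceil(TRAIN_SAMPLES / (PER_DEVICE_BATCH * NUM_DEVICES))
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Linear warmup to BASE_LR, then linear decay to zero."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - warmup_steps)


print(total_steps, warmup_steps)
print(linear_lr(100), linear_lr(warmup_steps), linear_lr(total_steps))
```

Under these assumptions, step 100 corresponds to epoch fraction 100 / 23244 ≈ 0.0043, which agrees with the first row of the training logs.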

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 24
  • per_device_eval_batch_size: 24
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss sts-test-128_spearman_cosine sts-test-256_spearman_cosine sts-test-512_spearman_cosine sts-test-64_spearman_cosine sts-test-768_spearman_cosine
0.0043 100 10.0627 - - - - -
0.0086 200 8.2355 - - - - -
0.0129 300 6.7233 - - - - -
0.0172 400 6.5832 - - - - -
0.0215 500 6.7512 - - - - -
0.0258 600 6.7634 - - - - -
0.0301 700 6.5592 - - - - -
0.0344 800 5.0689 - - - - -
0.0387 900 4.7079 - - - - -
0.0430 1000 4.6359 - - - - -
0.0473 1100 4.4513 - - - - -
0.0516 1200 4.2328 - - - - -
0.0559 1300 3.7454 - - - - -
0.0602 1400 3.9198 - - - - -
0.0645 1500 4.0727 - - - - -
0.0688 1600 3.8923 - - - - -
0.0731 1700 3.8137 - - - - -
0.0774 1800 4.1512 - - - - -
0.0817 1900 4.1304 - - - - -
0.0860 2000 4.0195 - - - - -
0.0903 2100 3.6836 - - - - -
0.0946 2200 2.9968 - - - - -
0.0990 2300 2.8909 - - - - -
0.1033 2400 3.0884 - - - - -
0.1076 2500 3.3081 - - - - -
0.1119 2600 3.6266 - - - - -
0.1162 2700 4.3754 - - - - -
0.1205 2800 4.0218 - - - - -
0.1248 2900 3.7167 - - - - -
0.1291 3000 3.4815 - - - - -
0.1334 3100 3.6446 - - - - -
0.1377 3200 3.44 - - - - -
0.1420 3300 3.6725 - - - - -
0.1463 3400 3.4699 - - - - -
0.1506 3500 3.076 - - - - -
0.1549 3600 3.1179 - - - - -
0.1592 3700 3.1704 - - - - -
0.1635 3800 3.4614 - - - - -
0.1678 3900 4.1157 - - - - -
0.1721 4000 4.1584 - - - - -
0.1764 4100 4.5602 - - - - -
0.1807 4200 3.6875 - - - - -
0.1850 4300 4.1521 - - - - -
0.1893 4400 3.5475 - - - - -
0.1936 4500 3.4036 - - - - -
0.1979 4600 3.0564 - - - - -
0.2022 4700 3.7761 - - - - -
0.2065 4800 3.6857 - - - - -
0.2108 4900 3.3534 - - - - -
0.2151 5000 4.1137 - - - - -
0.2194 5100 3.5239 - - - - -
0.2237 5200 4.1297 - - - - -
0.2280 5300 3.5339 - - - - -
0.2323 5400 3.9294 - - - - -
0.2366 5500 3.717 - - - - -
0.2409 5600 3.3346 - - - - -
0.2452 5700 4.0495 - - - - -
0.2495 5800 3.7869 - - - - -
0.2538 5900 3.9533 - - - - -
0.2581 6000 4.1135 - - - - -
0.2624 6100 3.6655 - - - - -
0.2667 6200 3.9111 - - - - -
0.2710 6300 3.8582 - - - - -
0.2753 6400 3.7712 - - - - -
0.2796 6500 3.6536 - - - - -
0.2839 6600 3.4516 - - - - -
0.2882 6700 3.7151 - - - - -
0.2925 6800 3.7659 - - - - -
0.2969 6900 3.3159 - - - - -
0.3012 7000 3.5753 - - - - -
0.3055 7100 4.2095 - - - - -
0.3098 7200 3.718 - - - - -
0.3141 7300 4.0709 - - - - -
0.3184 7400 3.8079 - - - - -
0.3227 7500 3.3735 - - - - -
0.3270 7600 3.7303 - - - - -
0.3313 7700 3.2693 - - - - -
0.3356 7800 3.6564 - - - - -
0.3399 7900 3.6702 - - - - -
0.3442 8000 3.7274 - - - - -
0.3485 8100 3.8536 - - - - -
0.3528 8200 3.9516 - - - - -
0.3571 8300 3.7351 - - - - -
0.3614 8400 3.649 - - - - -
0.3657 8500 3.5913 - - - - -
0.3700 8600 3.7733 - - - - -
0.3743 8700 3.6359 - - - - -
0.3786 8800 4.2983 - - - - -
0.3829 8900 3.6692 - - - - -
0.3872 9000 3.7309 - - - - -
0.3915 9100 3.8886 - - - - -
0.3958 9200 3.8999 - - - - -
0.4001 9300 3.5528 - - - - -
0.4044 9400 3.6309 - - - - -
0.4087 9500 4.2475 - - - - -
0.4130 9600 3.793 - - - - -
0.4173 9700 3.6575 - - - - -
0.4216 9800 3.84 - - - - -
0.4259 9900 3.3721 - - - - -
0.4302 10000 4.3743 - - - - -
0.4345 10100 3.5054 - - - - -
0.4388 10200 3.54 - - - - -
0.4431 10300 3.6197 - - - - -
0.4474 10400 3.7567 - - - - -
0.4517 10500 3.9814 - - - - -
0.4560 10600 3.6277 - - - - -
0.4603 10700 3.5071 - - - - -
0.4646 10800 3.8348 - - - - -
0.4689 10900 3.8674 - - - - -
0.4732 11000 3.0325 - - - - -
0.4775 11100 3.7262 - - - - -
0.4818 11200 3.6921 - - - - -
0.4861 11300 3.4946 - - - - -
0.4904 11400 3.7541 - - - - -
0.4948 11500 3.6751 - - - - -
0.4991 11600 3.8765 - - - - -
0.5034 11700 3.5058 - - - - -
0.5077 11800 3.5135 - - - - -
0.5120 11900 3.8052 - - - - -
0.5163 12000 3.3015 - - - - -
0.5206 12100 3.5389 - - - - -
0.5249 12200 3.5226 - - - - -
0.5292 12300 3.6715 - - - - -
0.5335 12400 3.2256 - - - - -
0.5378 12500 3.3447 - - - - -
0.5421 12600 3.6315 - - - - -
0.5464 12700 3.8674 - - - - -
0.5507 12800 3.4066 - - - - -
0.5550 12900 3.7356 - - - - -
0.5593 13000 3.5742 - - - - -
0.5636 13100 3.7676 - - - - -
0.5679 13200 3.7907 - - - - -
0.5722 13300 3.8089 - - - - -
0.5765 13400 3.4742 - - - - -
0.5808 13500 3.6536 - - - - -
0.5851 13600 3.7736 - - - - -
0.5894 13700 3.9072 - - - - -
0.5937 13800 3.7386 - - - - -
0.5980 13900 3.3387 - - - - -
0.6023 14000 3.5509 - - - - -
0.6066 14100 3.7056 - - - - -
0.6109 14200 3.7283 - - - - -
0.6152 14300 3.7301 - - - - -
0.6195 14400 3.8027 - - - - -
0.6238 14500 3.5606 - - - - -
0.6281 14600 3.9467 - - - - -
0.6324 14700 3.3394 - - - - -
0.6367 14800 4.1254 - - - - -
0.6410 14900 3.7121 - - - - -
0.6453 15000 3.9167 - - - - -
0.6496 15100 3.8084 - - - - -
0.6539 15200 3.7794 - - - - -
0.6582 15300 3.7664 - - - - -
0.6625 15400 3.4378 - - - - -
0.6668 15500 3.6632 - - - - -
0.6711 15600 3.8493 - - - - -
0.6754 15700 4.1475 - - - - -
0.6797 15800 3.5782 - - - - -
0.6840 15900 3.4341 - - - - -
0.6883 16000 3.3295 - - - - -
0.6927 16100 3.8165 - - - - -
0.6970 16200 3.9702 - - - - -
0.7013 16300 3.6555 - - - - -
0.7056 16400 3.6946 - - - - -
0.7099 16500 3.8027 - - - - -
0.7142 16600 3.4523 - - - - -
0.7185 16700 3.461 - - - - -
0.7228 16800 3.4403 - - - - -
0.7271 16900 3.6398 - - - - -
0.7314 17000 3.8443 - - - - -
0.7357 17100 3.6012 - - - - -
0.7400 17200 3.6645 - - - - -
0.7443 17300 3.4899 - - - - -
0.7486 17400 3.7186 - - - - -
0.7529 17500 3.6199 - - - - -
0.7572 17600 4.4274 - - - - -
0.7615 17700 4.0262 - - - - -
0.7658 17800 3.9325 - - - - -
0.7701 17900 3.6338 - - - - -
0.7744 18000 3.6136 - - - - -
0.7787 18100 3.4514 - - - - -
0.7830 18200 3.4427 - - - - -
0.7873 18300 3.3601 - - - - -
0.7916 18400 3.313 - - - - -
0.7959 18500 3.4062 - - - - -
0.8002 18600 3.098 - - - - -
0.8045 18700 3.183 - - - - -
0.8088 18800 3.1482 - - - - -
0.8131 18900 3.0122 - - - - -
0.8174 19000 3.0828 - - - - -
0.8217 19100 3.063 - - - - -
0.8260 19200 2.9688 - - - - -
0.8303 19300 3.0425 - - - - -
0.8346 19400 3.2018 - - - - -
0.8389 19500 2.9111 - - - - -
0.8432 19600 2.9516 - - - - -
0.8475 19700 2.9115 - - - - -
0.8518 19800 2.9323 - - - - -
0.8561 19900 2.8753 - - - - -
0.8604 20000 2.8344 - - - - -
0.8647 20100 2.7665 - - - - -
0.8690 20200 2.7732 - - - - -
0.8733 20300 2.8622 - - - - -
0.8776 20400 2.8749 - - - - -
0.8819 20500 2.8534 - - - - -
0.8863 20600 2.9254 - - - - -
0.8906 20700 2.7366 - - - - -
0.8949 20800 2.7287 - - - - -
0.8992 20900 2.9469 - - - - -
0.9035 21000 2.9052 - - - - -
0.9078 21100 2.7256 - - - - -
0.9121 21200 2.8469 - - - - -
0.9164 21300 2.6626 - - - - -
0.9207 21400 2.6796 - - - - -
0.9250 21500 2.6927 - - - - -
0.9293 21600 2.7125 - - - - -
0.9336 21700 2.6734 - - - - -
0.9379 21800 2.7199 - - - - -
0.9422 21900 2.6635 - - - - -
0.9465 22000 2.5218 - - - - -
0.9508 22100 2.7595 - - - - -
0.9551 22200 2.6821 - - - - -
0.9594 22300 2.6578 - - - - -
0.9637 22400 2.568 - - - - -
0.9680 22500 2.5527 - - - - -
0.9723 22600 2.6857 - - - - -
0.9766 22700 2.6637 - - - - -
0.9809 22800 2.6311 - - - - -
0.9852 22900 2.4635 - - - - -
0.9895 23000 2.6239 - - - - -
0.9938 23100 2.6873 - - - - -
0.9981 23200 2.5138 - - - - -
1.0 23244 - 0.6670 0.6815 0.6859 0.6524 0.6872

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 3.0.1
  • Transformers: 4.40.1
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.29.3
  • Datasets: 2.19.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 137M params (Safetensors, tensor type F32)