SentenceTransformer based on UBC-NLP/serengeti-E250

This is a sentence-transformers model fine-tuned from UBC-NLP/serengeti-E250 on the Mollel/swahili-n_li-triplet-swh-eng dataset, which contains Swahili and English NLI triplets. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: UBC-NLP/serengeti-E250
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- Mollel/swahili-n_li-triplet-swh-eng
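For reference, the cosine similarity listed above scores two embeddings u and v by the angle between them. It divides the dot product by the product of the Euclidean norms, so vector length does not matter and scores fall in [-1, 1]. Here d = 768, or a smaller value if the embeddings are truncated:

$$
\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert_2 \, \lVert \mathbf{v} \rVert_2} = \frac{\sum_{i=1}^{d} u_i v_i}{\sqrt{\sum_{i=1}^{d} u_i^2} \, \sqrt{\sum_{i=1}^{d} v_i^2}}
$$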

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ElectraModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
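The Pooling module above sets pooling_mode_mean_tokens to True. Each sentence embedding is therefore the average of the ELECTRA token embeddings, with padding positions excluded via the attention mask. The pure-Python sketch below illustrates that computation on toy 3-dimensional vectors that stand in for the 768-dimensional word embeddings. `mean_pool` is a hypothetical helper, not the library's implementation.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    # Avoid division by zero if every position is padding (the result is then all zeros).
    count = max(count, 1)
    return [s / count for s in summed]


# Toy example: 3 tokens with 3-dimensional embeddings; the last token is padding.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [100.0, 100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0, 4.0]
```

The padded token's large values have no effect on the result, which is why the attention mask must be applied before averaging.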

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sartifyllc/MultiLinguSwahili-MultiLinguSwahili-serengeti-E250-nli-matryoshka-nli-matryoshka")
# Run inference
sentences = [
    'Mwanamume na mwanamke wachanga waliovaa mikoba wanaweka au kuondoa kitu kutoka kwenye mti mweupe wa zamani, huku watu wengine wamesimama au wameketi nyuma.',
    'mwanamume na mwanamke wenye mikoba',
    'tai huruka',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
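The model was trained with MatryoshkaLoss over 768, 512, 256, 128 and 64 dimensions, so the leading components of each embedding are intended to be usable on their own. Sentence Transformers 3.x can return shortened embeddings directly through the `truncate_dim` argument of the `SentenceTransformer` constructor, e.g. `SentenceTransformer(model_id, truncate_dim=256)`. The sketch below shows what truncation amounts to, without calling the library. It keeps the first k components and compares them with cosine similarity. The toy 4-dimensional vectors and the helper names are hypothetical.

```python
import math
from typing import List


def truncate(embedding: List[float], dim: int) -> List[float]:
    """Keep only the first `dim` components (the Matryoshka prefix)."""
    return embedding[:dim]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 4-dimensional "embeddings" standing in for the model's 768-dimensional output.
query = [0.6, 0.8, 0.1, -0.3]
doc = [0.6, 0.8, -0.2, 0.4]

print(round(cosine(query, doc), 4))                            # full vectors: 0.7485
print(round(cosine(truncate(query, 2), truncate(doc, 2)), 4))  # first 2 components only
```

Truncation trades some accuracy for smaller vectors. In the evaluation tables, spearman_cosine falls from 0.7081 at 768 dimensions to 0.6994 at 64 dimensions.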

Evaluation

Metrics

Semantic Similarity

Dataset: sts-test-768
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.7084
spearman_cosine	0.7081
pearson_manhattan	0.7164
spearman_manhattan	0.7066
pearson_euclidean	0.7162
spearman_euclidean	0.7064
pearson_dot	0.3846
spearman_dot	0.3567
pearson_max	0.7164
spearman_max	0.7081

Semantic Similarity

Dataset: sts-test-512
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.706
spearman_cosine	0.7047
pearson_manhattan	0.7142
spearman_manhattan	0.7049
pearson_euclidean	0.715
spearman_euclidean	0.7055
pearson_dot	0.3855
spearman_dot	0.3586
pearson_max	0.715
spearman_max	0.7055

Semantic Similarity

Dataset: sts-test-256
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.7069
spearman_cosine	0.7072
pearson_manhattan	0.7152
spearman_manhattan	0.7051
pearson_euclidean	0.7155
spearman_euclidean	0.7049
pearson_dot	0.3729
spearman_dot	0.3481
pearson_max	0.7155
spearman_max	0.7072

Semantic Similarity

Dataset: sts-test-128
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.7023
spearman_cosine	0.7062
pearson_manhattan	0.7116
spearman_manhattan	0.7013
pearson_euclidean	0.7125
spearman_euclidean	0.7011
pearson_dot	0.3439
spearman_dot	0.3169
pearson_max	0.7125
spearman_max	0.7062

Semantic Similarity

Dataset: sts-test-64
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.695
spearman_cosine	0.6994
pearson_manhattan	0.706
spearman_manhattan	0.6939
pearson_euclidean	0.7066
spearman_euclidean	0.6949
pearson_dot	0.3098
spearman_dot	0.2855
pearson_max	0.7066
spearman_max	0.6994
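The tables above come from EmbeddingSimilarityEvaluator. It scores each sentence pair with cosine similarity, Manhattan and Euclidean distance (negated, so that higher still means more similar), and dot product. It then correlates each set of scores with the gold similarity labels using Pearson and Spearman correlation. The `_max` rows are the best of the four functions; for example, pearson_max 0.7164 in the 768 table equals pearson_manhattan. The `sts-test-N` suffix presumably denotes embeddings truncated to N dimensions. The pure-Python sketch below illustrates the two correlation metrics only. It is not the evaluator's code, and the gold labels and predicted scores are made up.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up gold STS labels and predicted cosine scores for five sentence pairs.
gold = [0.0, 1.0, 2.5, 4.0, 5.0]
predicted = [0.10, 0.35, 0.30, 0.80, 0.90]
print(round(spearman(gold, predicted), 4))  # 0.9
```

Spearman only checks whether the predicted scores order the pairs the same way as the gold labels do. Pearson also rewards a linear relationship between the raw values.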

Training Details

Training Dataset

Mollel/swahili-n_li-triplet-swh-eng

Dataset: Mollel/swahili-n_li-triplet-swh-eng
Size: 1,115,700 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
min tokens	6	5	4
mean tokens	11.27	13.0	12.56
max tokens	48	29	29

Samples:

anchor	positive	negative
`A person on a horse jumps over a broken down airplane.`	`A person is outdoors, on a horse.`	`A person is at a diner, ordering an omelette.`
`Mtu aliyepanda farasi anaruka juu ya ndege iliyovunjika.`	`Mtu yuko nje, juu ya farasi.`	`Mtu yuko kwenye mkahawa, akiagiza omelette.`
`Children smiling and waving at camera`	`There are children present`	`The kids are frowning`

Loss: MatryoshkaLoss with these parameters:

{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
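The loss above has two parts. MultipleNegativesRankingLoss treats each (anchor, positive, negative) row as a classification problem: the anchor must score its own positive above every other positive in the batch and every hard negative. The scores are cosine similarities multiplied by a scale and passed to cross-entropy; this sketch assumes the library default scale of 20, since the config does not list it. MatryoshkaLoss recomputes that loss on the first d components for each d in matryoshka_dims and sums the results with matryoshka_weights (all 1 here). The pure-Python sketch below is a simplified illustration with toy 4-dimensional vectors and dims [4, 2] instead of [768, ..., 64]. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def mnrl_loss(anchors: List[Vector], positives: List[Vector],
              negatives: List[Vector], scale: float = 20.0) -> float:
    """Mean cross-entropy: anchor i must pick candidate i among all positives plus all negatives."""
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def truncate_batch(batch: List[Vector], dim: int) -> List[Vector]:
    """Keep the first `dim` components of every vector in the batch."""
    return [vector[:dim] for vector in batch]


def matryoshka_loss(anchors: List[Vector], positives: List[Vector], negatives: List[Vector],
                    dims: List[int], weights: List[float]) -> float:
    """Weighted sum of MNRL evaluated on the first d components for every d in `dims`."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        total += weight * mnrl_loss(truncate_batch(anchors, dim),
                                    truncate_batch(positives, dim),
                                    truncate_batch(negatives, dim))
    return total


# Toy batch of two (anchor, positive, negative) triplets.
anchors = [[1.0, 0.0, 0.3, 0.0], [0.0, 1.0, 0.0, 0.3]]
positives = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.9, 0.0, 0.2]]
negatives = [[-0.5, 0.2, 0.7, 0.0], [0.2, -0.5, 0.0, 0.7]]

print(round(matryoshka_loss(anchors, positives, negatives, dims=[4, 2], weights=[1, 1]), 4))
```

Summing over all dimensions at every step corresponds to "n_dims_per_step": -1 in the config. Each prefix length therefore receives a training signal on every batch.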

Evaluation Dataset

Mollel/swahili-n_li-triplet-swh-eng

Dataset: Mollel/swahili-n_li-triplet-swh-eng
Size: 13,168 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
min tokens	5	4	4
mean tokens	18.07	9.45	10.27
max tokens	53	33	29

Samples:

anchor	positive	negative
`Two women are embracing while holding to go packages.`	`Two woman are holding packages.`	`The men are fighting outside a deli.`
`Wanawake wawili wanakumbatiana huku wakishikilia vifurushi vya kwenda.`	`Wanawake wawili wanashikilia vifurushi.`	`Wanaume hao wanapigana nje ya duka la vyakula vitamu.`
`Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.`	`Two kids in numbered jerseys wash their hands.`	`Two kids in jackets walk to school.`

Loss: MatryoshkaLoss with these parameters:

{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 32
per_device_eval_batch_size: 32
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
bf16: True
batch_sampler: no_duplicates
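With learning_rate 2e-05, warmup_ratio 0.1 and the default linear scheduler (lr_scheduler_type: linear in the full list), the Hugging Face Trainer raises the learning rate linearly from 0 to its peak during warmup. It then decays it linearly back to 0 at the last step. The training log ends at step 17,433, which gives ceil(0.1 × 17,433) = 1,744 warmup steps. The sketch below reproduces that schedule shape in plain Python. `linear_lr` and `warmup_steps` are hypothetical helpers, not the Trainer's own functions.

```python
import math

PEAK_LR = 2e-05
TOTAL_STEPS = 17433  # final step in the training log
WARMUP_RATIO = 0.1


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Number of warmup steps derived from a warmup ratio (rounded up)."""
    return math.ceil(total_steps * ratio)


def linear_lr(step: int, total_steps: int = TOTAL_STEPS,
              peak: float = PEAK_LR, ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to `peak`, then linear decay to 0 at `total_steps`."""
    warmup = warmup_steps(total_steps, ratio)
    if step < warmup:
        factor = step / max(1, warmup)
    else:
        factor = max(0.0, (total_steps - step) / max(1, total_steps - warmup))
    return peak * factor


print(warmup_steps(TOTAL_STEPS, WARMUP_RATIO))  # 1744
print(linear_lr(1744))                          # 2e-05 (peak)
print(linear_lr(17433))                         # 0.0
```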

All Hyperparameters


overwrite_output_dir: False
do_predict: False
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
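The batch_sampler: no_duplicates setting matters because MultipleNegativesRankingLoss uses the other rows of a batch as negatives. If the same text, such as a repeated premise, appeared twice in one batch, a true match could be scored as a negative. The idea is to build batches in which no text value occurs more than once. The greedy pure-Python sketch below is a simplified illustration of that idea, not the library's sampler. `no_duplicate_batches` and the toy rows are hypothetical.

```python
from typing import Dict, List


def no_duplicate_batches(rows: List[Dict[str, str]], batch_size: int) -> List[List[int]]:
    """Greedily group row indices so that no text value repeats within a batch."""
    remaining = list(range(len(rows)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(rows[idx].values())
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)  # clashes or batch is full: try again in a later batch
        batches.append(batch)
        remaining = deferred
    return batches


# Toy triplets: rows 0 and 1 share the same anchor text.
rows = [
    {"anchor": "A man plays guitar.", "positive": "Someone makes music.", "negative": "A man sleeps."},
    {"anchor": "A man plays guitar.", "positive": "A person holds an instrument.", "negative": "Nobody is playing."},
    {"anchor": "Kids run outside.", "positive": "Children are outdoors.", "negative": "Kids sit indoors."},
]
print(no_duplicate_batches(rows, batch_size=2))  # [[0, 2], [1]]
```

The first remaining row always fits into an empty batch, so each pass makes progress and the loop terminates for any batch_size of at least 1.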

Training Logs


Epoch	Step	Training Loss	sts-test-128_spearman_cosine	sts-test-256_spearman_cosine	sts-test-512_spearman_cosine	sts-test-64_spearman_cosine	sts-test-768_spearman_cosine
0.0057	100	26.7003	-	-	-	-	-
0.0115	200	20.7097	-	-	-	-	-
0.0172	300	17.2266	-	-	-	-	-
0.0229	400	15.7511	-	-	-	-	-
0.0287	500	14.5329	-	-	-	-	-
0.0344	600	12.6534	-	-	-	-	-
0.0402	700	10.6758	-	-	-	-	-
0.0459	800	9.421	-	-	-	-	-
0.0516	900	9.5664	-	-	-	-	-
0.0574	1000	8.5166	-	-	-	-	-
0.0631	1100	8.657	-	-	-	-	-
0.0688	1200	8.5473	-	-	-	-	-
0.0746	1300	8.3018	-	-	-	-	-
0.0803	1400	8.4488	-	-	-	-	-
0.0860	1500	7.1796	-	-	-	-	-
0.0918	1600	6.6136	-	-	-	-	-
0.0975	1700	6.2638	-	-	-	-	-
0.1033	1800	6.6955	-	-	-	-	-
0.1090	1900	7.3585	-	-	-	-	-
0.1147	2000	6.9043	-	-	-	-	-
0.1205	2100	6.677	-	-	-	-	-
0.1262	2200	6.3914	-	-	-	-	-
0.1319	2300	6.0045	-	-	-	-	-
0.1377	2400	5.8048	-	-	-	-	-
0.1434	2500	5.6898	-	-	-	-	-
0.1491	2600	5.229	-	-	-	-	-
0.1549	2700	5.2407	-	-	-	-	-
0.1606	2800	5.7074	-	-	-	-	-
0.1664	2900	6.2917	-	-	-	-	-
0.1721	3000	6.5651	-	-	-	-	-
0.1778	3100	6.7751	-	-	-	-	-
0.1836	3200	6.195	-	-	-	-	-
0.1893	3300	5.4697	-	-	-	-	-
0.1950	3400	5.1362	-	-	-	-	-
0.2008	3500	5.581	-	-	-	-	-
0.2065	3600	5.4309	-	-	-	-	-
0.2122	3700	5.6688	-	-	-	-	-
0.2180	3800	5.6923	-	-	-	-	-
0.2237	3900	5.8598	-	-	-	-	-
0.2294	4000	5.3498	-	-	-	-	-
0.2352	4100	5.3797	-	-	-	-	-
0.2409	4200	5.0389	-	-	-	-	-
0.2467	4300	5.6622	-	-	-	-	-
0.2524	4400	5.6249	-	-	-	-	-
0.2581	4500	5.6927	-	-	-	-	-
0.2639	4600	5.3612	-	-	-	-	-
0.2696	4700	5.2751	-	-	-	-	-
0.2753	4800	5.4224	-	-	-	-	-
0.2811	4900	5.0338	-	-	-	-	-
0.2868	5000	4.9813	-	-	-	-	-
0.2925	5100	4.8533	-	-	-	-	-
0.2983	5200	5.4137	-	-	-	-	-
0.3040	5300	5.4063	-	-	-	-	-
0.3098	5400	5.3107	-	-	-	-	-
0.3155	5500	5.0907	-	-	-	-	-
0.3212	5600	4.8644	-	-	-	-	-
0.3270	5700	4.7926	-	-	-	-	-
0.3327	5800	5.0268	-	-	-	-	-
0.3384	5900	5.3029	-	-	-	-	-
0.3442	6000	5.1246	-	-	-	-	-
0.3499	6100	5.1152	-	-	-	-	-
0.3556	6200	5.4265	-	-	-	-	-
0.3614	6300	4.7079	-	-	-	-	-
0.3671	6400	4.6368	-	-	-	-	-
0.3729	6500	4.662	-	-	-	-	-
0.3786	6600	5.3695	-	-	-	-	-
0.3843	6700	4.6974	-	-	-	-	-
0.3901	6800	4.6584	-	-	-	-	-
0.3958	6900	4.7413	-	-	-	-	-
0.4015	7000	4.6604	-	-	-	-	-
0.4073	7100	5.2476	-	-	-	-	-
0.4130	7200	4.9966	-	-	-	-	-
0.4187	7300	4.656	-	-	-	-	-
0.4245	7400	4.5711	-	-	-	-	-
0.4302	7500	5.0256	-	-	-	-	-
0.4360	7600	4.3856	-	-	-	-	-
0.4417	7700	4.2548	-	-	-	-	-
0.4474	7800	4.8584	-	-	-	-	-
0.4532	7900	4.8563	-	-	-	-	-
0.4589	8000	4.5101	-	-	-	-	-
0.4646	8100	4.4688	-	-	-	-	-
0.4704	8200	4.7076	-	-	-	-	-
0.4761	8300	4.3268	-	-	-	-	-
0.4818	8400	4.6622	-	-	-	-	-
0.4876	8500	4.4808	-	-	-	-	-
0.4933	8600	4.676	-	-	-	-	-
0.4991	8700	5.0348	-	-	-	-	-
0.5048	8800	4.5497	-	-	-	-	-
0.5105	8900	4.7428	-	-	-	-	-
0.5163	9000	4.4418	-	-	-	-	-
0.5220	9100	4.4946	-	-	-	-	-
0.5277	9200	4.5249	-	-	-	-	-
0.5335	9300	4.2413	-	-	-	-	-
0.5392	9400	4.4799	-	-	-	-	-
0.5449	9500	4.6807	-	-	-	-	-
0.5507	9600	4.5901	-	-	-	-	-
0.5564	9700	4.7266	-	-	-	-	-
0.5622	9800	4.692	-	-	-	-	-
0.5679	9900	4.8651	-	-	-	-	-
0.5736	10000	4.7746	-	-	-	-	-
0.5794	10100	4.68	-	-	-	-	-
0.5851	10200	4.7697	-	-	-	-	-
0.5908	10300	4.8848	-	-	-	-	-
0.5966	10400	4.4004	-	-	-	-	-
0.6023	10500	4.2979	-	-	-	-	-
0.6080	10600	4.7266	-	-	-	-	-
0.6138	10700	4.8605	-	-	-	-	-
0.6195	10800	4.7436	-	-	-	-	-
0.6253	10900	4.6239	-	-	-	-	-
0.6310	11000	4.394	-	-	-	-	-
0.6367	11100	4.8081	-	-	-	-	-
0.6425	11200	4.2329	-	-	-	-	-
0.6482	11300	4.873	-	-	-	-	-
0.6539	11400	4.5557	-	-	-	-	-
0.6597	11500	4.7918	-	-	-	-	-
0.6654	11600	4.1607	-	-	-	-	-
0.6711	11700	4.8744	-	-	-	-	-
0.6769	11800	5.0072	-	-	-	-	-
0.6826	11900	4.3532	-	-	-	-	-
0.6883	12000	4.3319	-	-	-	-	-
0.6941	12100	4.6885	-	-	-	-	-
0.6998	12200	4.6682	-	-	-	-	-
0.7056	12300	4.4258	-	-	-	-	-
0.7113	12400	4.6136	-	-	-	-	-
0.7170	12500	4.3594	-	-	-	-	-
0.7228	12600	4.0627	-	-	-	-	-
0.7285	12700	4.5244	-	-	-	-	-
0.7342	12800	4.504	-	-	-	-	-
0.7400	12900	4.4694	-	-	-	-	-
0.7457	13000	4.4804	-	-	-	-	-
0.7514	13100	4.0588	-	-	-	-	-
0.7572	13200	4.8016	-	-	-	-	-
0.7629	13300	4.2971	-	-	-	-	-
0.7687	13400	4.1326	-	-	-	-	-
0.7744	13500	3.9763	-	-	-	-	-
0.7801	13600	3.7716	-	-	-	-	-
0.7859	13700	3.8448	-	-	-	-	-
0.7916	13800	3.6779	-	-	-	-	-
0.7973	13900	3.5938	-	-	-	-	-
0.8031	14000	3.3981	-	-	-	-	-
0.8088	14100	3.4151	-	-	-	-	-
0.8145	14200	3.2498	-	-	-	-	-
0.8203	14300	3.4909	-	-	-	-	-
0.8260	14400	3.4098	-	-	-	-	-
0.8318	14500	3.4448	-	-	-	-	-
0.8375	14600	3.2868	-	-	-	-	-
0.8432	14700	3.2196	-	-	-	-	-
0.8490	14800	3.0852	-	-	-	-	-
0.8547	14900	3.2341	-	-	-	-	-
0.8604	15000	3.164	-	-	-	-	-
0.8662	15100	3.0919	-	-	-	-	-
0.8719	15200	3.176	-	-	-	-	-
0.8776	15300	3.1361	-	-	-	-	-
0.8834	15400	3.0683	-	-	-	-	-
0.8891	15500	3.0275	-	-	-	-	-
0.8949	15600	3.0763	-	-	-	-	-
0.9006	15700	3.1828	-	-	-	-	-
0.9063	15800	3.0053	-	-	-	-	-
0.9121	15900	2.9696	-	-	-	-	-
0.9178	16000	2.8919	-	-	-	-	-
0.9235	16100	2.9922	-	-	-	-	-
0.9293	16200	2.9063	-	-	-	-	-
0.9350	16300	3.0633	-	-	-	-	-
0.9407	16400	3.1782	-	-	-	-	-
0.9465	16500	2.9206	-	-	-	-	-
0.9522	16600	2.8785	-	-	-	-	-
0.9580	16700	2.9934	-	-	-	-	-
0.9637	16800	3.0125	-	-	-	-	-
0.9694	16900	2.9338	-	-	-	-	-
0.9752	17000	2.9931	-	-	-	-	-
0.9809	17100	2.956	-	-	-	-	-
0.9866	17200	2.8415	-	-	-	-	-
0.9924	17300	3.0072	-	-	-	-	-
0.9981	17400	2.9046	-	-	-	-	-
1.0	17433	-	0.7062	0.7072	0.7047	0.6994	0.7081

Framework Versions

Python: 3.11.9
Sentence Transformers: 3.0.1
Transformers: 4.40.1
PyTorch: 2.3.0+cu121
Accelerate: 0.29.3
Datasets: 2.19.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}