SentenceTransformer based on xmanii/maux-gte-persian-v3

This is a sentence-transformers model fine-tuned from xmanii/maux-gte-persian-v3. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: xmanii/maux-gte-persian-v3
Maximum Sequence Length: 8192 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'NewModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
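
The last two modules above can be illustrated without loading the model. The following is a minimal NumPy sketch, not the library's implementation: it assumes the Transformer module yields one 768-dimensional vector per token, keeps only the first (CLS) token as `pooling_mode_cls_token: True` indicates, and then rescales each vector to unit length as `Normalize()` does. The function name and the random stand-in input are hypothetical.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, dim) token embeddings to (batch, dim) unit vectors."""
    # CLS pooling: keep only the embedding of the first token of each sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # L2 normalization: divide each vector by its Euclidean norm.
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)

# Random numbers standing in for real Transformer output (2 sentences, 5 tokens each).
rng = np.random.default_rng(0)
fake_token_embeddings = rng.normal(size=(2, 5, 768))

sentence_vectors = cls_pool_and_normalize(fake_token_embeddings)
print(sentence_vectors.shape)                    # (2, 768)
print(np.linalg.norm(sentence_vectors, axis=1))  # each norm is 1.0 up to rounding
```

Because the output vectors have unit length, their dot product equals their cosine similarity.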

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

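# Download from the 🤗 Hub.
# Assumption: the 'NewModel' architecture listed above is custom code, so loading
# likely needs trust_remote_code=True; drop the argument if your setup loads without it.
model = SentenceTransformer("axiomepic/gte-persian-seo-keyword-embedding", trust_remote_code=True)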
# Run inference
sentences = [
    'چراغ قوه\u200c ضد انفجار',
    'دودی روشن',
    'ففدول',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7218, 0.5096],
#         [0.7218, 1.0000, 0.5915],
#         [0.5096, 0.5915, 1.0000]])
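
For semantic search over keywords, the similarity scores can be turned into a ranking. The sketch below is one possible way to do that and is independent of the model: `top_k_similar` is a hypothetical helper, and the three-dimensional vectors are toy values chosen for illustration, not real embeddings. In practice the inputs would be rows of the array returned by `model.encode`.

```python
import numpy as np

def top_k_similar(query_vec: np.ndarray, candidate_vecs: np.ndarray, k: int = 2):
    """Return (index, cosine similarity) for the k candidates closest to the query."""
    # Normalize defensively so the dot product is a cosine similarity
    # even if the inputs are not already unit length.
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q
    # Sort by descending score and keep the first k indices.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors for illustration only.
query = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.7, 0.7, 0.0],  # 45 degrees from the query
])

ranking = top_k_similar(query, candidates, k=2)
for index, score in ranking:
    print(index, round(score, 4))
```

With these toy inputs, candidate 0 ranks first and candidate 2 second.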

Training Details

Training Dataset

Unnamed Dataset

Size: 8,721 training samples
Columns: sentence1, sentence2, and label
Approximate statistics based on the first 1000 samples:

	sentence1	sentence2	label
type	string	string	int
details	min: 3 tokens mean: 6.51 tokens max: 14 tokens	min: 3 tokens mean: 6.19 tokens max: 18 tokens	0: ~35.90% 1: ~64.10%

Samples:

sentence1	sentence2	label
`قیمت لباس بچه`	`بهترین سایت لباس کودک`	`1`
`قیمت رژلب`	`قیمت ژل ابرو`	`1`
`سیگار برگ‌`	`سیگاری`	`1`

Loss: OnlineContrastiveLoss

Evaluation Dataset

Unnamed Dataset

Size: 969 evaluation samples
Columns: sentence1, sentence2, and label
Approximate statistics based on the first 969 samples:

	sentence1	sentence2	label
type	string	string	int
details	min: 3 tokens mean: 6.49 tokens max: 15 tokens	min: 3 tokens mean: 6.18 tokens max: 19 tokens	0: ~33.54% 1: ~66.46%

Samples:

sentence1	sentence2	label
`خواص دنبه`	`خواص روغن دنبه`	`1`
`چای ماچا`	`چای نعناع`	`1`
`اشتراک اسپاتیفای`	`خرید اکانت پرمیوم اسپاتیفای`	`1`

Loss: OnlineContrastiveLoss
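
To make the loss more concrete, here is a simplified NumPy sketch of the idea behind an online contrastive loss. It is not the Sentence Transformers implementation. It assumes cosine distance and a margin of 0.5, and it assumes every batch contains at least one positive and one negative pair. Only "hard" pairs contribute: positives that lie farther apart than the closest negative, and negatives that lie closer than the farthest positive.

```python
import numpy as np

def online_contrastive_loss(emb_a, emb_b, labels, margin=0.5):
    """Simplified online contrastive loss over a batch of embedding pairs."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    dist = 1.0 - np.sum(a * b, axis=1)  # cosine distance per pair

    pos = dist[labels == 1]
    neg = dist[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("this sketch needs both positive and negative pairs")

    # Keep only the hard pairs; easy pairs are ignored entirely.
    hard_pos = pos[pos > neg.min()]
    hard_neg = neg[neg < pos.max()]

    positive_term = np.sum(hard_pos ** 2)
    negative_term = np.sum(np.maximum(margin - hard_neg, 0.0) ** 2)
    return float(positive_term + negative_term)

# Toy 2-dimensional pairs: distances are 0, 1, 0, 1.
emb_a = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
emb_b = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
labels = np.array([1, 1, 0, 0])

loss = online_contrastive_loss(emb_a, emb_b, labels)
print(loss)  # 1.25: one hard positive (1.0 ** 2) plus one hard negative (0.5 ** 2)
```

Because the terms are summed rather than averaged in this sketch, the value grows with the number of hard pairs in a batch, and a batch with no hard pairs yields a loss of exactly zero.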

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: epoch
per_device_train_batch_size: 16
per_device_eval_batch_size: 64
gradient_accumulation_steps: 2
learning_rate: 1e-05
num_train_epochs: 10
warmup_ratio: 0.05
log_level_replica: passive
log_on_each_node: False
logging_nan_inf_filter: False
load_best_model_at_end: True
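
As a worked example of how these settings interact, the arithmetic below estimates the effective batch size and the number of optimizer steps. It assumes a single training device, which the card does not state, and it approximates the trainer's bookkeeping rather than reproducing it. The variable names are illustrative.

```python
import math

# Values taken from this card.
train_samples = 8721
per_device_train_batch_size = 16
gradient_accumulation_steps = 2
num_train_epochs = 10
warmup_ratio = 0.05

num_devices = 1  # assumption: not stated in the card

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
batches_per_epoch = math.ceil(train_samples / (per_device_train_batch_size * num_devices))
steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch_size)  # 32
print(batches_per_epoch)     # 546
print(steps_per_epoch)       # 273
print(total_steps)           # 2730
print(warmup_steps)          # 137
```

The estimate of 273 optimizer steps per epoch agrees with the training log, where step 273 corresponds to epoch 1.0.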

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 64
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 2
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 1e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 10
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.05
warmup_steps: 0
log_level: passive
log_level_replica: passive
log_on_each_node: False
logging_nan_inf_filter: False
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	Validation Loss
0.0037	1	0.5054	-
0.0073	2	0.5856	-
0.0110	3	0.5052	-
0.0147	4	1.1225	-
0.0183	5	0.9045	-
0.0220	6	0.9862	-
0.0256	7	1.1367	-
0.0293	8	1.032	-
0.0330	9	1.1067	-
0.0366	10	0.5211	-
0.0403	11	1.3348	-
0.0440	12	0.7714	-
0.0476	13	0.5516	-
0.0513	14	0.8319	-
0.0549	15	0.8509	-
0.0586	16	1.1887	-
0.0623	17	1.4326	-
0.0659	18	0.8932	-
0.0696	19	0.7684	-
0.0733	20	0.6201	-
0.0769	21	1.6244	-
0.0806	22	0.786	-
0.0842	23	0.77	-
0.0879	24	0.8046	-
0.0916	25	0.7242	-
0.0952	26	0.8478	-
0.0989	27	1.0491	-
0.1026	28	0.4904	-
0.1062	29	1.277	-
0.1099	30	1.0306	-
0.1136	31	0.8367	-
0.1172	32	0.9737	-
0.1209	33	0.6155	-
0.1245	34	0.2685	-
0.1282	35	0.8825	-
0.1319	36	1.0767	-
0.1355	37	1.1151	-
0.1392	38	0.9151	-
0.1429	39	1.1762	-
0.1465	40	0.6362	-
0.1502	41	0.8624	-
0.1538	42	0.7831	-
0.1575	43	0.5686	-
0.1612	44	0.8203	-
0.1648	45	0.6298	-
0.1685	46	0.7037	-
0.1722	47	0.7446	-
0.1758	48	0.8063	-
0.1795	49	0.918	-
0.1832	50	1.2139	-
0.1868	51	0.7787	-
0.1905	52	0.4508	-
0.1941	53	0.8636	-
0.1978	54	0.8607	-
0.2015	55	1.1511	-
0.2051	56	0.7653	-
0.2088	57	0.441	-
0.2125	58	0.6974	-
0.2161	59	0.7481	-
0.2198	60	0.727	-
0.2234	61	0.8182	-
0.2271	62	0.4998	-
0.2308	63	0.949	-
0.2344	64	0.5796	-
0.2381	65	0.7822	-
0.2418	66	1.1591	-
0.2454	67	0.7478	-
0.2491	68	0.8698	-
0.2527	69	0.5906	-
0.2564	70	0.9387	-
0.2601	71	0.5571	-
0.2637	72	0.3962	-
0.2674	73	0.7237	-
0.2711	74	0.6404	-
0.2747	75	0.8193	-
0.2784	76	0.5669	-
0.2821	77	0.75	-
0.2857	78	0.6934	-
0.2894	79	0.6464	-
0.2930	80	0.6564	-
0.2967	81	0.6188	-
0.3004	82	0.6652	-
0.3040	83	0.4678	-
0.3077	84	0.7576	-
0.3114	85	0.5472	-
0.3150	86	0.7941	-
0.3187	87	0.6895	-
0.3223	88	0.6192	-
0.3260	89	0.6249	-
0.3297	90	0.6652	-
0.3333	91	0.8822	-
0.3370	92	1.0119	-
0.3407	93	0.8161	-
0.3443	94	0.6366	-
0.3480	95	0.7421	-
0.3516	96	0.8887	-
0.3553	97	0.8511	-
0.3590	98	0.5364	-
0.3626	99	0.7103	-
0.3663	100	0.4809	-
0.3700	101	0.4972	-
0.3736	102	0.711	-
0.3773	103	0.4176	-
0.3810	104	0.6275	-
0.3846	105	0.4639	-
0.3883	106	0.6699	-
0.3919	107	0.8027	-
0.3956	108	0.7053	-
0.3993	109	0.097	-
0.4029	110	0.7775	-
0.4066	111	0.4949	-
0.4103	112	0.7027	-
0.4139	113	0.3667	-
0.4176	114	0.6542	-
0.4212	115	0.5256	-
0.4249	116	0.5562	-
0.4286	117	0.365	-
0.4322	118	0.5834	-
0.4359	119	0.6584	-
0.4396	120	0.6638	-
0.4432	121	0.489	-
0.4469	122	0.5541	-
0.4505	123	0.1923	-
0.4542	124	0.4627	-
0.4579	125	0.4169	-
0.4615	126	0.3824	-
0.4652	127	0.5774	-
0.4689	128	0.3938	-
0.4725	129	0.5052	-
0.4762	130	0.6401	-
0.4799	131	0.5691	-
0.4835	132	0.5058	-
0.4872	133	0.5309	-
0.4908	134	0.4821	-
0.4945	135	0.5954	-
0.4982	136	0.3729	-
0.5018	137	0.6607	-
0.5055	138	0.5283	-
0.5092	139	0.6103	-
0.5128	140	0.456	-
0.5165	141	0.7122	-
0.5201	142	0.6458	-
0.5238	143	0.4434	-
0.5275	144	0.6982	-
0.5311	145	0.7074	-
0.5348	146	0.6441	-
0.5385	147	0.1969	-
0.5421	148	0.2974	-
0.5458	149	0.3946	-
0.5495	150	0.4603	-
0.5531	151	0.6021	-
0.5568	152	0.3643	-
0.5604	153	0.2497	-
0.5641	154	0.4532	-
0.5678	155	0.5185	-
0.5714	156	0.457	-
0.5751	157	0.4512	-
0.5788	158	0.48	-
0.5824	159	0.2682	-
0.5861	160	0.594	-
0.5897	161	0.6727	-
0.5934	162	0.7087	-
0.5971	163	0.4186	-
0.6007	164	0.4273	-
0.6044	165	0.5857	-
0.6081	166	0.2617	-
0.6117	167	0.4383	-
0.6154	168	0.4867	-
0.6190	169	0.4619	-
0.6227	170	0.1319	-
0.6264	171	0.2212	-
0.6300	172	0.5229	-
0.6337	173	0.6967	-
0.6374	174	0.338	-
0.6410	175	0.1651	-
0.6447	176	0.2449	-
0.6484	177	0.3473	-
0.6520	178	0.2902	-
0.6557	179	0.4093	-
0.6593	180	0.4406	-
0.6630	181	0.443	-
0.6667	182	0.4409	-
0.6703	183	0.7087	-
0.6740	184	0.4577	-
0.6777	185	0.3511	-
0.6813	186	0.3783	-
0.6850	187	0.5639	-
0.6886	188	0.4599	-
0.6923	189	0.4282	-
0.6960	190	0.242	-
0.6996	191	0.587	-
0.7033	192	0.67	-
0.7070	193	0.2562	-
0.7106	194	0.5278	-
0.7143	195	0.2321	-
0.7179	196	0.745	-
0.7216	197	0.6735	-
0.7253	198	0.4361	-
0.7289	199	0.3047	-
0.7326	200	0.3714	-
0.7363	201	0.8609	-
0.7399	202	0.4459	-
0.7436	203	0.1546	-
0.7473	204	0.4546	-
0.7509	205	0.4743	-
0.7546	206	0.3223	-
0.7582	207	0.4644	-
0.7619	208	0.6073	-
0.7656	209	0.5021	-
0.7692	210	0.5722	-
0.7729	211	0.237	-
0.7766	212	0.3782	-
0.7802	213	0.4302	-
0.7839	214	0.5929	-
0.7875	215	0.0646	-
0.7912	216	0.3934	-
0.7949	217	0.3317	-
0.7985	218	0.5997	-
0.8022	219	0.511	-
0.8059	220	0.384	-
0.8095	221	0.3319	-
0.8132	222	0.4738	-
0.8168	223	0.2536	-
0.8205	224	0.3429	-
0.8242	225	0.5208	-
0.8278	226	0.3044	-
0.8315	227	0.5025	-
0.8352	228	0.2541	-
0.8388	229	0.4347	-
0.8425	230	0.5067	-
0.8462	231	0.3975	-
0.8498	232	0.3168	-
0.8535	233	0.4299	-
0.8571	234	0.3067	-
0.8608	235	0.1385	-
0.8645	236	0.45	-
0.8681	237	0.7386	-
0.8718	238	0.4154	-
0.8755	239	0.287	-
0.8791	240	0.3703	-
0.8828	241	0.5419	-
0.8864	242	0.3498	-
0.8901	243	0.3481	-
0.8938	244	0.7203	-
0.8974	245	0.4363	-
0.9011	246	0.2272	-
0.9048	247	0.6132	-
0.9084	248	0.5764	-
0.9121	249	0.4819	-
0.9158	250	0.3273	-
0.9194	251	0.4039	-
0.9231	252	0.5303	-
0.9267	253	0.6131	-
0.9304	254	0.448	-
0.9341	255	0.0888	-
0.9377	256	0.4092	-
0.9414	257	0.196	-
0.9451	258	0.6282	-
0.9487	259	0.6653	-
0.9524	260	0.4198	-
0.9560	261	0.4985	-
0.9597	262	0.0	-
0.9634	263	0.2706	-
0.9670	264	0.5704	-
0.9707	265	0.4269	-
0.9744	266	0.2325	-
0.9780	267	0.4256	-
0.9817	268	0.4286	-
0.9853	269	0.3987	-
0.9890	270	0.4431	-
0.9927	271	0.578	-
0.9963	272	0.2845	-
1.0	273	0.1293	1.9891
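
The per-step training loss is noisy, so a smoothed view can make the trend easier to read. The sketch below applies a simple moving average to the first five logged values from the table above. The helper name and the window size of 3 are arbitrary choices for illustration.

```python
def moving_average(values, window=3):
    """Return the mean of each consecutive run of `window` values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Training loss at steps 1 to 5, copied from the log above.
first_losses = [0.5054, 0.5856, 0.5052, 1.1225, 0.9045]

smoothed = moving_average(first_losses, window=3)
print([round(v, 4) for v in smoothed])
```

A larger window gives a smoother curve at the cost of hiding short spikes such as the one at step 4.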

Framework Versions

Python: 3.11.13
Sentence Transformers: 5.1.2
Transformers: 4.53.3
PyTorch: 2.6.0+cu124
Accelerate: 1.9.0
Datasets: 4.4.1
Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}