SentenceTransformer based on sentence-transformers/all-distilroberta-v1
This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-distilroberta-v1
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation (https://sbert.net)
- Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
- Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
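For reference, the same three-module pipeline can be assembled by hand with the sentence_transformers modules API. A minimal sketch, with module settings taken from the printout above:

from sentence_transformers import SentenceTransformer, models

# (0) Transformer: DistilRoBERTa encoder, truncating inputs at 512 tokens
word_embedding = models.Transformer("sentence-transformers/all-distilroberta-v1", max_seq_length=512)
# (1) Pooling: mean over token embeddings, yielding a 768-dimensional vector
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="mean")
# (2) Normalize: L2-normalize embeddings so dot product equals cosine similarity
model = SentenceTransformer(modules=[word_embedding, pooling, models.Normalize()])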
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs1024-nneg3-ml-feb22")
# Run inference
sentences = [
    'Mathlib.Analysis.Convex.StoneSeparation#0',
    'Nat.le_of_lt',
    'AddCommMonoid.nat_isScalarTower',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
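The example sentences are identifiers from Lean's Mathlib library: the training data pairs a state_name (a location in Mathlib) with a premise_name (a lemma used there). A minimal retrieval sketch in that spirit, with an illustrative query and candidate pool (not taken from the dataset):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs1024-nneg3-ml-feb22")

# Illustrative query and candidates in the same format as the training columns
state = "Mathlib.Analysis.Convex.StoneSeparation#0"        # state_name-style query
premises = ["Nat.le_of_lt", "Set.union_empty", "le_refl"]  # premise_name candidates

state_embedding = model.encode([state])
premise_embeddings = model.encode(premises)

# Embeddings are already L2-normalized, so cosine similarity ranks the candidates
scores = model.similarity(state_embedding, premise_embeddings)[0]
for name, score in sorted(zip(premises, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {name}")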
Training Details
Training Dataset
Unnamed Dataset
- Size: 5,702,228 training samples
- Columns: state_name and premise_name
- Approximate statistics based on the first 1000 samples:

Column | Type | Min tokens | Mean tokens | Max tokens |
---|---|---|---|---|
state_name | string | 12 | 17.86 | 23 |
premise_name | string | 3 | 10.93 | 36 |
- Samples:

state_name | premise_name |
---|---|
Mathlib.Topology.EMetricSpace.BoundedVariation#253 | Set.union_empty |
Mathlib.Topology.EMetricSpace.BoundedVariation#253 | le_refl |
Mathlib.Topology.EMetricSpace.BoundedVariation#253 | le_of_le_of_eq |
- Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
  {"scale": 20.0, "similarity_fct": "cos_sim"}
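The Masked/Cached prefixes indicate a project-specific variant of the library's cached in-batch-negatives loss (presumably masking invalid in-batch negatives); it is not shipped with sentence-transformers. With the parameters above, the closest stock loss would be wired up roughly like this sketch:

from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.util import cos_sim

model = SentenceTransformer("sentence-transformers/all-distilroberta-v1")
# Stock cached variant of MultipleNegativesRankingLoss with the listed parameters;
# the Masked variant used for this model is a custom subclass, not a library loss.
loss = losses.CachedMultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)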
Evaluation Dataset
Unnamed Dataset
- Size: 2,334 evaluation samples
- Columns: state_name and premise_name
- Approximate statistics based on the first 1000 samples:

Column | Type | Min tokens | Mean tokens | Max tokens |
---|---|---|---|---|
state_name | string | 11 | 16.59 | 24 |
premise_name | string | 3 | 11.75 | 32 |
- Samples:

state_name | premise_name |
---|---|
Mathlib.Algebra.Algebra.Operations#96 | Submodule.le_pow_toAddSubmonoid |
Mathlib.Algebra.Algebra.Operations#96 | AddSubmonoid.pow_subset_pow |
Mathlib.Algebra.Algebra.Operations#96 | trans |
- Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
  {"scale": 20.0, "similarity_fct": "cos_sim"}
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 1024
- per_device_eval_batch_size: 64
- learning_rate: 0.0002
- num_train_epochs: 1.0
- lr_scheduler_type: cosine
- warmup_ratio: 0.03
- bf16: True
- dataloader_num_workers: 4
- batch_sampler: no_duplicates (see the sketch after this list)
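These settings map one-to-one onto SentenceTransformerTrainingArguments; a minimal sketch (output_dir is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",                        # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=1024,
    per_device_eval_batch_size=64,
    learning_rate=2e-4,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    dataloader_num_workers=4,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # the "no_duplicates" sampler
)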
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 1024
- per_device_eval_batch_size: 64
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 0.0002
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1.0
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.03
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 4
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | Validation Loss |
---|---|---|---|
0.0018 | 10 | 6.4945 | - |
0.0036 | 20 | 5.9953 | - |
0.0054 | 30 | 5.6726 | - |
0.0072 | 40 | 5.489 | - |
0.0090 | 50 | 5.3066 | - |
0.0101 | 56 | - | 1.4491 |
0.0108 | 60 | 5.1176 | - |
0.0126 | 70 | 5.0673 | - |
0.0144 | 80 | 5.0028 | - |
0.0162 | 90 | 4.957 | - |
0.0180 | 100 | 4.8704 | - |
0.0198 | 110 | 4.8362 | - |
0.0201 | 112 | - | 1.2623 |
0.0215 | 120 | 4.7757 | - |
0.0233 | 130 | 4.6646 | - |
0.0251 | 140 | 4.6617 | - |
0.0269 | 150 | 4.6957 | - |
0.0287 | 160 | 4.5359 | - |
0.0302 | 168 | - | 1.2172 |
0.0305 | 170 | 4.5352 | - |
0.0323 | 180 | 4.4969 | - |
0.0341 | 190 | 4.484 | - |
0.0359 | 200 | 4.4936 | - |
0.0377 | 210 | 4.3855 | - |
0.0395 | 220 | 4.3338 | - |
0.0402 | 224 | - | 1.2096 |
0.0413 | 230 | 4.3023 | - |
0.0431 | 240 | 4.3158 | - |
0.0449 | 250 | 4.291 | - |
0.0467 | 260 | 4.2303 | - |
0.0485 | 270 | 4.2196 | - |
0.0503 | 280 | 4.237 | 1.1234 |
0.0521 | 290 | 4.2183 | - |
0.0539 | 300 | 4.1804 | - |
0.0557 | 310 | 4.1496 | - |
0.0575 | 320 | 4.1086 | - |
0.0593 | 330 | 4.0588 | - |
0.0603 | 336 | - | 0.9823 |
0.0611 | 340 | 4.0566 | - |
0.0628 | 350 | 4.0886 | - |
0.0646 | 360 | 4.126 | - |
0.0664 | 370 | 3.9956 | - |
0.0682 | 380 | 4.0245 | - |
0.0700 | 390 | 4.0398 | - |
0.0704 | 392 | - | 0.9728 |
0.0718 | 400 | 3.9756 | - |
0.0736 | 410 | 4.0221 | - |
0.0754 | 420 | 3.977 | - |
0.0772 | 430 | 3.8922 | - |
0.0790 | 440 | 3.9496 | - |
0.0804 | 448 | - | 0.9045 |
0.0808 | 450 | 3.8841 | - |
0.0826 | 460 | 3.8596 | - |
0.0844 | 470 | 3.8682 | - |
0.0862 | 480 | 3.8671 | - |
0.0880 | 490 | 3.829 | - |
0.0898 | 500 | 3.7833 | - |
0.0905 | 504 | - | 0.8283 |
0.0916 | 510 | 3.7498 | - |
0.0934 | 520 | 3.8393 | - |
0.0952 | 530 | 3.7889 | - |
0.0970 | 540 | 3.798 | - |
0.0988 | 550 | 3.7653 | - |
0.1006 | 560 | 3.7703 | 0.8599 |
0.1024 | 570 | 3.7121 | - |
0.1041 | 580 | 3.7443 | - |
0.1059 | 590 | 3.7417 | - |
0.1077 | 600 | 3.6768 | - |
0.1095 | 610 | 3.6305 | - |
0.1106 | 616 | - | 0.8878 |
0.1113 | 620 | 3.6516 | - |
0.1131 | 630 | 3.6454 | - |
0.1149 | 640 | 3.6808 | - |
0.1167 | 650 | 3.662 | - |
0.1185 | 660 | 3.6466 | - |
0.1203 | 670 | 3.5734 | - |
0.1207 | 672 | - | 0.8875 |
0.1221 | 680 | 3.5605 | - |
0.1239 | 690 | 3.6263 | - |
0.1257 | 700 | 3.6224 | - |
0.1275 | 710 | 3.5387 | - |
0.1293 | 720 | 3.5355 | - |
0.1307 | 728 | - | 0.8349 |
0.1311 | 730 | 3.5723 | - |
0.1329 | 740 | 3.5018 | - |
0.1347 | 750 | 3.4456 | - |
0.1365 | 760 | 3.4415 | - |
0.1383 | 770 | 3.4535 | - |
0.1401 | 780 | 3.4423 | - |
0.1408 | 784 | - | 0.8113 |
0.1419 | 790 | 3.5116 | - |
0.1437 | 800 | 3.4681 | - |
0.1454 | 810 | 3.4181 | - |
0.1472 | 820 | 3.4289 | - |
0.1490 | 830 | 3.4553 | - |
0.1508 | 840 | 3.4506 | 0.8186 |
0.1526 | 850 | 3.4006 | - |
0.1544 | 860 | 3.4412 | - |
0.1562 | 870 | 3.3971 | - |
0.1580 | 880 | 3.3829 | - |
0.1598 | 890 | 3.4066 | - |
0.1609 | 896 | - | 0.7989 |
0.1616 | 900 | 3.4174 | - |
0.1634 | 910 | 3.3869 | - |
0.1652 | 920 | 3.3616 | - |
0.1670 | 930 | 3.3639 | - |
0.1688 | 940 | 3.3353 | - |
0.1706 | 950 | 3.3401 | - |
0.1709 | 952 | - | 0.7703 |
0.1724 | 960 | 3.3322 | - |
0.1742 | 970 | 3.3129 | - |
0.1760 | 980 | 3.3336 | - |
0.1778 | 990 | 3.2899 | - |
0.1796 | 1000 | 3.3012 | - |
0.1810 | 1008 | - | 0.7533 |
0.1814 | 1010 | 3.2885 | - |
0.1832 | 1020 | 3.2861 | - |
0.1850 | 1030 | 3.2935 | - |
0.1867 | 1040 | 3.3401 | - |
0.1885 | 1050 | 3.3192 | - |
0.1903 | 1060 | 3.306 | - |
0.1911 | 1064 | - | 0.7385 |
0.1921 | 1070 | 3.2599 | - |
0.1939 | 1080 | 3.1642 | - |
0.1957 | 1090 | 3.2544 | - |
0.1975 | 1100 | 3.1976 | - |
0.1993 | 1110 | 3.1664 | - |
0.2011 | 1120 | 3.1119 | 0.7099 |
0.2029 | 1130 | 3.1349 | - |
0.2047 | 1140 | 3.2138 | - |
0.2065 | 1150 | 3.2007 | - |
0.2083 | 1160 | 3.1433 | - |
0.2101 | 1170 | 3.1061 | - |
0.2112 | 1176 | - | 0.7260 |
0.2119 | 1180 | 3.1275 | - |
0.2137 | 1190 | 3.1019 | - |
0.2155 | 1200 | 3.1205 | - |
0.2173 | 1210 | 3.0568 | - |
0.2191 | 1220 | 3.1019 | - |
0.2209 | 1230 | 3.1172 | - |
0.2212 | 1232 | - | 0.7232 |
0.2227 | 1240 | 3.0902 | - |
0.2245 | 1250 | 3.0309 | - |
0.2263 | 1260 | 3.0369 | - |
0.2280 | 1270 | 3.0152 | - |
0.2298 | 1280 | 3.0631 | - |
0.2313 | 1288 | - | 0.6834 |
0.2316 | 1290 | 3.0995 | - |
0.2334 | 1300 | 3.0935 | - |
0.2352 | 1310 | 3.0539 | - |
0.2370 | 1320 | 3.0385 | - |
0.2388 | 1330 | 3.0614 | - |
0.2406 | 1340 | 3.0869 | - |
0.2413 | 1344 | - | 0.7055 |
0.2424 | 1350 | 3.0854 | - |
0.2442 | 1360 | 3.0363 | - |
0.2460 | 1370 | 3.0643 | - |
0.2478 | 1380 | 3.0698 | - |
0.2496 | 1390 | 3.0005 | - |
0.2514 | 1400 | 2.9856 | 0.6682 |
0.2532 | 1410 | 3.0242 | - |
0.2550 | 1420 | 3.0012 | - |
0.2568 | 1430 | 3.0131 | - |
0.2586 | 1440 | 3.0069 | - |
0.2604 | 1450 | 2.9781 | - |
0.2614 | 1456 | - | 0.6871 |
0.2622 | 1460 | 2.9552 | - |
0.2640 | 1470 | 2.9734 | - |
0.2658 | 1480 | 2.9974 | - |
0.2676 | 1490 | 2.9739 | - |
0.2693 | 1500 | 2.9154 | - |
0.2711 | 1510 | 2.9461 | - |
0.2715 | 1512 | - | 0.6957 |
0.2729 | 1520 | 2.8891 | - |
0.2747 | 1530 | 2.9345 | - |
0.2765 | 1540 | 2.9421 | - |
0.2783 | 1550 | 2.9024 | - |
0.2801 | 1560 | 2.9436 | - |
0.2816 | 1568 | - | 0.6855 |
0.2819 | 1570 | 2.9584 | - |
0.2837 | 1580 | 2.9022 | - |
0.2855 | 1590 | 2.8767 | - |
0.2873 | 1600 | 2.9197 | - |
0.2891 | 1610 | 2.8995 | - |
0.2909 | 1620 | 2.8613 | - |
0.2916 | 1624 | - | 0.6869 |
0.2927 | 1630 | 2.8522 | - |
0.2945 | 1640 | 2.8988 | - |
0.2963 | 1650 | 2.8307 | - |
0.2981 | 1660 | 2.8281 | - |
0.2999 | 1670 | 2.835 | - |
0.3017 | 1680 | 2.8305 | 0.6352 |
0.3035 | 1690 | 2.8139 | - |
0.3053 | 1700 | 2.8655 | - |
0.3071 | 1710 | 2.8651 | - |
0.3089 | 1720 | 2.8026 | - |
0.3106 | 1730 | 2.7712 | - |
0.3117 | 1736 | - | 0.6213 |
0.3124 | 1740 | 2.8073 | - |
0.3142 | 1750 | 2.7572 | - |
0.3160 | 1760 | 2.7446 | - |
0.3178 | 1770 | 2.7955 | - |
0.3196 | 1780 | 2.7745 | - |
0.3214 | 1790 | 2.7254 | - |
0.3218 | 1792 | - | 0.6358 |
0.3232 | 1800 | 2.7719 | - |
0.3250 | 1810 | 2.7386 | - |
0.3268 | 1820 | 2.705 | - |
0.3286 | 1830 | 2.7102 | - |
0.3304 | 1840 | 2.7694 | - |
0.3318 | 1848 | - | 0.6394 |
0.3322 | 1850 | 2.7433 | - |
0.3340 | 1860 | 2.6986 | - |
0.3358 | 1870 | 2.7005 | - |
0.3376 | 1880 | 2.6814 | - |
0.3394 | 1890 | 2.6811 | - |
0.3412 | 1900 | 2.7303 | - |
0.3419 | 1904 | - | 0.6303 |
0.3430 | 1910 | 2.7674 | - |
0.3448 | 1920 | 2.7573 | - |
0.3466 | 1930 | 2.7488 | - |
0.3484 | 1940 | 2.7408 | - |
0.3502 | 1950 | 2.6989 | - |
0.3519 | 1960 | 2.7066 | 0.6180 |
0.3537 | 1970 | 2.707 | - |
0.3555 | 1980 | 2.6932 | - |
0.3573 | 1990 | 2.7165 | - |
0.3591 | 2000 | 2.6938 | - |
0.3609 | 2010 | 2.7207 | - |
0.3620 | 2016 | - | 0.5906 |
0.3627 | 2020 | 2.7456 | - |
0.3645 | 2030 | 2.714 | - |
0.3663 | 2040 | 2.6607 | - |
0.3681 | 2050 | 2.6659 | - |
0.3699 | 2060 | 2.6621 | - |
0.3717 | 2070 | 2.6872 | - |
0.3721 | 2072 | - | 0.5879 |
0.3735 | 2080 | 2.6439 | - |
0.3753 | 2090 | 2.6849 | - |
0.3771 | 2100 | 2.6518 | - |
0.3789 | 2110 | 2.5955 | - |
0.3807 | 2120 | 2.6138 | - |
0.3821 | 2128 | - | 0.5945 |
0.3825 | 2130 | 2.5803 | - |
0.3843 | 2140 | 2.6437 | - |
0.3861 | 2150 | 2.6264 | - |
0.3879 | 2160 | 2.5644 | - |
0.3897 | 2170 | 2.5971 | - |
0.3915 | 2180 | 2.52 | - |
0.3922 | 2184 | - | 0.5953 |
0.3932 | 2190 | 2.5523 | - |
0.3950 | 2200 | 2.599 | - |
0.3968 | 2210 | 2.5832 | - |
0.3986 | 2220 | 2.6254 | - |
0.4004 | 2230 | 2.5838 | - |
0.4022 | 2240 | 2.5737 | 0.5751 |
0.4040 | 2250 | 2.5663 | - |
0.4058 | 2260 | 2.6058 | - |
0.4076 | 2270 | 2.5968 | - |
0.4094 | 2280 | 2.5784 | - |
0.4112 | 2290 | 2.5363 | - |
0.4123 | 2296 | - | 0.5810 |
0.4130 | 2300 | 2.5149 | - |
0.4148 | 2310 | 2.558 | - |
0.4166 | 2320 | 2.5614 | - |
0.4184 | 2330 | 2.5482 | - |
0.4202 | 2340 | 2.5458 | - |
0.4220 | 2350 | 2.5281 | - |
0.4223 | 2352 | - | 0.5673 |
0.4238 | 2360 | 2.5617 | - |
0.4256 | 2370 | 2.5337 | - |
0.4274 | 2380 | 2.5321 | - |
0.4292 | 2390 | 2.5506 | - |
0.4310 | 2400 | 2.5214 | - |
0.4324 | 2408 | - | 0.5650 |
0.4328 | 2410 | 2.5245 | - |
0.4345 | 2420 | 2.5047 | - |
0.4363 | 2430 | 2.5719 | - |
0.4381 | 2440 | 2.512 | - |
0.4399 | 2450 | 2.5076 | - |
0.4417 | 2460 | 2.4517 | - |
0.4424 | 2464 | - | 0.5772 |
0.4435 | 2470 | 2.4911 | - |
0.4453 | 2480 | 2.5638 | - |
0.4471 | 2490 | 2.5349 | - |
0.4489 | 2500 | 2.4961 | - |
0.4507 | 2510 | 2.5169 | - |
0.4525 | 2520 | 2.489 | 0.5655 |
0.4543 | 2530 | 2.475 | - |
0.4561 | 2540 | 2.4378 | - |
0.4579 | 2550 | 2.4252 | - |
0.4597 | 2560 | 2.4448 | - |
0.4615 | 2570 | 2.4596 | - |
0.4626 | 2576 | - | 0.5544 |
0.4633 | 2580 | 2.4811 | - |
0.4651 | 2590 | 2.4459 | - |
0.4669 | 2600 | 2.4261 | - |
0.4687 | 2610 | 2.4214 | - |
0.4705 | 2620 | 2.4528 | - |
0.4723 | 2630 | 2.4374 | - |
0.4726 | 2632 | - | 0.5336 |
0.4741 | 2640 | 2.4585 | - |
0.4758 | 2650 | 2.4529 | - |
0.4776 | 2660 | 2.4205 | - |
0.4794 | 2670 | 2.441 | - |
0.4812 | 2680 | 2.4654 | - |
0.4827 | 2688 | - | 0.5314 |
0.4830 | 2690 | 2.4535 | - |
0.4848 | 2700 | 2.5085 | - |
0.4866 | 2710 | 2.4725 | - |
0.4884 | 2720 | 2.4655 | - |
0.4902 | 2730 | 2.4137 | - |
0.4920 | 2740 | 2.4172 | - |
0.4927 | 2744 | - | 0.5352 |
0.4938 | 2750 | 2.434 | - |
0.4956 | 2760 | 2.4489 | - |
0.4974 | 2770 | 2.4448 | - |
0.4992 | 2780 | 2.3979 | - |
0.5010 | 2790 | 2.4251 | - |
0.5028 | 2800 | 2.3996 | 0.5313 |
0.5046 | 2810 | 2.4467 | - |
0.5064 | 2820 | 2.4338 | - |
0.5082 | 2830 | 2.4386 | - |
0.5100 | 2840 | 2.3813 | - |
0.5118 | 2850 | 2.4149 | - |
0.5128 | 2856 | - | 0.5261 |
0.5136 | 2860 | 2.3822 | - |
0.5154 | 2870 | 2.407 | - |
0.5171 | 2880 | 2.3406 | - |
0.5189 | 2890 | 2.3845 | - |
0.5207 | 2900 | 2.3176 | - |
0.5225 | 2910 | 2.3554 | - |
0.5229 | 2912 | - | 0.5172 |
0.5243 | 2920 | 2.3905 | - |
0.5261 | 2930 | 2.3994 | - |
0.5279 | 2940 | 2.4004 | - |
0.5297 | 2950 | 2.3499 | - |
0.5315 | 2960 | 2.3758 | - |
0.5330 | 2968 | - | 0.5340 |
0.5333 | 2970 | 2.3644 | - |
0.5351 | 2980 | 2.3288 | - |
0.5369 | 2990 | 2.3504 | - |
0.5387 | 3000 | 2.2991 | - |
0.5405 | 3010 | 2.3471 | - |
0.5423 | 3020 | 2.3408 | - |
0.5430 | 3024 | - | 0.5077 |
0.5441 | 3030 | 2.3881 | - |
0.5459 | 3040 | 2.3398 | - |
0.5477 | 3050 | 2.2963 | - |
0.5495 | 3060 | 2.3344 | - |
0.5513 | 3070 | 2.3268 | - |
0.5531 | 3080 | 2.3197 | 0.5025 |
0.5549 | 3090 | 2.3667 | - |
0.5567 | 3100 | 2.3655 | - |
0.5584 | 3110 | 2.3295 | - |
0.5602 | 3120 | 2.3238 | - |
0.5620 | 3130 | 2.3336 | - |
0.5631 | 3136 | - | 0.4885 |
0.5638 | 3140 | 2.3408 | - |
0.5656 | 3150 | 2.3371 | - |
0.5674 | 3160 | 2.3419 | - |
0.5692 | 3170 | 2.2884 | - |
0.5710 | 3180 | 2.2972 | - |
0.5728 | 3190 | 2.2571 | - |
0.5732 | 3192 | - | 0.4772 |
0.5746 | 3200 | 2.2741 | - |
0.5764 | 3210 | 2.3012 | - |
0.5782 | 3220 | 2.3374 | - |
0.5800 | 3230 | 2.2804 | - |
0.5818 | 3240 | 2.2674 | - |
0.5832 | 3248 | - | 0.5104 |
0.5836 | 3250 | 2.277 | - |
0.5854 | 3260 | 2.288 | - |
0.5872 | 3270 | 2.2677 | - |
0.5890 | 3280 | 2.2935 | - |
0.5908 | 3290 | 2.2697 | - |
0.5926 | 3300 | 2.2595 | - |
0.5933 | 3304 | - | 0.4893 |
0.5944 | 3310 | 2.2754 | - |
0.5962 | 3320 | 2.2544 | - |
0.5980 | 3330 | 2.2816 | - |
0.5997 | 3340 | 2.2192 | - |
0.6015 | 3350 | 2.2841 | - |
0.6033 | 3360 | 2.2807 | 0.4862 |
0.6051 | 3370 | 2.2228 | - |
0.6069 | 3380 | 2.2437 | - |
0.6087 | 3390 | 2.2494 | - |
0.6105 | 3400 | 2.2715 | - |
0.6123 | 3410 | 2.2578 | - |
0.6134 | 3416 | - | 0.4820 |
0.6141 | 3420 | 2.2393 | - |
0.6159 | 3430 | 2.272 | - |
0.6177 | 3440 | 2.24 | - |
0.6195 | 3450 | 2.2612 | - |
0.6213 | 3460 | 2.2369 | - |
0.6231 | 3470 | 2.251 | - |
0.6235 | 3472 | - | 0.4637 |
0.6249 | 3480 | 2.1808 | - |
0.6267 | 3490 | 2.2178 | - |
0.6285 | 3500 | 2.2261 | - |
0.6303 | 3510 | 2.1946 | - |
0.6321 | 3520 | 2.167 | - |
0.6335 | 3528 | - | 0.4657 |
0.6339 | 3530 | 2.1794 | - |
0.6357 | 3540 | 2.1646 | - |
0.6375 | 3550 | 2.2539 | - |
0.6393 | 3560 | 2.2163 | - |
0.6410 | 3570 | 2.2402 | - |
0.6428 | 3580 | 2.1637 | - |
0.6436 | 3584 | - | 0.4676 |
0.6446 | 3590 | 2.1718 | - |
0.6464 | 3600 | 2.1778 | - |
0.6482 | 3610 | 2.2156 | - |
0.6500 | 3620 | 2.2267 | - |
0.6518 | 3630 | 2.2506 | - |
0.6536 | 3640 | 2.1913 | 0.4698 |
0.6554 | 3650 | 2.2207 | - |
0.6572 | 3660 | 2.1914 | - |
0.6590 | 3670 | 2.2358 | - |
0.6608 | 3680 | 2.213 | - |
0.6626 | 3690 | 2.2178 | - |
0.6637 | 3696 | - | 0.4671 |
0.6644 | 3700 | 2.2003 | - |
0.6662 | 3710 | 2.1846 | - |
0.6680 | 3720 | 2.2418 | - |
0.6698 | 3730 | 2.1752 | - |
0.6716 | 3740 | 2.2026 | - |
0.6734 | 3750 | 2.2094 | - |
0.6737 | 3752 | - | 0.4506 |
0.6752 | 3760 | 2.198 | - |
0.6770 | 3770 | 2.1714 | - |
0.6788 | 3780 | 2.2162 | - |
0.6806 | 3790 | 2.1964 | - |
0.6823 | 3800 | 2.1827 | - |
0.6838 | 3808 | - | 0.4698 |
0.6841 | 3810 | 2.1884 | - |
0.6859 | 3820 | 2.1562 | - |
0.6877 | 3830 | 2.1502 | - |
0.6895 | 3840 | 2.1936 | - |
0.6913 | 3850 | 2.1785 | - |
0.6931 | 3860 | 2.1587 | - |
0.6938 | 3864 | - | 0.4495 |
0.6949 | 3870 | 2.196 | - |
0.6967 | 3880 | 2.1883 | - |
0.6985 | 3890 | 2.1452 | - |
0.7003 | 3900 | 2.1749 | - |
0.7021 | 3910 | 2.219 | - |
0.7039 | 3920 | 2.1916 | 0.4399 |
0.7057 | 3930 | 2.1197 | - |
0.7075 | 3940 | 2.1504 | - |
0.7093 | 3950 | 2.1144 | - |
0.7111 | 3960 | 2.1299 | - |
0.7129 | 3970 | 2.1704 | - |
0.7140 | 3976 | - | 0.4442 |
0.7147 | 3980 | 2.1874 | - |
0.7165 | 3990 | 2.1853 | - |
0.7183 | 4000 | 2.1954 | - |
0.7201 | 4010 | 2.1971 | - |
0.7219 | 4020 | 2.1675 | - |
0.7236 | 4030 | 2.1777 | - |
0.7240 | 4032 | - | 0.4404 |
0.7254 | 4040 | 2.1521 | - |
0.7272 | 4050 | 2.1615 | - |
0.7290 | 4060 | 2.1736 | - |
0.7308 | 4070 | 2.1394 | - |
0.7326 | 4080 | 2.1352 | - |
0.7341 | 4088 | - | 0.4352 |
0.7344 | 4090 | 2.1618 | - |
0.7362 | 4100 | 2.1351 | - |
0.7380 | 4110 | 2.1216 | - |
0.7398 | 4120 | 2.0994 | - |
0.7416 | 4130 | 2.1209 | - |
0.7434 | 4140 | 2.1436 | - |
0.7441 | 4144 | - | 0.4337 |
0.7452 | 4150 | 2.1139 | - |
0.7470 | 4160 | 2.119 | - |
0.7488 | 4170 | 2.1159 | - |
0.7506 | 4180 | 2.1019 | - |
0.7524 | 4190 | 2.1614 | - |
0.7542 | 4200 | 2.1301 | 0.4413 |
0.7560 | 4210 | 2.1316 | - |
0.7578 | 4220 | 2.1273 | - |
0.7596 | 4230 | 2.0352 | - |
0.7614 | 4240 | 2.0996 | - |
0.7632 | 4250 | 2.1295 | - |
0.7642 | 4256 | - | 0.4348 |
0.7649 | 4260 | 2.0968 | - |
0.7667 | 4270 | 2.0778 | - |
0.7685 | 4280 | 2.1248 | - |
0.7703 | 4290 | 2.0838 | - |
0.7721 | 4300 | 2.0912 | - |
0.7739 | 4310 | 2.0775 | - |
0.7743 | 4312 | - | 0.4441 |
0.7757 | 4320 | 2.1257 | - |
0.7775 | 4330 | 2.1134 | - |
0.7793 | 4340 | 2.0975 | - |
0.7811 | 4350 | 2.1004 | - |
0.7829 | 4360 | 2.1172 | - |
0.7843 | 4368 | - | 0.4406 |
0.7847 | 4370 | 2.0906 | - |
0.7865 | 4380 | 2.0822 | - |
0.7883 | 4390 | 2.0881 | - |
0.7901 | 4400 | 2.1305 | - |
0.7919 | 4410 | 2.1207 | - |
0.7937 | 4420 | 2.0894 | - |
0.7944 | 4424 | - | 0.4353 |
0.7955 | 4430 | 2.1046 | - |
0.7973 | 4440 | 2.1255 | - |
0.7991 | 4450 | 2.1023 | - |
0.8009 | 4460 | 2.0824 | - |
0.8027 | 4470 | 2.0778 | - |
0.8045 | 4480 | 2.1155 | 0.4315 |
0.8062 | 4490 | 2.0992 | - |
0.8080 | 4500 | 2.0829 | - |
0.8098 | 4510 | 2.1144 | - |
0.8116 | 4520 | 2.0977 | - |
0.8134 | 4530 | 2.1148 | - |
0.8145 | 4536 | - | 0.4289 |
0.8152 | 4540 | 2.1267 | - |
0.8170 | 4550 | 2.106 | - |
0.8188 | 4560 | 2.0573 | - |
0.8206 | 4570 | 2.0376 | - |
0.8224 | 4580 | 2.1084 | - |
0.8242 | 4590 | 2.0774 | - |
0.8246 | 4592 | - | 0.4270 |
0.8260 | 4600 | 2.1035 | - |
0.8278 | 4610 | 2.1295 | - |
0.8296 | 4620 | 2.1035 | - |
0.8314 | 4630 | 2.118 | - |
0.8332 | 4640 | 2.0951 | - |
0.8346 | 4648 | - | 0.4235 |
0.8350 | 4650 | 2.092 | - |
0.8368 | 4660 | 2.1229 | - |
0.8386 | 4670 | 2.1432 | - |
0.8404 | 4680 | 2.1285 | - |
0.8422 | 4690 | 2.1056 | - |
0.8440 | 4700 | 2.0699 | - |
0.8447 | 4704 | - | 0.4192 |
0.8458 | 4710 | 2.0441 | - |
0.8475 | 4720 | 2.0788 | - |
0.8493 | 4730 | 2.0375 | - |
0.8511 | 4740 | 2.0502 | - |
0.8529 | 4750 | 2.1166 | - |
0.8547 | 4760 | 2.0791 | 0.4196 |
0.8565 | 4770 | 2.0894 | - |
0.8583 | 4780 | 2.1094 | - |
0.8601 | 4790 | 2.0677 | - |
0.8619 | 4800 | 2.0168 | - |
0.8637 | 4810 | 1.972 | - |
0.8648 | 4816 | - | 0.4219 |
0.8655 | 4820 | 2.0104 | - |
0.8673 | 4830 | 1.7668 | - |
0.8691 | 4840 | 1.6258 | - |
0.8709 | 4850 | 1.779 | - |
0.8727 | 4860 | 1.7525 | - |
0.8745 | 4870 | 1.9056 | - |
0.8748 | 4872 | - | 0.4488 |
0.8763 | 4880 | 1.8157 | - |
0.8781 | 4890 | 1.893 | - |
0.8799 | 4900 | 1.9266 | - |
0.8817 | 4910 | 1.8851 | - |
0.8835 | 4920 | 1.9342 | - |
0.8849 | 4928 | - | 0.5160 |
0.8853 | 4930 | 1.8377 | - |
0.8871 | 4940 | 1.873 | - |
0.8888 | 4950 | 1.8302 | - |
0.8906 | 4960 | 1.9123 | - |
0.8924 | 4970 | 1.8605 | - |
0.8942 | 4980 | 1.878 | - |
0.8950 | 4984 | - | 0.5648 |
0.8960 | 4990 | 1.8416 | - |
0.8978 | 5000 | 1.9061 | - |
0.8996 | 5010 | 1.8084 | - |
0.9014 | 5020 | 1.8982 | - |
0.9032 | 5030 | 1.9167 | - |
0.9050 | 5040 | 1.8795 | 0.6082 |
0.9068 | 5050 | 1.9449 | - |
0.9086 | 5060 | 1.956 | - |
0.9104 | 5070 | 1.8469 | - |
0.9122 | 5080 | 1.8858 | - |
0.9140 | 5090 | 1.8 | - |
0.9151 | 5096 | - | 0.6452 |
0.9158 | 5100 | 1.7873 | - |
0.9176 | 5110 | 1.7998 | - |
0.9194 | 5120 | 1.9032 | - |
0.9212 | 5130 | 1.8753 | - |
0.9230 | 5140 | 1.8959 | - |
0.9248 | 5150 | 1.7677 | - |
0.9251 | 5152 | - | 0.6645 |
0.9266 | 5160 | 1.8726 | - |
0.9284 | 5170 | 1.8311 | - |
0.9301 | 5180 | 1.8198 | - |
0.9319 | 5190 | 1.8422 | - |
0.9337 | 5200 | 1.8419 | - |
0.9352 | 5208 | - | 0.6856 |
0.9355 | 5210 | 1.7987 | - |
0.9373 | 5220 | 1.8164 | - |
0.9391 | 5230 | 1.7429 | - |
0.9409 | 5240 | 1.8444 | - |
0.9427 | 5250 | 1.8373 | - |
0.9445 | 5260 | 1.7414 | - |
0.9452 | 5264 | - | 0.7004 |
0.9463 | 5270 | 1.8996 | - |
0.9481 | 5280 | 1.821 | - |
0.9499 | 5290 | 1.8124 | - |
0.9517 | 5300 | 1.7433 | - |
0.9535 | 5310 | 1.8208 | - |
0.9553 | 5320 | 1.826 | 0.7103 |
0.9571 | 5330 | 1.8108 | - |
0.9589 | 5340 | 1.8068 | - |
0.9607 | 5350 | 1.8513 | - |
0.9625 | 5360 | 1.8312 | - |
0.9643 | 5370 | 1.8248 | - |
0.9653 | 5376 | - | 0.7145 |
0.9661 | 5380 | 1.8556 | - |
0.9679 | 5390 | 1.8554 | - |
0.9697 | 5400 | 1.7885 | - |
0.9714 | 5410 | 1.7767 | - |
0.9732 | 5420 | 1.8356 | - |
0.9750 | 5430 | 1.7998 | - |
0.9754 | 5432 | - | 0.7178 |
0.9768 | 5440 | 1.8958 | - |
0.9786 | 5450 | 1.8307 | - |
0.9804 | 5460 | 1.7892 | - |
0.9822 | 5470 | 1.823 | - |
0.9840 | 5480 | 1.8135 | - |
0.9855 | 5488 | - | 0.7201 |
0.9858 | 5490 | 1.7887 | - |
0.9876 | 5500 | 1.8096 | - |
0.9894 | 5510 | 1.8686 | - |
0.9912 | 5520 | 1.8398 | - |
0.9930 | 5530 | 1.9189 | - |
0.9948 | 5540 | 1.689 | - |
0.9955 | 5544 | - | 0.7204 |
0.9966 | 5550 | 1.8621 | - |
0.9984 | 5560 | 1.8037 | - |
Framework Versions
- Python: 3.11.8
- Sentence Transformers: 3.1.1
- Transformers: 4.45.1
- PyTorch: 2.4.0+cu121
- Accelerate: 0.34.2
- Datasets: 3.0.0
- Tokenizers: 0.20.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MaskedCachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}