Static Embeddings with BERT Multilingual uncased tokenizer finetuned on various datasets
This is a sentence-transformers model trained on the wikititles, tatoeba, talks, europarl, global_voices, muse, wikimatrix, opensubtitles, stackexchange, quora, wikianswers_duplicates, all_nli, simple_wiki, altlex, flickr30k_captions, coco_captions, nli_for_simcse and negation datasets. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, paraphrase mining, text classification, clustering, and more.
Read our Static Embeddings blogpost to learn more about this model and how it was trained.
- 0 Active Parameters: This model does not use any active parameters, instead consisting exclusively of averaging pre-computed token embeddings.
- 100x to 400x faster: On CPU, this model is 100x to 400x faster than common options like multilingual-e5-small. On GPU, it's 10x to 25x faster.
- Matryoshka: This model was trained with a Matryoshka loss, allowing you to truncate the embeddings for faster retrieval at minimal performance costs.
- Evaluations: See Evaluations for details on performance on NanoBEIR, embedding speed, and Matryoshka dimensionality truncation.
- Training Script: See train.py for the training script used to train this model from scratch.
See static-retrieval-mrl-en-v1
for an English static embedding model that has been finetuned specifically for retrieval tasks.
Model Details
Model Description
- Model Type: Sentence Transformer
- Maximum Sequence Length: inf tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Training Datasets:
- Languages: en, multilingual, ar, bg, ca, cs, da, de, el, es, et, fa, fi, fr, gl, gu, he, hi, hu, hy, id, it, ja, ka, ko, ku, lt, lv, mk, mn, mr, ms, my, nb, nl, pl, pt, ro, ru, sk, sl, sq, sr, sv, th, tr, uk, ur, vi, zh, hr
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): StaticEmbedding(
(embedding): EmbeddingBag(105879, 1024, mode='mean')
)
)
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/static-similarity-mrl-multilingual-v1")
# Run inference
sentences = [
'It is known for its dry red chili powder .',
'It is popular for dry red chili powder .',
'These monsters will move in large groups .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
This model was trained with Matryoshka loss, allowing this model to be used with lower dimensionalities with minimal performance loss.
Notably, a lower dimensionality allows for much faster downstream tasks, such as clustering or classification. You can specify a lower dimensionality with the truncate_dim
argument when initializing the Sentence Transformer model:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("tomaarsen/static-similarity-mrl-multilingual-v1", truncate_dim=256)
embeddings = model.encode([
"I used to hate him.",
"Раньше я ненавидел его."
])
print(embeddings.shape)
# => (2, 256)
Evaluation
We've evaluated the model on 5 languages which have a lot of benchmarks across various tasks on MTEB.
We want to reiterate that this model is not intended for retrieval use cases. Instead, we evaluate on Semantic Textual Similarity (STS), Classification, and Pair Classification. We compare against the excellent and small multilingual-e5-small model.
Across all measured languages, static-similarity-mrl-multilingual-v1 reaches an average 92.3% for STS, 95.52% for Pair Classification, and 86.52% for Classification relative to multilingual-e5-small.
To make up for this performance reduction, static-similarity-mrl-multilingual-v1 is approximately ~125x faster on CPU and ~10x faster on GPU devices than multilingual-e5-small. Due to the super-linear nature of attention models, versus the linear nature of static embedding models, the speedup will only grow larger as the number of tokens to encode increases.
Matryoshka Evaluation
Lastly, we experimented with the impacts on English STS on MTEB performance when we did Matryoshka-style dimensionality reduction by truncating the output embeddings to a lower dimensionality.
As you can see, you can easily reduce the dimensionality by 2x or 4x with minor (0.15% or 0.56%) performance hits. If the speed of your downstream task or your storage costs are a bottleneck, this should allow you to alleviate some of those concerns.
Training Details
Training Datasets
wikititles
- Dataset: wikititles at d92a4d2
- Size: 14,700,458 training samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 4 characters
- mean: 18.33 characters
- max: 84 characters
- min: 4 characters
- mean: 17.19 characters
- max: 109 characters
- Samples:
english non_english Le Vintrou
Ле-Вентру
Greening
Begrünung
Warrap
واراب (توضيح)
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
tatoeba
- Dataset: tatoeba at cec1343
- Size: 4,138,956 training samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 5 characters
- mean: 31.59 characters
- max: 196 characters
- min: 6 characters
- mean: 30.95 characters
- max: 161 characters
- Samples:
english non_english I used to hate him.
Раньше я ненавидел его.
It is nothing less than an insult to her.
それはまさに彼女に対する侮辱だ。
I've apologized, so lay off, OK?
謝ったんだから、さっきのはチャラにしてよ。
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
talks
- Dataset: talks at 0c70bc6
- Size: 9,750,031 training samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 5 characters
- mean: 94.41 characters
- max: 493 characters
- min: 4 characters
- mean: 82.49 characters
- max: 452 characters
- Samples:
english non_english (Laughter) EC: But beatbox started here in New York.
(Skratt) EC: Fast beatbox började här i New York.
I did not have enough money to buy food, and so to forget my hunger, I started singing."
食べ物を買うお金もなかった だから 空腹を忘れるために 歌を歌い始めたの」
That is another 25 million barrels a day.
那时还要增加两千五百万桶的原油。
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
europarl
- Dataset: europarl at 11007ec
- Size: 4,990,000 training samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 0 characters
- mean: 147.77 characters
- max: 668 characters
- min: 0 characters
- mean: 153.13 characters
- max: 971 characters
- Samples:
english non_english (SK) I would like to stress three key points in relation to this issue.
(SK) Chtěla bych zdůraznit tři klíčové body, které jsou s tímto tématem spojeny.
Women have a higher recorded rate of unemployment, especially long term unemployment.
Blandt kvinder registreres større arbejdsløshed, især blandt langtidsarbejdsløse.
You will recall that we have occasionally had disagreements over how to interpret Rule 166 of our Rules of Procedure and that certain Members thought that the Presidency was not applying it properly, since it was not giving the floor for points of order that did not refer to the issue that was being debated at that moment.
De husker nok, at vi til tider har været uenige om fortolkningen af artikel 166 i vores forretningsorden, og at nogle af medlemmerne mente, at formanden ikke anvendte den korrekt, eftersom han ikke gav ordet til indlæg til forretningsordenen, når det ikke drejede sig om det spørgsmål, der blev drøftet på det pågældende tidspunkt.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
global_voices
- Dataset: global_voices at 4cc20ad
- Size: 1,099,099 training samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 5 characters
- mean: 115.13 characters
- max: 740 characters
- min: 3 characters
- mean: 119.89 characters
- max: 801 characters
- Samples:
english non_english Generation 9/11: Cristina Balli (USA) from British Council USA on Vimeo.
Генерација 9/11: Кристина Бали (САД) од Британскиот совет САД на Вимео.
Jamaica: Mapping the state of emergency · Global Voices
Jamaica: Mapeando el estado de emergencia
It takes more than courage or bravery to do such a... http://fb.me/12T47y0Ml
Θέλει κάτι παραπάνω από κουράγιο ή ανδρεία για να κάνεις κάτι τέτοιο... http://fb.me/12T47y0Ml
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
muse
- Dataset: muse at 238c077
- Size: 1,368,274 training samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 3 characters
- mean: 7.38 characters
- max: 16 characters
- min: 1 characters
- mean: 7.33 characters
- max: 18 characters
- Samples:
english non_english metro
metrou
suggest
제안
nnw
nno
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
wikimatrix
- Dataset: wikimatrix at 74a4cb1
- Size: 9,688,498 training samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 16 characters
- mean: 124.31 characters
- max: 418 characters
- min: 11 characters
- mean: 129.99 characters
- max: 485 characters
- Samples:
english non_english 3) A set of wikis to support collaboration activities and disseminate information about good practices.
3) Un conjunt de wikis per donar suport a les activitats de col·laboració i difusió d'informació sobre bones pràctiques.
Daily cruiseferry services operate to Copenhagen and Frederikshavn in Denmark, and to Kiel in Germany.
Dịch vụ phà du lịch hàng ngày vận hành tới Copenhagen và Frederikshavn tại Đan Mạch, và tới Kiel tại Đức.
In late April 1943, Philipp was ordered to report to Hitler's headquarters, where he stayed for most of the next four months.
Sent i april 1943 fick Philipp ordern att rapportera till Hitlers högkvarter, där han stannade i fyra månader.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
opensubtitles
- Dataset: opensubtitles at d86a387
- Size: 4,990,000 training samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 0 characters
- mean: 34.43 characters
- max: 220 characters
- min: 0 characters
- mean: 26.99 characters
- max: 118 characters
- Samples:
english non_english Would you send a tomato juice, black coffee and a masseur?
هل لك أن ترسل لي عصير طماطم قهوة سوداء.. والمدلك!
To hear the angels sing
لكى تسمع غناء الملائكه
Brace yourself.
" تمالك نفسك " بريكر
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
stackexchange
- Dataset: stackexchange at 1c9657a
- Size: 250,519 training samples
- Columns:
post1
andpost2
- Approximate statistics based on the first 1000 samples:
post1 post2 type string string details - min: 77 characters
- mean: 669.56 characters
- max: 3982 characters
- min: 81 characters
- mean: 641.44 characters
- max: 4053 characters
- Samples:
post1 post2 New user question about passwords Just got a refurbished computer with Ubuntu as the OS. Have never even heard of the OS and now I'm trying to learn. When I boot the system, it starts up great. But, if I try to navigate around, it requires a password. Is there a trick to finding the initial password? Please advise.
How do I reset a lost administrative password? I'm working on a Ubuntu system, and my client has completely forgotten his administrative password. He doesn't even remember entering one; however it is there. I've tried the suggestions on the website, and I have been unsuccessful in deleting the password so that I can download applets required for running some files. Is there a solution?
Reorder a list of string randomly but constant in a period of time I need to reorder a list in a random way but I want to have the same result on a short period of time ... So I have: var list = new String[] { "Angie", "David", "Emily", "James" } var shuffled = list.OrderBy(v => "4a78926c")).ToList(); But I always get the same order ... I could use Guid.NewGuid() but then I would have a different result in a short period of time. How can I do this?
Randomize a List What is the best way to randomize the order of a generic list in C#? I've got a finite set of 75 numbers in a list I would like to assign a random order to, in order to draw them for a lottery type application.
Made a mistake on check need help to fix I wrote a check and put the amount in the pay to order spot. Can I just mark it out, put the name in the spot and finish writing the check?
How to correct a mistake made when writing a check? I think I know the answer to this, but I'm not sure, and it's a good question, so I'll ask: What is the accepted/proper way to correct a mistake made on a check? For instance, I imagine that in any given January, some people accidentally date a check in the previous year. Is there a way to correct such a mistake, or must a check be voided (and wasted)? Pointers to definitive information (U.S., Canada, and elsewhere) are helpful.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
quora
- Dataset: quora at 451a485
- Size: 101,762 training samples
- Columns:
anchor
,positive
, andnegative
- Approximate statistics based on the first 1000 samples:
anchor positive negative type string string string details - min: 16 characters
- mean: 53.47 characters
- max: 249 characters
- min: 16 characters
- mean: 52.63 characters
- max: 237 characters
- min: 14 characters
- mean: 54.67 characters
- max: 292 characters
- Samples:
anchor positive negative What food should I try in Brazil?
Which foods should I try in Brazil?
What meat should one eat in Argentina?
What is the best way to get a threesome?
How does one find a threesome?
How is the experience of a threesome?
Whether I do CA or MBA? Which is better?
Which is better CA or MBA?
Which is better CA or IT?
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
wikianswers_duplicates
- Dataset: wikianswers_duplicates at 9af6367
- Size: 9,990,000 training samples
- Columns:
anchor
andpositive
- Approximate statistics based on the first 1000 samples:
anchor positive type string string details - min: 14 characters
- mean: 47.39 characters
- max: 151 characters
- min: 15 characters
- mean: 47.58 characters
- max: 154 characters
- Samples:
anchor positive Did Democritus belive matter was continess?
Why did democritus call the smallest pice of matter atomos?
Tell you about the most ever done to satisfy a customer?
How do you satisfy your client or customer?
How is a chemical element different from a compound?
How is a chemical element different to a compound?
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
all_nli
- Dataset: all_nli at d482672
- Size: 557,850 training samples
- Columns:
anchor
,positive
, andnegative
- Approximate statistics based on the first 1000 samples:
anchor positive negative type string string string details - min: 18 characters
- mean: 34.88 characters
- max: 193 characters
- min: 15 characters
- mean: 46.49 characters
- max: 181 characters
- min: 16 characters
- mean: 50.47 characters
- max: 204 characters
- Samples:
anchor positive negative A person on a horse jumps over a broken down airplane.
A person is outdoors, on a horse.
A person is at a diner, ordering an omelette.
Children smiling and waving at camera
There are children present
The kids are frowning
A boy is jumping on skateboard in the middle of a red bridge.
The boy does a skateboarding trick.
The boy skates down the sidewalk.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
simple_wiki
- Dataset: simple_wiki at 60fd9b4
- Size: 102,225 training samples
- Columns:
text
andsimplified
- Approximate statistics based on the first 1000 samples:
text simplified type string string details - min: 18 characters
- mean: 149.3 characters
- max: 573 characters
- min: 16 characters
- mean: 123.58 characters
- max: 576 characters
- Samples:
text simplified The next morning , it had a small CDO and well-defined bands , and the system , either a weak tropical storm or a strong tropical depression , likely reached its peak .
The next morning , it had a small amounts of convection near the center and well-defined bands , and the system , either a weak tropical storm or a strong tropical depression , likely reached its peak .
The region of measurable parameter space that corresponds to a regime is very often loosely defined . Examples include
the superfluid regime '' ,
the steady state regime '' or `` the femtosecond regime '' .This is common if a regime is threatened by another regime .
The Lamborghini Diablo is a high-performance mid-engined sports car that was built by Italian automaker Lamborghini between 1990 and 2001 .
The Lamborghini Diablo is a sport car that was built by Lamborghini from 1990 to 2001 .
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
altlex
- Dataset: altlex at 97eb209
- Size: 112,696 training samples
- Columns:
text
andsimplified
- Approximate statistics based on the first 1000 samples:
text simplified type string string details - min: 13 characters
- mean: 131.03 characters
- max: 492 characters
- min: 13 characters
- mean: 112.41 characters
- max: 492 characters
- Samples:
text simplified Reinforcement and punishment are the core tools of operant conditioning .
Principles of operant conditioning :
The Japanese Ministry of Health , Labour and Welfare defines `` hikikomori '' as people who refuse to leave their house and , thus , isolate themselves from society in their homes for a period exceeding six months .
The Japanese Ministry of Health , Labour and Welfare defines hikikomori as people who refuse to leave their house for over six months .
It has six rows of black spines and has a pair of long , clubbed spines on the head .
It has a pair of long , clubbed spines on the head .
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
flickr30k_captions
- Dataset: flickr30k_captions at 0ef0ce3
- Size: 158,881 training samples
- Columns:
caption1
andcaption2
- Approximate statistics based on the first 1000 samples:
caption1 caption2 type string string details - min: 20 characters
- mean: 63.19 characters
- max: 318 characters
- min: 13 characters
- mean: 63.65 characters
- max: 205 characters
- Samples:
caption1 caption2 Four women pose for a photograph with a man in a bright yellow suit.
A group of friends get their photo taken with a man in a green suit.
A many dressed in army gear walks on the crash walking a brown dog.
A man with army fatigues is walking his dog.
Four people are sitting around a kitchen counter while one is drinking from a glass.
A group of people sit around a breakfast bar.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
coco_captions
- Dataset: coco_captions at bd26018
- Size: 414,010 training samples
- Columns:
caption1
andcaption2
- Approximate statistics based on the first 1000 samples:
caption1 caption2 type string string details - min: 30 characters
- mean: 52.57 characters
- max: 151 characters
- min: 29 characters
- mean: 52.71 characters
- max: 186 characters
- Samples:
caption1 caption2 THERE ARE FRIENDS ON THE BEACH POSING
A group of people standing together on the beach while holding a woman.
a lovely white bathroom with white shower curtain.
A white toilet sitting in a bathroom next to a sink.
Two drinking glass on a counter and a man holding a knife looking at something in front of him.
A restaurant employee standing behind two cups on a counter.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
nli_for_simcse
- Dataset: nli_for_simcse at 926cae4
- Size: 274,951 training samples
- Columns:
anchor
,positive
, andnegative
- Approximate statistics based on the first 1000 samples:
anchor positive negative type string string string details - min: 11 characters
- mean: 87.69 characters
- max: 483 characters
- min: 7 characters
- mean: 43.85 characters
- max: 244 characters
- min: 7 characters
- mean: 43.87 characters
- max: 172 characters
- Samples:
anchor positive negative A white horse and a rider wearing a ale blue shirt, white pants, and a black helmet are jumping a hurdle.
An equestrian is having a horse jump a hurdle.
A competition is taking place in a kitchen.
A group of people in a dome like building.
A gathering inside a building.
Cats are having a party.
Home to thousands of sheep and a few scattered farming families, the area is characterized by the stark beauty of bare peaks, rugged fells, and the most remote lakes, combined with challenging, narrow roads.
There are no wide and easy roads going through the area.
There are more humans than sheep in the area.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
negation
- Dataset: negation at cd02256
- Size: 10,000 training samples
- Columns:
anchor
,entailment
, andnegative
- Approximate statistics based on the first 1000 samples:
anchor entailment negative type string string string details - min: 9 characters
- mean: 65.84 characters
- max: 275 characters
- min: 7 characters
- mean: 34.06 characters
- max: 167 characters
- min: 9 characters
- mean: 37.26 characters
- max: 166 characters
- Samples:
anchor entailment negative A boy with his hands above his head stands on a cement pillar above the cobblestones.
A boy is standing on a pillar over the cobblestones.
A boy is not standing on a pillar over the cobblestones.
The man works hard in his home office.
home based worker works harder
home based worker does not work harder
Man in black shirt plays silver electric guitar.
A man plays a silver electric guitar.
A man does not play a silver electric guitar.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
Evaluation Datasets
wikititles
- Dataset: wikititles at d92a4d2
- Size: 14,700,458 evaluation samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 4 characters
- mean: 18.33 characters
- max: 77 characters
- min: 4 characters
- mean: 17.3 characters
- max: 83 characters
- Samples:
english non_english Bjørvika
比約維卡
Old Mystic, Connecticut
Олд Мистик (Конектикат)
Cystic fibrosis transmembrane conductance regulator
CFTR
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
tatoeba
- Dataset: tatoeba at cec1343
- Size: 4,138,956 evaluation samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 5 characters
- mean: 31.83 characters
- max: 235 characters
- min: 4 characters
- mean: 31.7 characters
- max: 189 characters
- Samples:
english non_english You are not consistent in your actions.
Je bent niet consequent in je handelen.
Neither of them seemed old.
Ninguno de ellos lucía viejo.
Stand up, please.
Устаните, молим Вас.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
talks
- Dataset: talks at 0c70bc6
- Size: 9,750,031 evaluation samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 9 characters
- mean: 94.78 characters
- max: 634 characters
- min: 4 characters
- mean: 84.61 characters
- max: 596 characters
- Samples:
english non_english I'm earthed in my essence, and my self is suspended.
Je suis ancrée, et mon moi est temporairement inexistant.
It's not back on your shoulder.
Dar nu e înapoi pe umăr.
They're usually students who've never seen a desert.
たいていの学生は砂漠を見たこともありません
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
europarl
- Dataset: europarl at 11007ec
- Size: 10,000 evaluation samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 0 characters
- mean: 148.52 characters
- max: 1215 characters
- min: 0 characters
- mean: 154.44 characters
- max: 1316 characters
- Samples:
english non_english Mr Schmidt, Mr Trichet, I absolutely cannot go along with these proposals.
Pane Schmidte, pane Trichete, s těmito návrhy nemohu vůbec souhlasit.
The Council and Parliament recently adopted the regulation on the Single European Sky, one of the provisions of which was Community membership of Eurocontrol, so that Parliament has already indirectly expressed its views on this matter.
Der Rat und das Parlament haben kürzlich die Verordnung über die Schaffung eines einheitlichen europäischen Luftraums verabschiedet, in der unter anderem die Mitgliedschaft der Gemeinschaft bei Eurocontrol festgelegt ist, so dass das Parlament seine Auffassungen hierzu indirekt bereits dargelegt hat.
It was held over from the January part-session until this part-session.
Ihre Behandlung wurde von der Januar-Sitzung auf die jetzige vertagt.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
global_voices
- Dataset: global_voices at 4cc20ad
- Size: 1,099,099 evaluation samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 3 characters
- mean: 115.61 characters
- max: 629 characters
- min: 3 characters
- mean: 121.61 characters
- max: 664 characters
- Samples:
english non_english Haiti: Security vs. Relief? · Global Voices
Haïti : Zones rouges, zones vertes - sécurité contre aide humanitaire ?
In order to prevent weapon smuggling through tunnels, his forces would have fought and killed Palestinians over a sustained period of time.
Con el fin de impedir el contrabando de armas a través de túneles, sus fuerzas habrían combatido y muerto palestinos durante un largo período de tiempo.
Tombstone of Vitalis, an ancient Roman cavalry officer, displayed in front of the Skopje City Museum.
Lápida de Vitalis, un antiguo oficial romano de caballería, exhibida frente al Museo de la Ciudad de Skopje.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
muse
- Dataset: muse at 238c077
- Size: 1,368,274 evaluation samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 3 characters
- mean: 7.5 characters
- max: 17 characters
- min: 1 characters
- mean: 7.39 characters
- max: 16 characters
- Samples:
english non_english generalised
γενικευμένη
language
jazyku
finalised
финализиран
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
wikimatrix
- Dataset: wikimatrix at 74a4cb1
- Size: 9,688,498 evaluation samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 11 characters
- mean: 122.6 characters
- max: 424 characters
- min: 10 characters
- mean: 128.09 characters
- max: 579 characters
- Samples:
english non_english Along with the adjacent waters, it was declared a nature reserve in 2002.
Juntament amb les aigües adjacents, va ser declarada reserva natural el 2002.
Like her husband, Charlotte was a patron of astronomy.
Stejně jako manžel byla Šarlota patronkou astronomie.
Some of the music consists of simple sounds, such as a wind effect heard over the poem "Soon Alaska".
Sommige muziekstukken bevatten eenvoudige geluiden, zoals het geluid van de wind tijdens het gedicht "Soon Alaska".
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
opensubtitles
- Dataset: opensubtitles at d86a387
- Size: 10,000 evaluation samples
- Columns:
english
andnon_english
- Approximate statistics based on the first 1000 samples:
english non_english type string string details - min: 0 characters
- mean: 35.01 characters
- max: 200 characters
- min: 0 characters
- mean: 27.79 characters
- max: 143 characters
- Samples:
english non_english - I don't need my medicine.
-لا أحتاج لدوائي
The Sovereign... Ah.
(الطاغية)!
The other two from your ship.
الإثنان الأخران من سفينتك
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
stackexchange
- Dataset: stackexchange at 1c9657a
- Size: 250,519 evaluation samples
- Columns:
post1
andpost2
- Approximate statistics based on the first 1000 samples:
post1 post2 type string string details - min: 64 characters
- mean: 669.92 characters
- max: 4103 characters
- min: 62 characters
- mean: 644.68 characters
- max: 4121 characters
- Samples:
post1 post2 Find the particular solution for this linear ODE $y' '-2y'+5y=e^x \cos2x$. Find the particular solution for this linear ODE :$y' '-2y'+5y=e^x \cos2x$. How can I use Undetermined coefficients method ?
Particular solution of $y''-4y'+5y = 4e^{2x} (\sin x)$ How do I find the particular solution of this second order inhomogenous differential equation? (Using undetermined coefficients). $y''-4y'+5y = 4e^{2x} (\sin x)$ I can find the generel homogenous solutions but I need help for the particular.
Unbounded sequence has an divergent subsequence Show that if $(x_n)$ is unbounded, then there exists a subsequence $(x_{n_k})$ such that $\lim 1/(x_{n_k}) =0.$ I was thinking that $(x_n)$ is a subsequence of itself. WLOG, suppose $(x_n)$ does not have an upper bound. By Algebraic Limit Theorem, $\lim 1/(x_{n_k}) =0.$ Is there any flaws in my proof?
Given the sequence $(x_n)$ is unbounded, show that there exist a subsequence $(x_{n_k})$ such that $\lim(1/x_n)=0$. Given the sequence $(x_n)$ is unbounded, show that there exist a subsequence $(x_{n_k})$ such that $\lim(1/x_{n_k})=0$. I guess I have to prove that $(x_{n_k})$ diverge, but I don't know how to carry on. Thanks.
"The problem is who can we get to replace her" vs. "The problem is who we can get to replace her" "The problem is who can we get to replace her" vs. "The problem is who we can get to replace her" Which one is correct and why?
Changing subject and verb positions in statements and questions We always change subject and verb positions in whenever we want to ask a question such as "What is your name?". But when it comes to statements like the following, which form is correct? I don't understand what are you talking about. I don't understand what you are talking about. Another example Do you know what time is it? Do you know what time it is? Another example Do you care how do I feel about this? Do you care how I feel about this?
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
quora
- Dataset: quora at 451a485
- Size: 101,762 evaluation samples
- Columns:
anchor
,positive
, andnegative
- Approximate statistics based on the first 1000 samples:
anchor positive negative type string string string details - min: 15 characters
- mean: 52.48 characters
- max: 164 characters
- min: 12 characters
- mean: 52.86 characters
- max: 162 characters
- min: 12 characters
- mean: 56.18 characters
- max: 298 characters
- Samples:
anchor positive negative Is pornography an art?
Can pornography be art?
Does pornography involve the objectification of women?
How can I improve my speaking in public?
How can I improve my public speaking ability?
How do I improve my vocabulary and English speaking skills? I am a 22 year old software engineer and come from a Telugu medium background. I am able to write well, but my speaking skills are poor.
How do I develop better people skills?
How can I get better people skills?
How do I get better at Minecraft?
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
wikianswers_duplicates
- Dataset: wikianswers_duplicates at 9af6367
- Size: 10,000 evaluation samples
- Columns:
anchor
andpositive
- Approximate statistics based on the first 1000 samples:
anchor positive type string string details - min: 14 characters
- mean: 47.88 characters
- max: 145 characters
- min: 15 characters
- mean: 47.76 characters
- max: 201 characters
- Samples:
anchor positive Can you get pregnant if tubes are clamped?
How long can your fallopian tubes stay clamped?
Is there any object that are triangular prism?
Is a trapezium the same as a triangular prism?
Where is the neutral switch located on a 2000 ford explorer?
Ford f150 1996 safety switch?
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
all_nli
- Dataset: all_nli at d482672
- Size: 6,584 evaluation samples
- Columns:
anchor
,positive
, andnegative
- Approximate statistics based on the first 1000 samples:
anchor positive negative type string string string details - min: 15 characters
- mean: 72.82 characters
- max: 300 characters
- min: 12 characters
- mean: 34.11 characters
- max: 126 characters
- min: 11 characters
- mean: 36.38 characters
- max: 121 characters
- Samples:
anchor positive negative Two women are embracing while holding to go packages.
Two woman are holding packages.
The men are fighting outside a deli.
Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.
Two kids in numbered jerseys wash their hands.
Two kids in jackets walk to school.
A man selling donuts to a customer during a world exhibition event held in the city of Angeles
A man selling donuts to a customer.
A woman drinks her coffee in a small cafe.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
simple_wiki
- Dataset: simple_wiki at 60fd9b4
- Size: 102,225 evaluation samples
- Columns:
text
andsimplified
- Approximate statistics based on the first 1000 samples:
text simplified type string string details - min: 24 characters
- mean: 147.36 characters
- max: 599 characters
- min: 19 characters
- mean: 124.94 characters
- max: 540 characters
- Samples:
text simplified It marks the southernmost point of the Bahà a de Banderas , upon which the port and resort city of Puerto Vallarta stands .
It is the most southern point of the Bahà a de Banderas .
The interiors of the stations resemble that of the former western Soviet nations , with chandeliers hanging from the corridors .
Its interior resembles that of western former Soviet nations with chandeliers hanging from the corridors .
The Senegal national football team , nicknamed the Lions of Teranga , is the national team of Senegal and is controlled by the Fà dà ration Sà nà galaise de Football .
Senegal national football team is the national football team of Senegal .
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
altlex
- Dataset: altlex at 97eb209
- Size: 112,696 evaluation samples
- Columns:
text
andsimplified
- Approximate statistics based on the first 1000 samples:
text simplified type string string details - min: 9 characters
- mean: 138.99 characters
- max: 592 characters
- min: 7 characters
- mean: 119.43 characters
- max: 517 characters
- Samples:
text simplified 14,000 ) referred to as `` The bush '' within the media .
14,000 ) called `` the bush '' in the media .
The next day he told Elizabeth everything he knew regarding Catherine and her pregnancy .
The next day he told Elizabeth everything .
Alice Ivers and Warren Tubbs had four sons and three daughters together .
Alice Ivers and Warren Tubbs had 4 sons and 3 daughters together .
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
flickr30k_captions
- Dataset: flickr30k_captions at 0ef0ce3
- Size: 158,881 evaluation samples
- Columns:
caption1
andcaption2
- Approximate statistics based on the first 1000 samples:
caption1 caption2 type string string details - min: 12 characters
- mean: 62.95 characters
- max: 279 characters
- min: 15 characters
- mean: 63.34 characters
- max: 206 characters
- Samples:
caption1 caption2 A person wearing sunglasses, a visor, and a British flag is carrying 6 Heineken bottles.
A woman wearing a blue visor is holding 5 bottles of Heineken beer.
Two older people hold hands while walking down a street alley with a group of people.
A group of senior citizens walking down narrow pathway.
View of bicyclists from behind during a race.
A Peloton of bicyclists riding down a road of tightly packed together houses.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
coco_captions
- Dataset: coco_captions at bd26018
- Size: 414,010 evaluation samples
- Columns:
caption1
andcaption2
- Approximate statistics based on the first 1000 samples:
caption1 caption2 type string string details - min: 26 characters
- mean: 51.9 characters
- max: 130 characters
- min: 28 characters
- mean: 52.7 characters
- max: 135 characters
- Samples:
caption1 caption2 A blurry photo of a man next to a refrigerator
The man in black is moving towards a refrigerator.
A young child holding a remote control in it's hand.
A boy holds a remote control up to the camera.
a big airplane that is parked on some concrete
A man standing next to a fighter jet under a cloudy sky.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
nli_for_simcse
- Dataset: nli_for_simcse at 926cae4
- Size: 274,951 evaluation samples
- Columns:
anchor
,positive
, andnegative
- Approximate statistics based on the first 1000 samples:
anchor positive negative type string string string details - min: 9 characters
- mean: 84.79 characters
- max: 598 characters
- min: 10 characters
- mean: 44.26 characters
- max: 172 characters
- min: 9 characters
- mean: 44.11 characters
- max: 134 characters
- Samples:
anchor positive negative a man waiting for train with a blue coat blue jeans while holing a rope.
A man is waiting for a train.
A man is sitting on a greyhound bus waiting to leave.
Australia's floating dollar has apparently allowed the island continent to sail almost unscathed through the Asian crisis.
Australia has a floating dollar that has made them impervious to the problem in Asia.
Australia has a dollar that is heavily tied to Asia.
A city street in front of a business with a construction worker and road cones.
There is a city street with construction worker and road cones.
There are no cones in front of the city street.
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
negation
- Dataset: negation at cd02256
- Size: 10,000 evaluation samples
- Columns:
anchor
,entailment
, andnegative
- Approximate statistics based on the first 1000 samples:
anchor entailment negative type string string string details - min: 26 characters
- mean: 69.49 characters
- max: 229 characters
- min: 15 characters
- mean: 34.88 characters
- max: 89 characters
- min: 16 characters
- mean: 38.68 characters
- max: 87 characters
- Samples:
anchor entailment negative Two men, one standing and one seated on the ground are attempting to wrangle a bull as dust from the action is being kicked up.
Two cowboys attempt to wrangle a bull.
Two cowboys do not attempt to wrangle a bull.
A woman dressed in black is silhouetted against a cloud darkened sky.
A woman in black stands in front of a dark, cloudy backdrop.
A woman in black does not stand in front of a dark, cloudy backdrop.
A kid in a blue shirt playing on a playground.
A kid playing on a playground wearing a blue shirt
A kid not playing on a playground wearing a black shirt
- Loss:
MatryoshkaLoss
with these parameters:{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 1024, 512, 256, 128, 64, 32 ], "matryoshka_weights": [ 1, 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
Training Hyperparameters
Non-Default Hyperparameters
eval_strategy
: stepsper_device_train_batch_size
: 2048per_device_eval_batch_size
: 2048learning_rate
: 0.2num_train_epochs
: 1warmup_ratio
: 0.1bf16
: Truebatch_sampler
: no_duplicates
All Hyperparameters
Click to expand
overwrite_output_dir
: Falsedo_predict
: Falseeval_strategy
: stepsprediction_loss_only
: Trueper_device_train_batch_size
: 2048per_device_eval_batch_size
: 2048per_gpu_train_batch_size
: Noneper_gpu_eval_batch_size
: Nonegradient_accumulation_steps
: 1eval_accumulation_steps
: Nonetorch_empty_cache_steps
: Nonelearning_rate
: 0.2weight_decay
: 0.0adam_beta1
: 0.9adam_beta2
: 0.999adam_epsilon
: 1e-08max_grad_norm
: 1.0num_train_epochs
: 1max_steps
: -1lr_scheduler_type
: linearlr_scheduler_kwargs
: {}warmup_ratio
: 0.1warmup_steps
: 0log_level
: passivelog_level_replica
: warninglog_on_each_node
: Truelogging_nan_inf_filter
: Truesave_safetensors
: Truesave_on_each_node
: Falsesave_only_model
: Falserestore_callback_states_from_checkpoint
: Falseno_cuda
: Falseuse_cpu
: Falseuse_mps_device
: Falseseed
: 42data_seed
: Nonejit_mode_eval
: Falseuse_ipex
: Falsebf16
: Truefp16
: Falsefp16_opt_level
: O1half_precision_backend
: autobf16_full_eval
: Falsefp16_full_eval
: Falsetf32
: Nonelocal_rank
: 0ddp_backend
: Nonetpu_num_cores
: Nonetpu_metrics_debug
: Falsedebug
: []dataloader_drop_last
: Falsedataloader_num_workers
: 0dataloader_prefetch_factor
: Nonepast_index
: -1disable_tqdm
: Falseremove_unused_columns
: Truelabel_names
: Noneload_best_model_at_end
: Falseignore_data_skip
: Falsefsdp
: []fsdp_min_num_params
: 0fsdp_config
: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}fsdp_transformer_layer_cls_to_wrap
: Noneaccelerator_config
: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}deepspeed
: Nonelabel_smoothing_factor
: 0.0optim
: adamw_torchoptim_args
: Noneadafactor
: Falsegroup_by_length
: Falselength_column_name
: lengthddp_find_unused_parameters
: Noneddp_bucket_cap_mb
: Noneddp_broadcast_buffers
: Falsedataloader_pin_memory
: Truedataloader_persistent_workers
: Falseskip_memory_metrics
: Trueuse_legacy_prediction_loop
: Falsepush_to_hub
: Falseresume_from_checkpoint
: Nonehub_model_id
: Nonehub_strategy
: every_savehub_private_repo
: Falsehub_always_push
: Falsegradient_checkpointing
: Falsegradient_checkpointing_kwargs
: Noneinclude_inputs_for_metrics
: Falseeval_do_concat_batches
: Truefp16_backend
: autopush_to_hub_model_id
: Nonepush_to_hub_organization
: Nonemp_parameters
:auto_find_batch_size
: Falsefull_determinism
: Falsetorchdynamo
: Noneray_scope
: lastddp_timeout
: 1800torch_compile
: Falsetorch_compile_backend
: Nonetorch_compile_mode
: Nonedispatch_batches
: Nonesplit_batches
: Noneinclude_tokens_per_second
: Falseinclude_num_input_tokens_seen
: Falseneftune_noise_alpha
: Noneoptim_target_modules
: Nonebatch_eval_metrics
: Falseeval_on_start
: Falseuse_liger_kernel
: Falseeval_use_gather_object
: Falsebatch_sampler
: no_duplicatesmulti_dataset_batch_sampler
: proportional
Training Logs
Epoch | Step | Training Loss | wikititles loss | tatoeba loss | talks loss | europarl loss | global voices loss | muse loss | wikimatrix loss | opensubtitles loss | stackexchange loss | quora loss | wikianswers duplicates loss | all nli loss | simple wiki loss | altlex loss | flickr30k captions loss | coco captions loss | nli for simcse loss | negation loss |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.0000 | 1 | 38.504 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
0.0327 | 1000 | 21.3661 | 15.2607 | 9.1892 | 11.6736 | 1.6431 | 6.6894 | 31.9579 | 3.0122 | 0.3541 | 5.1814 | 2.3756 | 4.9474 | 12.7699 | 0.5687 | 0.8911 | 21.0068 | 17.1302 | 10.8964 | 6.7603 |
0.0654 | 2000 | 9.8377 | 11.7637 | 7.1680 | 8.7697 | 1.6077 | 5.2310 | 27.4887 | 1.8375 | 0.3379 | 5.1107 | 2.2083 | 4.1690 | 12.0384 | 0.4837 | 0.7131 | 20.5401 | 17.8388 | 10.6706 | 7.0488 |
0.0982 | 3000 | 8.5279 | 10.8719 | 6.6160 | 8.3116 | 1.5638 | 4.7298 | 25.8572 | 1.6738 | 0.3152 | 5.1009 | 2.0893 | 3.7332 | 12.0452 | 0.4285 | 0.6519 | 20.2154 | 16.2715 | 10.7693 | 7.3144 |
0.1309 | 4000 | 7.8208 | 10.4614 | 5.4918 | 7.4421 | 1.4420 | 4.0505 | 24.9000 | 1.3462 | 0.2925 | 4.7643 | 2.1143 | 3.7457 | 11.6570 | 0.4390 | 0.6536 | 19.4405 | 16.0912 | 10.7537 | 7.2120 |
0.1636 | 5000 | 7.5347 | 9.5381 | 5.9489 | 7.4027 | 1.4858 | 4.0272 | 23.8335 | 1.2453 | 0.3027 | 3.1262 | 1.9170 | 3.7535 | 11.6186 | 0.4090 | 0.6131 | 18.9329 | 16.1769 | 10.1123 | 7.0750 |
0.1963 | 6000 | 7.1819 | 9.2175 | 5.3231 | 7.0836 | 1.4795 | 3.8328 | 23.1620 | 1.1609 | 0.2964 | 2.7653 | 1.9440 | 3.6610 | 11.2147 | 0.3714 | 0.5853 | 19.0478 | 16.4413 | 9.5790 | 6.8695 |
0.2291 | 7000 | 6.9852 | 9.0344 | 5.5773 | 6.7928 | 1.4409 | 3.9232 | 23.2098 | 1.1750 | 0.2877 | 2.9254 | 1.9411 | 3.5469 | 11.0744 | 0.4254 | 0.6293 | 19.0447 | 16.3774 | 9.5363 | 6.8393 |
0.2618 | 8000 | 6.8114 | 8.9620 | 5.1417 | 6.5466 | 1.4834 | 3.7100 | 22.9815 | 1.0679 | 0.2942 | 2.7687 | 2.0211 | 3.6063 | 11.3424 | 0.4447 | 0.6223 | 19.1836 | 16.5669 | 9.8785 | 6.8528 |
0.2945 | 9000 | 6.5487 | 8.6320 | 4.8710 | 6.5144 | 1.4156 | 3.5712 | 22.9660 | 1.0261 | 0.3051 | 3.0898 | 1.9981 | 3.4305 | 11.1448 | 0.3729 | 0.5814 | 18.8865 | 15.8581 | 9.5213 | 6.7567 |
0.3272 | 10000 | 6.7398 | 8.5630 | 4.7179 | 6.5025 | 1.3931 | 3.5699 | 22.5319 | 0.9916 | 0.2870 | 3.3385 | 1.9580 | 3.5807 | 11.2592 | 0.4155 | 0.6009 | 19.1387 | 16.6836 | 9.6300 | 6.6613 |
0.3599 | 11000 | 6.3915 | 8.4041 | 4.8985 | 6.2787 | 1.4081 | 3.5082 | 22.3204 | 0.9554 | 0.2916 | 2.9365 | 2.0176 | 3.3900 | 11.2956 | 0.3902 | 0.5783 | 18.6448 | 16.1241 | 9.5388 | 6.7295 |
0.3927 | 12000 | 6.5902 | 8.1888 | 4.7326 | 6.1930 | 1.4550 | 3.4999 | 22.1070 | 0.9736 | 0.2935 | 2.9612 | 1.9449 | 3.3281 | 11.0477 | 0.3821 | 0.5696 | 18.3227 | 16.1848 | 9.4772 | 7.0029 |
0.4254 | 13000 | 6.341 | 8.1827 | 4.3838 | 6.1052 | 1.4165 | 3.3944 | 21.9552 | 0.9076 | 0.2991 | 3.2272 | 1.9822 | 3.3494 | 11.1891 | 0.3790 | 0.5600 | 18.4394 | 15.9000 | 9.5644 | 6.9056 |
0.4581 | 14000 | 6.2067 | 8.1549 | 4.4833 | 6.0765 | 1.4055 | 3.3903 | 21.4785 | 0.8962 | 0.2919 | 2.8893 | 1.9540 | 3.3078 | 11.2100 | 0.3569 | 0.5461 | 18.7667 | 16.2978 | 9.2310 | 7.1290 |
0.4908 | 15000 | 6.2237 | 8.0711 | 4.4755 | 6.0087 | 1.3185 | 3.2888 | 21.3689 | 0.8433 | 0.2861 | 3.0129 | 1.9084 | 3.3279 | 11.1236 | 0.3730 | 0.5553 | 18.2711 | 15.7648 | 9.5295 | 7.0092 |
0.5236 | 16000 | 6.1058 | 8.0282 | 4.5076 | 5.8760 | 1.4234 | 3.3046 | 21.3568 | 0.8298 | 0.2826 | 2.8404 | 1.8920 | 3.2918 | 11.1140 | 0.3811 | 0.5550 | 18.2899 | 15.8630 | 9.4807 | 6.7585 |
0.5563 | 17000 | 6.3038 | 7.8679 | 4.4780 | 5.8461 | 1.4016 | 3.2279 | 21.0624 | 0.8205 | 0.2804 | 3.1359 | 1.9066 | 3.3205 | 11.0882 | 0.3913 | 0.5569 | 18.0693 | 15.7346 | 9.2854 | 6.9239 |
0.5890 | 18000 | 5.9824 | 7.7827 | 4.3199 | 5.7441 | 1.3582 | 3.1982 | 21.2444 | 0.8046 | 0.2797 | 2.7466 | 1.8717 | 3.3112 | 11.0553 | 0.3922 | 0.5568 | 18.0357 | 15.6732 | 9.6404 | 6.8331 |
0.6217 | 19000 | 6.0275 | 7.7201 | 4.3591 | 5.8132 | 1.3466 | 3.1888 | 20.9311 | 0.8019 | 0.2765 | 2.7674 | 1.8670 | 3.3082 | 10.9725 | 0.3996 | 0.5560 | 18.6346 | 16.2965 | 9.3774 | 6.9957 |
0.6545 | 20000 | 6.1161 | 7.6429 | 4.2702 | 5.7298 | 1.3670 | 3.1433 | 20.8899 | 0.7871 | 0.2761 | 2.7486 | 1.9230 | 3.2958 | 11.0207 | 0.3516 | 0.5361 | 18.2297 | 15.6363 | 9.6376 | 7.1608 |
0.6872 | 21000 | 5.9608 | 7.5852 | 4.2419 | 5.7760 | 1.3838 | 3.1878 | 20.9966 | 0.7837 | 0.2761 | 2.7098 | 1.8715 | 3.2293 | 10.8935 | 0.3514 | 0.5307 | 18.1424 | 15.5101 | 9.5346 | 7.0668 |
0.7199 | 22000 | 5.7594 | 7.5562 | 4.1123 | 5.6151 | 1.3605 | 3.0954 | 21.0032 | 0.7640 | 0.2769 | 2.6019 | 1.8378 | 3.2377 | 11.0744 | 0.3676 | 0.5431 | 18.2222 | 15.7103 | 9.8826 | 7.2662 |
0.7526 | 23000 | 5.7118 | 7.4714 | 4.0531 | 5.5998 | 1.3546 | 3.0778 | 20.8820 | 0.7518 | 0.2800 | 2.7544 | 1.8756 | 3.2316 | 10.9986 | 0.3571 | 0.5334 | 18.4476 | 15.7161 | 9.6617 | 7.3730 |
0.7853 | 24000 | 5.8024 | 7.4414 | 4.0829 | 5.6335 | 1.3383 | 3.0710 | 20.8217 | 0.7487 | 0.2713 | 2.6091 | 1.8695 | 3.2365 | 10.9929 | 0.3419 | 0.5213 | 18.4064 | 15.7831 | 9.7747 | 7.4290 |
0.8181 | 25000 | 5.8608 | 7.4348 | 4.0571 | 5.5651 | 1.3294 | 3.0518 | 20.6831 | 0.7393 | 0.2784 | 2.6330 | 1.8293 | 3.2197 | 10.9416 | 0.3484 | 0.5213 | 18.6359 | 15.8463 | 9.6883 | 7.4697 |
0.8508 | 26000 | 5.742 | 7.4188 | 3.9483 | 5.4911 | 1.3288 | 3.0402 | 20.7187 | 0.7376 | 0.2772 | 2.6812 | 1.8540 | 3.2415 | 10.9619 | 0.3560 | 0.5323 | 18.6388 | 15.7688 | 9.6707 | 7.3793 |
0.8835 | 27000 | 5.7429 | 7.3956 | 3.9016 | 5.4393 | 1.3277 | 3.0129 | 20.6748 | 0.7314 | 0.2820 | 2.6526 | 1.8798 | 3.1869 | 10.8744 | 0.3435 | 0.5228 | 18.5191 | 15.7264 | 9.5707 | 7.4266 |
0.9162 | 28000 | 5.7825 | 7.3748 | 3.9100 | 5.4261 | 1.3420 | 3.0142 | 20.6013 | 0.7263 | 0.2764 | 2.6708 | 1.8529 | 3.1748 | 10.8951 | 0.3491 | 0.5257 | 18.4914 | 15.5663 | 9.6552 | 7.2807 |
0.9490 | 29000 | 5.5179 | 7.3555 | 3.9046 | 5.3902 | 1.3283 | 2.9882 | 20.5828 | 0.7169 | 0.2732 | 2.6742 | 1.8457 | 3.1760 | 10.9126 | 0.3494 | 0.5246 | 18.5619 | 15.6746 | 9.6539 | 7.3694 |
0.9817 | 30000 | 5.4044 | 7.3390 | 3.8742 | 5.3713 | 1.3127 | 2.9796 | 20.5703 | 0.7120 | 0.2669 | 2.5612 | 1.8536 | 3.1602 | 10.9068 | 0.3464 | 0.5229 | 18.5389 | 15.6788 | 9.5690 | 7.4148 |
1.0000 | 30560 | - | 7.3346 | 3.8728 | 5.3680 | 1.3066 | 2.9780 | 20.5635 | 0.7107 | 0.2672 | 2.5046 | 1.8514 | 3.1596 | 10.9153 | 0.3467 | 0.5233 | 18.5525 | 15.6815 | 9.5687 | 7.4302 |
Environmental Impact
Carbon emissions were measured using CodeCarbon.
- Energy Consumed: 0.506 kWh
- Carbon Emitted: 0.197 kg of CO2
- Hours Used: 3.163 hours
Training Hardware
- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3090
- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM Size: 31.78 GB
Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.3.0.dev0
- Transformers: 4.45.2
- PyTorch: 2.5.0+cu121
- Accelerate: 1.0.0
- Datasets: 2.20.0
- Tokenizers: 0.20.1-dev.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}