Static Embeddings with BERT Multilingual uncased tokenizer finetuned on various datasets

This is a sentence-transformers model trained on the wikititles, tatoeba, talks, europarl, global_voices, muse, wikimatrix, opensubtitles, stackexchange, quora, wikianswers_duplicates, all_nli, simple_wiki, altlex, flickr30k_captions, coco_captions, nli_for_simcse and negation datasets. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, paraphrase mining, text classification, clustering, and more.

Read our Static Embeddings blogpost to learn more about this model and how it was trained.

  • 0 Active Parameters: This model has no active parameters; it produces embeddings solely by averaging pre-computed token embeddings.
  • 100x to 400x faster: On CPU, this model is 100x to 400x faster than common options like multilingual-e5-small. On GPU, it's 10x to 25x faster.
  • Matryoshka: This model was trained with a Matryoshka loss, allowing you to truncate the embeddings for faster retrieval at minimal performance costs.
  • Evaluations: See Evaluations for details on performance on NanoBEIR, embedding speed, and Matryoshka dimensionality truncation.
  • Training Script: See train.py for the training script used to train this model from scratch.

See static-retrieval-mrl-en-v1 for an English static embedding model that has been finetuned specifically for retrieval tasks.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): StaticEmbedding(
    (embedding): EmbeddingBag(105879, 1024, mode='mean')
  )
)
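
The architecture above is a single lookup-and-average step. As a minimal sketch of that step in plain Python, assuming a toy three-token vocabulary with 4-dimensional vectors (the names `TOY_TABLE` and `embed` are hypothetical, and the values are made up; the real model uses a 105,879 x 1024 table):

```python
# Toy stand-in for the embedding table; the values are invented for illustration.
TOY_TABLE = {
    "dry": [1.0, 0.0, 2.0, 0.0],
    "red": [0.0, 1.0, 0.0, 2.0],
    "chili": [2.0, 2.0, 1.0, 1.0],
}


def embed(tokens):
    """Average the pre-computed vectors of the given tokens (like mode='mean')."""
    vectors = [TOY_TABLE[token] for token in tokens]
    dim = len(vectors[0])
    return [sum(vector[i] for vector in vectors) / len(vectors) for i in range(dim)]


sentence_vector = embed(["dry", "red", "chili"])
print(sentence_vector)  # [1.0, 1.0, 1.0, 1.0]
```

Because no token looks at any other token, the work grows linearly with the number of tokens.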

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/static-similarity-mrl-multilingual-v1")
# Run inference
sentences = [
    'It is known for its dry red chili powder .',
    'It is popular for dry red chili powder .',
    'These monsters will move in large groups .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
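
For intuition, the similarity call returns a matrix of pairwise scores. The sketch below reproduces that computation in plain Python, assuming cosine similarity is the configured similarity function. It uses made-up 3-dimensional vectors rather than real embeddings, and `cosine` and `similarity_matrix` are hypothetical helper names:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities; entry [i][j] compares vectors i and j."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Made-up vectors: the first two point the same way, the third is orthogonal.
toy_embeddings = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
scores = similarity_matrix(toy_embeddings)
print(scores[0][1], scores[0][2])  # 1.0 0.0
```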

This model was trained with a Matryoshka loss, so its embeddings can be truncated to a lower dimensionality with minimal performance loss. Lower-dimensional embeddings make downstream tasks such as clustering or classification much faster. You can specify a lower dimensionality with the truncate_dim argument when initializing the Sentence Transformer model:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tomaarsen/static-similarity-mrl-multilingual-v1", truncate_dim=256)
embeddings = model.encode([
    "I used to hate him.",
    "Раньше я ненавидел его."
])
print(embeddings.shape)
# => (2, 256)
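
Truncation keeps the leading dimensions of each embedding. Here is a sketch of doing this by hand on an already computed vector. It assumes you also want to re-normalize to unit length afterwards, which is a design choice and not something the snippet above is known to do. `truncate` is a hypothetical helper and the input vector is made up:

```python
import math


def truncate(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return [x / norm for x in head] if norm > 0 else head


full = [3.0, 4.0, 12.0, 84.0]
short = truncate(full, 2)
print(short)  # [0.6, 0.8]
```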

Evaluation

We evaluated the model on 5 languages that are well covered by benchmarks across various tasks on MTEB.

We want to reiterate that this model is not intended for retrieval use cases. Instead, we evaluate on Semantic Textual Similarity (STS), Classification, and Pair Classification. We compare against the excellent and small multilingual-e5-small model.

Averaged across all measured languages, static-similarity-mrl-multilingual-v1 reaches 92.3% of the multilingual-e5-small performance on STS, 95.52% on Pair Classification, and 86.52% on Classification.
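
These figures are ratios rather than absolute scores. As a sketch of the arithmetic, assume each figure is the static model's score divided by the multilingual-e5-small score, expressed as a percentage. The scores below are placeholders, not measured results, and `relative_score` is a hypothetical helper:

```python
def relative_score(static_score, baseline_score):
    """Static model score as a percentage of the baseline score."""
    return 100.0 * static_score / baseline_score


# Placeholder scores chosen only to make the arithmetic easy to follow.
print(relative_score(60.0, 80.0))  # 75.0
```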

In exchange for this performance reduction, static-similarity-mrl-multilingual-v1 is approximately 125x faster on CPU and 10x faster on GPU than multilingual-e5-small. Because attention models scale super-linearly with the number of tokens while static embedding models scale linearly, the speedup only grows as the number of tokens to encode increases.
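
A rough way to see why the gap widens: self-attention compares every token with every other token, while a static model touches each token once. The sketch below counts abstract operations under that simplified assumption (quadratic versus linear in the token count). It is a toy cost model with hypothetical function names, not a benchmark of either model:

```python
def attention_cost(num_tokens):
    """Toy cost: every token attends to every token."""
    return num_tokens * num_tokens


def static_cost(num_tokens):
    """Toy cost: one embedding lookup per token."""
    return num_tokens


for n in (16, 128, 512):
    # The ratio of the two toy costs grows with the sequence length.
    print(n, attention_cost(n) // static_cost(n))
# 16 16
# 128 128
# 512 512
```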

Matryoshka Evaluation

Lastly, we measured how English STS performance on MTEB changes under Matryoshka-style dimensionality reduction, i.e. when the output embeddings are truncated to a lower dimensionality.

English STS MTEB performance vs Matryoshka dimensionality reduction

The results show that you can reduce the dimensionality by 2x or 4x with only minor performance hits (0.15% and 0.56%, respectively). If the speed of your downstream task or your storage costs are a bottleneck, truncation should alleviate some of those concerns.
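
To make the storage side concrete, here is a sketch of the arithmetic, assuming embeddings are stored as 32-bit floats (4 bytes per value) with no index overhead. `storage_bytes` is a hypothetical helper and the corpus size is an arbitrary example:

```python
BYTES_PER_FLOAT32 = 4


def storage_bytes(num_embeddings, dim):
    """Raw size of a float32 embedding matrix, ignoring any index overhead."""
    return num_embeddings * dim * BYTES_PER_FLOAT32


num_embeddings = 2**20  # roughly one million vectors
for dim in (1024, 512, 256):
    print(dim, storage_bytes(num_embeddings, dim) // 2**20, "MiB")
# 1024 4096 MiB
# 512 2048 MiB
# 256 1024 MiB
```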

Training Details

Training Datasets

wikititles
  • Dataset: wikititles at d92a4d2
  • Size: 14,700,458 training samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 4, mean 18.33, max 84 characters
    • non_english (string): min 4, mean 17.19, max 109 characters
  • Samples:
    english non_english
    Le Vintrou Ле-Вентру
    Greening Begrünung
    Warrap واراب (توضيح)
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
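
The configuration above wraps MultipleNegativesRankingLoss in MatryoshkaLoss. With `n_dims_per_step` set to -1, every listed dimensionality contributes at each step. The following plain-Python sketch illustrates the idea and is not the library's implementation. It assumes cosine similarity scaled by 20 and cross-entropy over in-batch negatives, and it uses toy 4-dimensional vectors with dimensionalities `[4, 2]` instead of the real `[1024, ..., 32]`. All function names are hypothetical:

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranking_loss(anchors, positives, scale=20.0):
    """In-batch negatives: anchor i should score its own positive i highest."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, positive) for positive in positives]
        log_sum_exp = math.log(sum(math.exp(x) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated copies of the embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [vector[:dim] for vector in anchors]
        truncated_positives = [vector[:dim] for vector in positives]
        total += weight * ranking_loss(truncated_anchors, truncated_positives)
    return total


# Toy batch of two (anchor, positive) pairs; the values are made up.
anchors = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
positives = [[1.0, 0.0, 0.5, 1.0], [0.0, 1.0, 1.0, 0.2]]
TOY_DIMS = [4, 2]
TOY_WEIGHTS = [1, 1]

loss_value = matryoshka_loss(anchors, positives, TOY_DIMS, TOY_WEIGHTS)
print(loss_value)
```

Since every prefix of the embedding is trained to rank pairs correctly, the leading dimensions remain useful on their own after truncation.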
    
tatoeba
  • Dataset: tatoeba at cec1343
  • Size: 4,138,956 training samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 5, mean 31.59, max 196 characters
    • non_english (string): min 6, mean 30.95, max 161 characters
  • Samples:
    english non_english
    I used to hate him. Раньше я ненавидел его.
    It is nothing less than an insult to her. それはまさに彼女に対する侮辱だ。
    I've apologized, so lay off, OK? 謝ったんだから、さっきのはチャラにしてよ。
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
talks
  • Dataset: talks at 0c70bc6
  • Size: 9,750,031 training samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 5, mean 94.41, max 493 characters
    • non_english (string): min 4, mean 82.49, max 452 characters
  • Samples:
    english non_english
    (Laughter) EC: But beatbox started here in New York. (Skratt) EC: Fast beatbox började här i New York.
    I did not have enough money to buy food, and so to forget my hunger, I started singing." 食べ物を買うお金もなかった だから 空腹を忘れるために 歌を歌い始めたの」
    That is another 25 million barrels a day. 那时还要增加两千五百万桶的原油。
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
europarl
  • Dataset: europarl at 11007ec
  • Size: 4,990,000 training samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 0, mean 147.77, max 668 characters
    • non_english (string): min 0, mean 153.13, max 971 characters
  • Samples:
    english non_english
    (SK) I would like to stress three key points in relation to this issue. (SK) Chtěla bych zdůraznit tři klíčové body, které jsou s tímto tématem spojeny.
    Women have a higher recorded rate of unemployment, especially long term unemployment. Blandt kvinder registreres større arbejdsløshed, især blandt langtidsarbejdsløse.
    You will recall that we have occasionally had disagreements over how to interpret Rule 166 of our Rules of Procedure and that certain Members thought that the Presidency was not applying it properly, since it was not giving the floor for points of order that did not refer to the issue that was being debated at that moment. De husker nok, at vi til tider har været uenige om fortolkningen af artikel 166 i vores forretningsorden, og at nogle af medlemmerne mente, at formanden ikke anvendte den korrekt, eftersom han ikke gav ordet til indlæg til forretningsordenen, når det ikke drejede sig om det spørgsmål, der blev drøftet på det pågældende tidspunkt.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
global_voices
  • Dataset: global_voices at 4cc20ad
  • Size: 1,099,099 training samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 5, mean 115.13, max 740 characters
    • non_english (string): min 3, mean 119.89, max 801 characters
  • Samples:
    english non_english
    Generation 9/11: Cristina Balli (USA) from British Council USA on Vimeo. Генерација 9/11: Кристина Бали (САД) од Британскиот совет САД на Вимео.
    Jamaica: Mapping the state of emergency · Global Voices Jamaica: Mapeando el estado de emergencia
    It takes more than courage or bravery to do such a... http://fb.me/12T47y0Ml Θέλει κάτι παραπάνω από κουράγιο ή ανδρεία για να κάνεις κάτι τέτοιο... http://fb.me/12T47y0Ml
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
muse
  • Dataset: muse at 238c077
  • Size: 1,368,274 training samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 3, mean 7.38, max 16 characters
    • non_english (string): min 1, mean 7.33, max 18 characters
  • Samples:
    english non_english
    metro metrou
    suggest 제안
    nnw nno
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
wikimatrix
  • Dataset: wikimatrix at 74a4cb1
  • Size: 9,688,498 training samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 16, mean 124.31, max 418 characters
    • non_english (string): min 11, mean 129.99, max 485 characters
  • Samples:
    english non_english
    3) A set of wikis to support collaboration activities and disseminate information about good practices. 3) Un conjunt de wikis per donar suport a les activitats de col·laboració i difusió d'informació sobre bones pràctiques.
    Daily cruiseferry services operate to Copenhagen and Frederikshavn in Denmark, and to Kiel in Germany. Dịch vụ phà du lịch hàng ngày vận hành tới Copenhagen và Frederikshavn tại Đan Mạch, và tới Kiel tại Đức.
    In late April 1943, Philipp was ordered to report to Hitler's headquarters, where he stayed for most of the next four months. Sent i april 1943 fick Philipp ordern att rapportera till Hitlers högkvarter, där han stannade i fyra månader.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
opensubtitles
  • Dataset: opensubtitles at d86a387
  • Size: 4,990,000 training samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 0, mean 34.43, max 220 characters
    • non_english (string): min 0, mean 26.99, max 118 characters
  • Samples:
    english non_english
    Would you send a tomato juice, black coffee and a masseur? هل لك أن ترسل لي عصير طماطم قهوة سوداء.. والمدلك!
    To hear the angels sing لكى تسمع غناء الملائكه
    Brace yourself. " تمالك نفسك " بريكر
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
stackexchange
  • Dataset: stackexchange at 1c9657a
  • Size: 250,519 training samples
  • Columns: post1 and post2
  • Approximate statistics based on the first 1000 samples:
    • post1 (string): min 77, mean 669.56, max 3982 characters
    • post2 (string): min 81, mean 641.44, max 4053 characters
  • Samples:
    post1 post2
    New user question about passwords Just got a refurbished computer with Ubuntu as the OS. Have never even heard of the OS and now I'm trying to learn. When I boot the system, it starts up great. But, if I try to navigate around, it requires a password. Is there a trick to finding the initial password? Please advise. How do I reset a lost administrative password? I'm working on a Ubuntu system, and my client has completely forgotten his administrative password. He doesn't even remember entering one; however it is there. I've tried the suggestions on the website, and I have been unsuccessful in deleting the password so that I can download applets required for running some files. Is there a solution?
    Reorder a list of string randomly but constant in a period of time I need to reorder a list in a random way but I want to have the same result on a short period of time ... So I have: var list = new String[] { "Angie", "David", "Emily", "James" } var shuffled = list.OrderBy(v => "4a78926c")).ToList(); But I always get the same order ... I could use Guid.NewGuid() but then I would have a different result in a short period of time. How can I do this? Randomize a List What is the best way to randomize the order of a generic list in C#? I've got a finite set of 75 numbers in a list I would like to assign a random order to, in order to draw them for a lottery type application.
    Made a mistake on check need help to fix I wrote a check and put the amount in the pay to order spot. Can I just mark it out, put the name in the spot and finish writing the check? How to correct a mistake made when writing a check? I think I know the answer to this, but I'm not sure, and it's a good question, so I'll ask: What is the accepted/proper way to correct a mistake made on a check? For instance, I imagine that in any given January, some people accidentally date a check in the previous year. Is there a way to correct such a mistake, or must a check be voided (and wasted)? Pointers to definitive information (U.S., Canada, and elsewhere) are helpful.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
quora
  • Dataset: quora at 451a485
  • Size: 101,762 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 16, mean 53.47, max 249 characters
    • positive (string): min 16, mean 52.63, max 237 characters
    • negative (string): min 14, mean 54.67, max 292 characters
  • Samples:
    anchor positive negative
    What food should I try in Brazil? Which foods should I try in Brazil? What meat should one eat in Argentina?
    What is the best way to get a threesome? How does one find a threesome? How is the experience of a threesome?
    Whether I do CA or MBA? Which is better? Which is better CA or MBA? Which is better CA or IT?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
wikianswers_duplicates
  • Dataset: wikianswers_duplicates at 9af6367
  • Size: 9,990,000 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 14, mean 47.39, max 151 characters
    • positive (string): min 15, mean 47.58, max 154 characters
  • Samples:
    anchor positive
    Did Democritus belive matter was continess? Why did democritus call the smallest pice of matter atomos?
    Tell you about the most ever done to satisfy a customer? How do you satisfy your client or customer?
    How is a chemical element different from a compound? How is a chemical element different to a compound?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
all_nli
  • Dataset: all_nli at d482672
  • Size: 557,850 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 18, mean 34.88, max 193 characters
    • positive (string): min 15, mean 46.49, max 181 characters
    • negative (string): min 16, mean 50.47, max 204 characters
  • Samples:
    anchor positive negative
    A person on a horse jumps over a broken down airplane. A person is outdoors, on a horse. A person is at a diner, ordering an omelette.
    Children smiling and waving at camera There are children present The kids are frowning
    A boy is jumping on skateboard in the middle of a red bridge. The boy does a skateboarding trick. The boy skates down the sidewalk.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
simple_wiki
  • Dataset: simple_wiki at 60fd9b4
  • Size: 102,225 training samples
  • Columns: text and simplified
  • Approximate statistics based on the first 1000 samples:
    • text (string): min 18, mean 149.3, max 573 characters
    • simplified (string): min 16, mean 123.58, max 576 characters
  • Samples:
    text simplified
    The next morning , it had a small CDO and well-defined bands , and the system , either a weak tropical storm or a strong tropical depression , likely reached its peak . The next morning , it had a small amounts of convection near the center and well-defined bands , and the system , either a weak tropical storm or a strong tropical depression , likely reached its peak .
    The region of measurable parameter space that corresponds to a regime is very often loosely defined . Examples include the superfluid regime '' , the steady state regime '' or `` the femtosecond regime '' . This is common if a regime is threatened by another regime .
    The Lamborghini Diablo is a high-performance mid-engined sports car that was built by Italian automaker Lamborghini between 1990 and 2001 . The Lamborghini Diablo is a sport car that was built by Lamborghini from 1990 to 2001 .
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
altlex
  • Dataset: altlex at 97eb209
  • Size: 112,696 training samples
  • Columns: text and simplified
  • Approximate statistics based on the first 1000 samples:
    • text (string): min 13, mean 131.03, max 492 characters
    • simplified (string): min 13, mean 112.41, max 492 characters
  • Samples:
    text simplified
    Reinforcement and punishment are the core tools of operant conditioning . Principles of operant conditioning :
    The Japanese Ministry of Health , Labour and Welfare defines `` hikikomori '' as people who refuse to leave their house and , thus , isolate themselves from society in their homes for a period exceeding six months . The Japanese Ministry of Health , Labour and Welfare defines hikikomori as people who refuse to leave their house for over six months .
    It has six rows of black spines and has a pair of long , clubbed spines on the head . It has a pair of long , clubbed spines on the head .
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

flickr30k_captions

  • Dataset: flickr30k_captions at 0ef0ce3
  • Size: 158,881 training samples
  • Columns: caption1 and caption2
  • Approximate statistics based on the first 1000 samples:
    • caption1 (string): min 20, mean 63.19, max 318 characters
    • caption2 (string): min 13, mean 63.65, max 205 characters
  • Samples:
    caption1 caption2
    Four women pose for a photograph with a man in a bright yellow suit. A group of friends get their photo taken with a man in a green suit.
    A many dressed in army gear walks on the crash walking a brown dog. A man with army fatigues is walking his dog.
    Four people are sitting around a kitchen counter while one is drinking from a glass. A group of people sit around a breakfast bar.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
coco_captions
  • Dataset: coco_captions at bd26018
  • Size: 414,010 training samples
  • Columns: caption1 and caption2
  • Approximate statistics based on the first 1000 samples:
    • caption1 (string): min 30, mean 52.57, max 151 characters
    • caption2 (string): min 29, mean 52.71, max 186 characters
  • Samples:
    caption1 caption2
    THERE ARE FRIENDS ON THE BEACH POSING A group of people standing together on the beach while holding a woman.
    a lovely white bathroom with white shower curtain. A white toilet sitting in a bathroom next to a sink.
    Two drinking glass on a counter and a man holding a knife looking at something in front of him. A restaurant employee standing behind two cups on a counter.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
nli_for_simcse
  • Dataset: nli_for_simcse at 926cae4
  • Size: 274,951 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 11, mean 87.69, max 483 characters
    • positive (string): min 7, mean 43.85, max 244 characters
    • negative (string): min 7, mean 43.87, max 172 characters
  • Samples:
    anchor positive negative
    A white horse and a rider wearing a ale blue shirt, white pants, and a black helmet are jumping a hurdle. An equestrian is having a horse jump a hurdle. A competition is taking place in a kitchen.
    A group of people in a dome like building. A gathering inside a building. Cats are having a party.
    Home to thousands of sheep and a few scattered farming families, the area is characterized by the stark beauty of bare peaks, rugged fells, and the most remote lakes, combined with challenging, narrow roads. There are no wide and easy roads going through the area. There are more humans than sheep in the area.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
negation
  • Dataset: negation at cd02256
  • Size: 10,000 training samples
  • Columns: anchor, entailment, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 9, mean 65.84, max 275 characters
    • entailment (string): min 7, mean 34.06, max 167 characters
    • negative (string): min 9, mean 37.26, max 166 characters
  • Samples:
    anchor entailment negative
    A boy with his hands above his head stands on a cement pillar above the cobblestones. A boy is standing on a pillar over the cobblestones. A boy is not standing on a pillar over the cobblestones.
    The man works hard in his home office. home based worker works harder home based worker does not work harder
    Man in black shirt plays silver electric guitar. A man plays a silver electric guitar. A man does not play a silver electric guitar.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
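
The per-column statistics listed for each training dataset are simple length summaries. This sketch shows how such numbers could be computed, assuming they are character counts over the first 1000 rows of a column. `length_stats` is a hypothetical helper, and the three inputs are the English samples shown for the muse dataset:

```python
def length_stats(texts, limit=1000):
    """Min, mean, and max character length over the first `limit` texts."""
    lengths = [len(text) for text in texts[:limit]]
    return {
        "min": min(lengths),
        "mean": round(sum(lengths) / len(lengths), 2),
        "max": max(lengths),
    }


samples = ["metro", "suggest", "nnw"]
print(length_stats(samples))  # {'min': 3, 'mean': 5.0, 'max': 7}
```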
    

Evaluation Datasets

wikititles
  • Dataset: wikititles at d92a4d2
  • Size: 14,700,458 evaluation samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 4, mean 18.33, max 77 characters
    • non_english (string): min 4, mean 17.3, max 83 characters
  • Samples:
    english non_english
    Bjørvika 比約維卡
    Old Mystic, Connecticut Олд Мистик (Конектикат)
    Cystic fibrosis transmembrane conductance regulator CFTR
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
tatoeba
  • Dataset: tatoeba at cec1343
  • Size: 4,138,956 evaluation samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 5, mean 31.83, max 235 characters
    • non_english (string): min 4, mean 31.7, max 189 characters
  • Samples:
    english non_english
    You are not consistent in your actions. Je bent niet consequent in je handelen.
    Neither of them seemed old. Ninguno de ellos lucía viejo.
    Stand up, please. Устаните, молим Вас.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
talks
  • Dataset: talks at 0c70bc6
  • Size: 9,750,031 evaluation samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 9, mean 94.78, max 634 characters
    • non_english (string): min 4, mean 84.61, max 596 characters
  • Samples:
    english non_english
    I'm earthed in my essence, and my self is suspended. Je suis ancrée, et mon moi est temporairement inexistant.
    It's not back on your shoulder. Dar nu e înapoi pe umăr.
    They're usually students who've never seen a desert. たいていの学生は砂漠を見たこともありません
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
europarl
  • Dataset: europarl at 11007ec
  • Size: 10,000 evaluation samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 0 characters, mean 148.52 characters, max 1215 characters
    • non_english (string): min 0 characters, mean 154.44 characters, max 1316 characters
  • Samples:
    english non_english
    Mr Schmidt, Mr Trichet, I absolutely cannot go along with these proposals. Pane Schmidte, pane Trichete, s těmito návrhy nemohu vůbec souhlasit.
    The Council and Parliament recently adopted the regulation on the Single European Sky, one of the provisions of which was Community membership of Eurocontrol, so that Parliament has already indirectly expressed its views on this matter. Der Rat und das Parlament haben kürzlich die Verordnung über die Schaffung eines einheitlichen europäischen Luftraums verabschiedet, in der unter anderem die Mitgliedschaft der Gemeinschaft bei Eurocontrol festgelegt ist, so dass das Parlament seine Auffassungen hierzu indirekt bereits dargelegt hat.
    It was held over from the January part-session until this part-session. Ihre Behandlung wurde von der Januar-Sitzung auf die jetzige vertagt.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
global_voices
  • Dataset: global_voices at 4cc20ad
  • Size: 1,099,099 evaluation samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 3 characters, mean 115.61 characters, max 629 characters
    • non_english (string): min 3 characters, mean 121.61 characters, max 664 characters
  • Samples:
    english non_english
    Haiti: Security vs. Relief? · Global Voices Haïti : Zones rouges, zones vertes - sécurité contre aide humanitaire ?
    In order to prevent weapon smuggling through tunnels, his forces would have fought and killed Palestinians over a sustained period of time. Con el fin de impedir el contrabando de armas a través de túneles, sus fuerzas habrían combatido y muerto palestinos durante un largo período de tiempo.
    Tombstone of Vitalis, an ancient Roman cavalry officer, displayed in front of the Skopje City Museum. Lápida de Vitalis, un antiguo oficial romano de caballería, exhibida frente al Museo de la Ciudad de Skopje.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
muse
  • Dataset: muse at 238c077
  • Size: 1,368,274 evaluation samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 3 characters, mean 7.5 characters, max 17 characters
    • non_english (string): min 1 character, mean 7.39 characters, max 16 characters
  • Samples:
    english non_english
    generalised γενικευμένη
    language jazyku
    finalised финализиран
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
wikimatrix
  • Dataset: wikimatrix at 74a4cb1
  • Size: 9,688,498 evaluation samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 11 characters, mean 122.6 characters, max 424 characters
    • non_english (string): min 10 characters, mean 128.09 characters, max 579 characters
  • Samples:
    english non_english
    Along with the adjacent waters, it was declared a nature reserve in 2002. Juntament amb les aigües adjacents, va ser declarada reserva natural el 2002.
    Like her husband, Charlotte was a patron of astronomy. Stejně jako manžel byla Šarlota patronkou astronomie.
    Some of the music consists of simple sounds, such as a wind effect heard over the poem "Soon Alaska". Sommige muziekstukken bevatten eenvoudige geluiden, zoals het geluid van de wind tijdens het gedicht "Soon Alaska".
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
opensubtitles
  • Dataset: opensubtitles at d86a387
  • Size: 10,000 evaluation samples
  • Columns: english and non_english
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 0 characters, mean 35.01 characters, max 200 characters
    • non_english (string): min 0 characters, mean 27.79 characters, max 143 characters
  • Samples:
    english non_english
    - I don't need my medicine. -لا أحتاج لدوائي
    The Sovereign... Ah. (الطاغية)!
    The other two from your ship. الإثنان الأخران من سفينتك
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
stackexchange
  • Dataset: stackexchange at 1c9657a
  • Size: 250,519 evaluation samples
  • Columns: post1 and post2
  • Approximate statistics based on the first 1000 samples:
    • post1 (string): min 64 characters, mean 669.92 characters, max 4103 characters
    • post2 (string): min 62 characters, mean 644.68 characters, max 4121 characters
  • Samples:
    post1 post2
    Find the particular solution for this linear ODE $y' '-2y'+5y=e^x \cos2x$. Find the particular solution for this linear ODE :$y' '-2y'+5y=e^x \cos2x$. How can I use Undetermined coefficients method ? Particular solution of $y''-4y'+5y = 4e^{2x} (\sin x)$ How do I find the particular solution of this second order inhomogenous differential equation? (Using undetermined coefficients). $y''-4y'+5y = 4e^{2x} (\sin x)$ I can find the generel homogenous solutions but I need help for the particular.
    Unbounded sequence has an divergent subsequence Show that if $(x_n)$ is unbounded, then there exists a subsequence $(x_{n_k})$ such that $\lim 1/(x_{n_k}) =0.$ I was thinking that $(x_n)$ is a subsequence of itself. WLOG, suppose $(x_n)$ does not have an upper bound. By Algebraic Limit Theorem, $\lim 1/(x_{n_k}) =0.$ Is there any flaws in my proof? Given the sequence $(x_n)$ is unbounded, show that there exist a subsequence $(x_{n_k})$ such that $\lim(1/x_n)=0$. Given the sequence $(x_n)$ is unbounded, show that there exist a subsequence $(x_{n_k})$ such that $\lim(1/x_{n_k})=0$. I guess I have to prove that $(x_{n_k})$ diverge, but I don't know how to carry on. Thanks.
    "The problem is who can we get to replace her" vs. "The problem is who we can get to replace her" "The problem is who can we get to replace her" vs. "The problem is who we can get to replace her" Which one is correct and why? Changing subject and verb positions in statements and questions We always change subject and verb positions in whenever we want to ask a question such as "What is your name?". But when it comes to statements like the following, which form is correct? I don't understand what are you talking about. I don't understand what you are talking about. Another example Do you know what time is it? Do you know what time it is? Another example Do you care how do I feel about this? Do you care how I feel about this?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
quora
  • Dataset: quora at 451a485
  • Size: 101,762 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 15 characters, mean 52.48 characters, max 164 characters
    • positive (string): min 12 characters, mean 52.86 characters, max 162 characters
    • negative (string): min 12 characters, mean 56.18 characters, max 298 characters
  • Samples:
    anchor positive negative
    Is pornography an art? Can pornography be art? Does pornography involve the objectification of women?
    How can I improve my speaking in public? How can I improve my public speaking ability? How do I improve my vocabulary and English speaking skills? I am a 22 year old software engineer and come from a Telugu medium background. I am able to write well, but my speaking skills are poor.
    How do I develop better people skills? How can I get better people skills? How do I get better at Minecraft?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
wikianswers_duplicates
  • Dataset: wikianswers_duplicates at 9af6367
  • Size: 10,000 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 14 characters, mean 47.88 characters, max 145 characters
    • positive (string): min 15 characters, mean 47.76 characters, max 201 characters
  • Samples:
    anchor positive
    Can you get pregnant if tubes are clamped? How long can your fallopian tubes stay clamped?
    Is there any object that are triangular prism? Is a trapezium the same as a triangular prism?
    Where is the neutral switch located on a 2000 ford explorer? Ford f150 1996 safety switch?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
all_nli
  • Dataset: all_nli at d482672
  • Size: 6,584 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 15 characters, mean 72.82 characters, max 300 characters
    • positive (string): min 12 characters, mean 34.11 characters, max 126 characters
    • negative (string): min 11 characters, mean 36.38 characters, max 121 characters
  • Samples:
    anchor positive negative
    Two women are embracing while holding to go packages. Two woman are holding packages. The men are fighting outside a deli.
    Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. Two kids in numbered jerseys wash their hands. Two kids in jackets walk to school.
    A man selling donuts to a customer during a world exhibition event held in the city of Angeles A man selling donuts to a customer. A woman drinks her coffee in a small cafe.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
simple_wiki
  • Dataset: simple_wiki at 60fd9b4
  • Size: 102,225 evaluation samples
  • Columns: text and simplified
  • Approximate statistics based on the first 1000 samples:
    • text (string): min 24 characters, mean 147.36 characters, max 599 characters
    • simplified (string): min 19 characters, mean 124.94 characters, max 540 characters
  • Samples:
    text simplified
    It marks the southernmost point of the Bahía de Banderas , upon which the port and resort city of Puerto Vallarta stands . It is the most southern point of the Bahía de Banderas .
    The interiors of the stations resemble that of the former western Soviet nations , with chandeliers hanging from the corridors . Its interior resembles that of western former Soviet nations with chandeliers hanging from the corridors .
    The Senegal national football team , nicknamed the Lions of Teranga , is the national team of Senegal and is controlled by the Fédération Sénégalaise de Football . Senegal national football team is the national football team of Senegal .
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
altlex
  • Dataset: altlex at 97eb209
  • Size: 112,696 evaluation samples
  • Columns: text and simplified
  • Approximate statistics based on the first 1000 samples:
    • text (string): min 9 characters, mean 138.99 characters, max 592 characters
    • simplified (string): min 7 characters, mean 119.43 characters, max 517 characters
  • Samples:
    text simplified
    14,000 ) referred to as `` The bush '' within the media . 14,000 ) called `` the bush '' in the media .
    The next day he told Elizabeth everything he knew regarding Catherine and her pregnancy . The next day he told Elizabeth everything .
    Alice Ivers and Warren Tubbs had four sons and three daughters together . Alice Ivers and Warren Tubbs had 4 sons and 3 daughters together .
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
flickr30k_captions
  • Dataset: flickr30k_captions at 0ef0ce3
  • Size: 158,881 evaluation samples
  • Columns: caption1 and caption2
  • Approximate statistics based on the first 1000 samples:
    • caption1 (string): min 12 characters, mean 62.95 characters, max 279 characters
    • caption2 (string): min 15 characters, mean 63.34 characters, max 206 characters
  • Samples:
    caption1 caption2
    A person wearing sunglasses, a visor, and a British flag is carrying 6 Heineken bottles. A woman wearing a blue visor is holding 5 bottles of Heineken beer.
    Two older people hold hands while walking down a street alley with a group of people. A group of senior citizens walking down narrow pathway.
    View of bicyclists from behind during a race. A Peloton of bicyclists riding down a road of tightly packed together houses.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
coco_captions
  • Dataset: coco_captions at bd26018
  • Size: 414,010 evaluation samples
  • Columns: caption1 and caption2
  • Approximate statistics based on the first 1000 samples:
    • caption1 (string): min 26 characters, mean 51.9 characters, max 130 characters
    • caption2 (string): min 28 characters, mean 52.7 characters, max 135 characters
  • Samples:
    caption1 caption2
    A blurry photo of a man next to a refrigerator The man in black is moving towards a refrigerator.
    A young child holding a remote control in it's hand. A boy holds a remote control up to the camera.
    a big airplane that is parked on some concrete A man standing next to a fighter jet under a cloudy sky.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
nli_for_simcse
  • Dataset: nli_for_simcse at 926cae4
  • Size: 274,951 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 9 characters, mean 84.79 characters, max 598 characters
    • positive (string): min 10 characters, mean 44.26 characters, max 172 characters
    • negative (string): min 9 characters, mean 44.11 characters, max 134 characters
  • Samples:
    anchor positive negative
    a man waiting for train with a blue coat blue jeans while holing a rope. A man is waiting for a train. A man is sitting on a greyhound bus waiting to leave.
    Australia's floating dollar has apparently allowed the island continent to sail almost unscathed through the Asian crisis. Australia has a floating dollar that has made them impervious to the problem in Asia. Australia has a dollar that is heavily tied to Asia.
    A city street in front of a business with a construction worker and road cones. There is a city street with construction worker and road cones. There are no cones in front of the city street.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    
negation
  • Dataset: negation at cd02256
  • Size: 10,000 evaluation samples
  • Columns: anchor, entailment, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 26 characters, mean 69.49 characters, max 229 characters
    • entailment (string): min 15 characters, mean 34.88 characters, max 89 characters
    • negative (string): min 16 characters, mean 38.68 characters, max 87 characters
  • Samples:
    anchor entailment negative
    Two men, one standing and one seated on the ground are attempting to wrangle a bull as dust from the action is being kicked up. Two cowboys attempt to wrangle a bull. Two cowboys do not attempt to wrangle a bull.
    A woman dressed in black is silhouetted against a cloud darkened sky. A woman in black stands in front of a dark, cloudy backdrop. A woman in black does not stand in front of a dark, cloudy backdrop.
    A kid in a blue shirt playing on a playground. A kid playing on a playground wearing a blue shirt A kid not playing on a playground wearing a black shirt
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            512,
            256,
            128,
            64,
            32
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
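
The datasets listed above all use the same loss configuration: MatryoshkaLoss wrapping MultipleNegativesRankingLoss, evaluated at the dimensions 1024, 512, 256, 128, 64 and 32 with equal weights. The following is a minimal pure-Python sketch of that idea, not the Sentence Transformers implementation. It assumes cosine similarity with a scale of 20 for the ranking loss and reads n_dims_per_step: -1 as "use every listed dimension at each step". The helper names and the toy 4-dimensional vectors are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranking_loss(anchors, positives, scale=20.0):
    """In-batch-negatives cross-entropy: anchor i should rank positive i first."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_z - scores[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding prefixes."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [v[:dim] for v in anchors]
        truncated_positives = [v[:dim] for v in positives]
        loss += weight * ranking_loss(truncated_anchors, truncated_positives)
    return loss

# Toy embeddings (hypothetical values); row i of each list forms a matching pair.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3], [0.5, 0.5, 1.0, 0.0]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4], [0.4, 0.6, 0.9, 0.1]]

loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
# Reversing the positives mismatches the first and last pair, so the loss rises.
shuffled = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print(loss, shuffled)
```

Because every truncated prefix contributes to the total, the leading dimensions are pushed to be useful on their own, which is what makes truncating the embeddings at inference time viable.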
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 2048
  • per_device_eval_batch_size: 2048
  • learning_rate: 0.2
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates
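
To make the interaction between learning_rate: 0.2 and warmup_ratio: 0.1 concrete, here is a small sketch of a linear schedule with warmup, which is what lr_scheduler_type: linear (listed under All Hyperparameters) refers to. It assumes the warmup length is the ceiling of warmup_ratio times the total number of steps, and takes 30,560 total steps from the final row of the training logs. The function name is hypothetical.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=0.2, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at the last step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

total_steps = 30560  # last step reported in the training logs
for step in (0, 1528, 3056, 16808, 30560):
    print(step, round(linear_schedule_lr(step, total_steps), 4))
```

Under these assumptions the rate peaks at 0.2 around step 3,056 and then falls steadily until the end of the single epoch.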

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 2048
  • per_device_eval_batch_size: 2048
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.2
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
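
The batch_sampler: no_duplicates setting matters because MultipleNegativesRankingLoss treats the other samples in a batch as negatives, so a text that appears twice in one batch would act as a false negative. The sketch below illustrates the idea with a simple greedy grouping. It is a simplified stand-in rather than the library's sampler, and the function name and example rows are hypothetical.

```python
def no_duplicate_batches(rows, batch_size):
    """Greedily group row indices into batches so no text repeats within a batch.

    `rows` is a list of tuples of strings, e.g. (anchor, positive). A row that
    would repeat a text already in the current batch is deferred to a later one.
    """
    remaining = list(range(len(rows)))
    batches = []
    while remaining:
        seen, batch, deferred = set(), [], []
        for idx in remaining:
            texts = set(rows[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)
        batches.append(batch)
        remaining = deferred
    return batches

rows = [("a", "b"), ("a", "c"), ("d", "e"), ("f", "b"), ("g", "h")]
batches = no_duplicate_batches(rows, batch_size=2)
print(batches)
```

In this example the rows sharing "a" or "b" end up in different batches, at the cost of a smaller final batch.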

Training Logs

Epoch | Step | Training Loss | wikititles loss | tatoeba loss | talks loss | europarl loss | global voices loss | muse loss | wikimatrix loss | opensubtitles loss | stackexchange loss | quora loss | wikianswers duplicates loss | all nli loss | simple wiki loss | altlex loss | flickr30k captions loss | coco captions loss | nli for simcse loss | negation loss
0.0000 | 1 | 38.504 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
0.0327 | 1000 | 21.3661 | 15.2607 | 9.1892 | 11.6736 | 1.6431 | 6.6894 | 31.9579 | 3.0122 | 0.3541 | 5.1814 | 2.3756 | 4.9474 | 12.7699 | 0.5687 | 0.8911 | 21.0068 | 17.1302 | 10.8964 | 6.7603
0.0654 | 2000 | 9.8377 | 11.7637 | 7.1680 | 8.7697 | 1.6077 | 5.2310 | 27.4887 | 1.8375 | 0.3379 | 5.1107 | 2.2083 | 4.1690 | 12.0384 | 0.4837 | 0.7131 | 20.5401 | 17.8388 | 10.6706 | 7.0488
0.0982 | 3000 | 8.5279 | 10.8719 | 6.6160 | 8.3116 | 1.5638 | 4.7298 | 25.8572 | 1.6738 | 0.3152 | 5.1009 | 2.0893 | 3.7332 | 12.0452 | 0.4285 | 0.6519 | 20.2154 | 16.2715 | 10.7693 | 7.3144
0.1309 | 4000 | 7.8208 | 10.4614 | 5.4918 | 7.4421 | 1.4420 | 4.0505 | 24.9000 | 1.3462 | 0.2925 | 4.7643 | 2.1143 | 3.7457 | 11.6570 | 0.4390 | 0.6536 | 19.4405 | 16.0912 | 10.7537 | 7.2120
0.1636 | 5000 | 7.5347 | 9.5381 | 5.9489 | 7.4027 | 1.4858 | 4.0272 | 23.8335 | 1.2453 | 0.3027 | 3.1262 | 1.9170 | 3.7535 | 11.6186 | 0.4090 | 0.6131 | 18.9329 | 16.1769 | 10.1123 | 7.0750
0.1963 | 6000 | 7.1819 | 9.2175 | 5.3231 | 7.0836 | 1.4795 | 3.8328 | 23.1620 | 1.1609 | 0.2964 | 2.7653 | 1.9440 | 3.6610 | 11.2147 | 0.3714 | 0.5853 | 19.0478 | 16.4413 | 9.5790 | 6.8695
0.2291 | 7000 | 6.9852 | 9.0344 | 5.5773 | 6.7928 | 1.4409 | 3.9232 | 23.2098 | 1.1750 | 0.2877 | 2.9254 | 1.9411 | 3.5469 | 11.0744 | 0.4254 | 0.6293 | 19.0447 | 16.3774 | 9.5363 | 6.8393
0.2618 | 8000 | 6.8114 | 8.9620 | 5.1417 | 6.5466 | 1.4834 | 3.7100 | 22.9815 | 1.0679 | 0.2942 | 2.7687 | 2.0211 | 3.6063 | 11.3424 | 0.4447 | 0.6223 | 19.1836 | 16.5669 | 9.8785 | 6.8528
0.2945 | 9000 | 6.5487 | 8.6320 | 4.8710 | 6.5144 | 1.4156 | 3.5712 | 22.9660 | 1.0261 | 0.3051 | 3.0898 | 1.9981 | 3.4305 | 11.1448 | 0.3729 | 0.5814 | 18.8865 | 15.8581 | 9.5213 | 6.7567
0.3272 | 10000 | 6.7398 | 8.5630 | 4.7179 | 6.5025 | 1.3931 | 3.5699 | 22.5319 | 0.9916 | 0.2870 | 3.3385 | 1.9580 | 3.5807 | 11.2592 | 0.4155 | 0.6009 | 19.1387 | 16.6836 | 9.6300 | 6.6613
0.3599 | 11000 | 6.3915 | 8.4041 | 4.8985 | 6.2787 | 1.4081 | 3.5082 | 22.3204 | 0.9554 | 0.2916 | 2.9365 | 2.0176 | 3.3900 | 11.2956 | 0.3902 | 0.5783 | 18.6448 | 16.1241 | 9.5388 | 6.7295
0.3927 | 12000 | 6.5902 | 8.1888 | 4.7326 | 6.1930 | 1.4550 | 3.4999 | 22.1070 | 0.9736 | 0.2935 | 2.9612 | 1.9449 | 3.3281 | 11.0477 | 0.3821 | 0.5696 | 18.3227 | 16.1848 | 9.4772 | 7.0029
0.4254 | 13000 | 6.341 | 8.1827 | 4.3838 | 6.1052 | 1.4165 | 3.3944 | 21.9552 | 0.9076 | 0.2991 | 3.2272 | 1.9822 | 3.3494 | 11.1891 | 0.3790 | 0.5600 | 18.4394 | 15.9000 | 9.5644 | 6.9056
0.4581 | 14000 | 6.2067 | 8.1549 | 4.4833 | 6.0765 | 1.4055 | 3.3903 | 21.4785 | 0.8962 | 0.2919 | 2.8893 | 1.9540 | 3.3078 | 11.2100 | 0.3569 | 0.5461 | 18.7667 | 16.2978 | 9.2310 | 7.1290
0.4908 | 15000 | 6.2237 | 8.0711 | 4.4755 | 6.0087 | 1.3185 | 3.2888 | 21.3689 | 0.8433 | 0.2861 | 3.0129 | 1.9084 | 3.3279 | 11.1236 | 0.3730 | 0.5553 | 18.2711 | 15.7648 | 9.5295 | 7.0092
0.5236 | 16000 | 6.1058 | 8.0282 | 4.5076 | 5.8760 | 1.4234 | 3.3046 | 21.3568 | 0.8298 | 0.2826 | 2.8404 | 1.8920 | 3.2918 | 11.1140 | 0.3811 | 0.5550 | 18.2899 | 15.8630 | 9.4807 | 6.7585
0.5563 | 17000 | 6.3038 | 7.8679 | 4.4780 | 5.8461 | 1.4016 | 3.2279 | 21.0624 | 0.8205 | 0.2804 | 3.1359 | 1.9066 | 3.3205 | 11.0882 | 0.3913 | 0.5569 | 18.0693 | 15.7346 | 9.2854 | 6.9239
0.5890 | 18000 | 5.9824 | 7.7827 | 4.3199 | 5.7441 | 1.3582 | 3.1982 | 21.2444 | 0.8046 | 0.2797 | 2.7466 | 1.8717 | 3.3112 | 11.0553 | 0.3922 | 0.5568 | 18.0357 | 15.6732 | 9.6404 | 6.8331
0.6217 | 19000 | 6.0275 | 7.7201 | 4.3591 | 5.8132 | 1.3466 | 3.1888 | 20.9311 | 0.8019 | 0.2765 | 2.7674 | 1.8670 | 3.3082 | 10.9725 | 0.3996 | 0.5560 | 18.6346 | 16.2965 | 9.3774 | 6.9957
0.6545 | 20000 | 6.1161 | 7.6429 | 4.2702 | 5.7298 | 1.3670 | 3.1433 | 20.8899 | 0.7871 | 0.2761 | 2.7486 | 1.9230 | 3.2958 | 11.0207 | 0.3516 | 0.5361 | 18.2297 | 15.6363 | 9.6376 | 7.1608
0.6872 | 21000 | 5.9608 | 7.5852 | 4.2419 | 5.7760 | 1.3838 | 3.1878 | 20.9966 | 0.7837 | 0.2761 | 2.7098 | 1.8715 | 3.2293 | 10.8935 | 0.3514 | 0.5307 | 18.1424 | 15.5101 | 9.5346 | 7.0668
0.7199 | 22000 | 5.7594 | 7.5562 | 4.1123 | 5.6151 | 1.3605 | 3.0954 | 21.0032 | 0.7640 | 0.2769 | 2.6019 | 1.8378 | 3.2377 | 11.0744 | 0.3676 | 0.5431 | 18.2222 | 15.7103 | 9.8826 | 7.2662
0.7526 | 23000 | 5.7118 | 7.4714 | 4.0531 | 5.5998 | 1.3546 | 3.0778 | 20.8820 | 0.7518 | 0.2800 | 2.7544 | 1.8756 | 3.2316 | 10.9986 | 0.3571 | 0.5334 | 18.4476 | 15.7161 | 9.6617 | 7.3730
0.7853 | 24000 | 5.8024 | 7.4414 | 4.0829 | 5.6335 | 1.3383 | 3.0710 | 20.8217 | 0.7487 | 0.2713 | 2.6091 | 1.8695 | 3.2365 | 10.9929 | 0.3419 | 0.5213 | 18.4064 | 15.7831 | 9.7747 | 7.4290
0.8181 | 25000 | 5.8608 | 7.4348 | 4.0571 | 5.5651 | 1.3294 | 3.0518 | 20.6831 | 0.7393 | 0.2784 | 2.6330 | 1.8293 | 3.2197 | 10.9416 | 0.3484 | 0.5213 | 18.6359 | 15.8463 | 9.6883 | 7.4697
0.8508 | 26000 | 5.742 | 7.4188 | 3.9483 | 5.4911 | 1.3288 | 3.0402 | 20.7187 | 0.7376 | 0.2772 | 2.6812 | 1.8540 | 3.2415 | 10.9619 | 0.3560 | 0.5323 | 18.6388 | 15.7688 | 9.6707 | 7.3793
0.8835 | 27000 | 5.7429 | 7.3956 | 3.9016 | 5.4393 | 1.3277 | 3.0129 | 20.6748 | 0.7314 | 0.2820 | 2.6526 | 1.8798 | 3.1869 | 10.8744 | 0.3435 | 0.5228 | 18.5191 | 15.7264 | 9.5707 | 7.4266
0.9162 | 28000 | 5.7825 | 7.3748 | 3.9100 | 5.4261 | 1.3420 | 3.0142 | 20.6013 | 0.7263 | 0.2764 | 2.6708 | 1.8529 | 3.1748 | 10.8951 | 0.3491 | 0.5257 | 18.4914 | 15.5663 | 9.6552 | 7.2807
0.9490 | 29000 | 5.5179 | 7.3555 | 3.9046 | 5.3902 | 1.3283 | 2.9882 | 20.5828 | 0.7169 | 0.2732 | 2.6742 | 1.8457 | 3.1760 | 10.9126 | 0.3494 | 0.5246 | 18.5619 | 15.6746 | 9.6539 | 7.3694
0.9817 | 30000 | 5.4044 | 7.3390 | 3.8742 | 5.3713 | 1.3127 | 2.9796 | 20.5703 | 0.7120 | 0.2669 | 2.5612 | 1.8536 | 3.1602 | 10.9068 | 0.3464 | 0.5229 | 18.5389 | 15.6788 | 9.5690 | 7.4148
1.0000 | 30560 | - | 7.3346 | 3.8728 | 5.3680 | 1.3066 | 2.9780 | 20.5635 | 0.7107 | 0.2672 | 2.5046 | 1.8514 | 3.1596 | 10.9153 | 0.3467 | 0.5233 | 18.5525 | 15.6815 | 9.5687 | 7.4302

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.506 kWh
  • Carbon Emitted: 0.197 kg of CO2
  • Hours Used: 3.163 hours

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.3.0.dev0
  • Transformers: 4.45.2
  • PyTorch: 2.5.0+cu121
  • Accelerate: 1.0.0
  • Datasets: 2.20.0
  • Tokenizers: 0.20.1-dev.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}