Matryoshka Representation Learning (arXiv:2205.13147)
This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-l. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
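The three modules above form a simple pipeline: the BERT transformer produces per-token embeddings, the Pooling module (with `pooling_mode_cls_token=True`) keeps only the CLS token's vector, and `Normalize()` scales it to unit length. A minimal sketch of the last two steps in plain Python, using made-up toy token embeddings (the real model produces `seq_len x 1024` matrices):

```python
import math

# Toy "token embeddings" for one input: 3 tokens x 4 dims
# (stand-in for the real seq_len x 1024 transformer output).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],   # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.2, 0.1, 0.0],
]

# Pooling with pooling_mode_cls_token=True: keep only the first token.
cls = token_embeddings[0]

# Normalize(): divide by the L2 norm so cosine similarity reduces to a dot product.
norm = math.sqrt(sum(x * x for x in cls))
sentence_embedding = [x / norm for x in cls]

print(sentence_embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because of the final `Normalize()` step, downstream similarity search can use plain dot products over these vectors.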
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("kamkol/ab_testing_finetuned_arctic_ft-36dfff22-0696-40d2-b3bf-268fe2ff2aec")
# Run inference
sentences = [
'who jacob cohen say about power analysis?',
'#3: Need over 200K Visitors in Online Experiment • Maybe you need half or double this, but do not trust online controlled experiments with 2,000 users • Let me start with the assumptions required for this default: • Alpha=0.05 to declare stat-sig (industry standard, default #1 in this talk). Lower values, which are appropriate sometimes, will result in increasing sample size • Power=80%. We’ll discuss power over the next few slides, but this is a minimum. Going to higher power will increase the sample size. • Conversion rate is 5%. Even if you’re optimizing for something else, it is very common to build a guardrail on conversions. Sites are typically 2%-5%. A lower number will increase the sample size • MDE (minimum detectable effect) is relative 5%. It is rare to see experiments improve key metrics by 5%, but we’ll be aggressive looking for big wins. A small MDE will increase the sample size #16 When I finally stumbled onto power analysis… it was as if I had died and gone to heaven -- Jacob Cohen (1990) © 2022 Ron Kohavi #3: The Sobering Math • The power formula is simple: • For conversions, σ² = p*(1−p) (binomial distribution). We assume p=5%, so σ² = 0.05*0.95 • Our MDE is 5% relative, so δ = 5%*5% (absolute) • Plug it in, and n=121,600 per variant, or 243,200 for A/B test • 200K is therefore conservative minimum for these assumptions • Lower alpha, increase power, lower conversion rate, or lower the MDE, and you need a larger sample #17 © 2022 Ron Kohavi #3: What is Power? • Statistical power is the probability of detecting a given difference between the variants when there really is one • Given H0 (left) and H1 (right) separated by δ • Stat-sig is noted by the dark area (only the right one matters here) • The vertical lines indicate β, which is our type-II error (power = 1−β) • With low power, the right normal moves left and the vertical lines cover most of that distribution. 
You have to be “lucky” to get stat-sig #18 Diagram from van Belle (2011) © 2022 Ron Kohavi Winner’s Curse • A stat-sig result with low power has a high probability of exaggerating the actual number as follows → (Gelman and Carlin 2014) • GuessTheTest on 16 Dec 2021, shared an example with ~80 users in each variant and 337% improvement • It had 3% power to detect even a 10% delta, so it is 63% likely to be a false positive and highly likely to exaggerate effect. See http://bit.ly/ABTestingIntuitionBusters #19 © 2022 Ron Kohavi A Visualization of Power • If the null hypothesis is true (no difference, or effect is zero), the distribution of p-values is uniform • Using p-value of 0.05, about 5% of the time you’ll declare something stat-sig • We’ll look at the p-value distribution of 10,000 experiments where the treatment has a 5% lift (relative improvement) #20 © 2022 Ron Kohavi Power = 3% (N=100 per variant) Delta of 1 Delta 0Delta of 2 Delta of 3 Delta of 4 With so few users, only a few conversion combinations are possible, hence only a few p-values are possible © 2022 Ron Kohavi Power = 3% (N=100 per variant) cont • With small numbers, you get extreme results – winner’s curse • C had 13 conversions, T had 0 conversions ( -100%, p-value 0.00003) • C had 1 conversion, T had 13 conversions (+1200%, p-value 0.0001) • Average lift (absolute value) for stat-sig results was 271% Remember: true lift is 5%, so exaggeration factor is 54 times for average! • The maximum was 1200% lift, so 240 times the true value. • Wrong sign for stat-sig result: 36% of the time The truth was that there was a 5% lift, but we got a stat-sig negative lift! #22 © 2022 Ron Kohavi Power = ~10% (N=7,000 per variant) #23 Looks almost uniform (remember, our goal is for p-value < 0.05) Stat-sig only ~10% of the time (vs. 
5% expected if there was no difference) Winner’s curse: when stat-sig, exaggeration factor of 3.9 (19.3% lift) © 2022 Ron Kohavi Power = ~30% (N=32,000 per variant) #24 Starts to put more mass below 0.05 Winner’s curse: when stat-sig, exaggeration factor of 1.9 (9.3% average lift) Could even get the sign wrong 0.1% of the time when stat-sig © 2022 Ron Kohavi Power = ~80% (N=122,000) #25 That’s the minimum we want: ~80% of the time, p-value < 0.05 Small exaggeration factor of 1.1 (5.6% average lift vs. real value of 5%) Never get the sign wrong when stat-sig © 2022 Ron Kohavi Power = ~90% (N=163,000) #26 Extending experiment from 122K users to 163K users (e.g., 34% longer) gives us great 90% power © 2022 Ron Kohavi References • Excel spreadsheet with the visualizations here (Nov 2022) • Low power LinkedIn post (Sept 2022) • A/B Testing Intuition Busters(Aug 2022) • Gelman, This is what “power = .06” looks like. Get used to it (11/2014) #27 © 2022 Ron Kohavi',
'<1-hop>\n\n3 ESTIMATING THE FALSE POSITIVE RISK P-values are commonly misinterpreted as the probability of making a mistake when choosing the Treatment over Control when the observed metric of interest is statistically significantly different [25; 26; 27]. Multiple examples of this misinterpretation by A/B vendors, book authors, and in courts were provided by Kohavi, Deng, and Vermeer [14]. What is the p-value then? The p-value is the probability of obtaining a result equal to or more extreme than what was observed, assuming that all the modeling assumptions, including the null hypothesis, H0, are true [26]. Conditioning on the null hypothesis is critical and most often misunderstood. In probabilistic terms, we have p-value = P(Δ observed or more extreme | H0 is true). What we are looking for most of the time is the opposite conditional probability: P(H0 is true | Δ observed). Using Bayes Rule, we can estimate the False Positive Risk (FPR), which is the probability that the statistically significant result is a false positive, or the probability that H0 is true (no real effect) when the test was statistically significant [15]. Note that FPR is sometimes named FDR, or False Discovery Rate [28; 29], but given the confusion with FDR from multiple hypothesis testing, we use the term recommended by Colquhoun [15]. We use the following terminology [14]: a) SS is a statistically significant positive result. b) α is the threshold used to determine statistical significance (SS), commonly 0.05 for a two-tailed t-test, and 0.025 for the positive tail. c) β is the type-II error (usually 0.2 for 80% power). d) π is the prior probability of the null hypothesis, that is P(H0). We can apply Bayes Rule as follows: FPR = P(H0|SS) = α*π / (α*π + (1−β)*(1−π)). An alternative derivation of FPR, resulting in the same formula, was made in the Supplement to Equation 2 and Figure 2 in Benjamin et al. [30]. 
The key parameter required for the above is π, or P(H0). Kohavi, Deng, and Vermeer [14] provided a table with seven success rate estimates (1−π) that were reported in the software industry, which ranged from 8% to 33% with a median and mode of 10%. Plugging these into the above formula results in an FPR of 22% for the median and mode success rate of 10%, industry-standard two-tailed alpha of 0.05 equivalent to one-tailed 0.025, and 80% power (π=0.9, α=0.025, β=0.2). This is a much higher rate than people intuitively think of when they hear statistically significant improvement (a positive result here is an improvement in the desired direction, which is usually larger, e.g., conversion or revenue, but may be smaller, e.g., faster time). [False Positives in A/B Tests, KDD ’24, August 25-29, 2024, Barcelona, Spain] For companies that use α=0.10 as their threshold for statistical significance, or equivalently use α=0.05 with a one-tailed test for the improvement tail (e.g. Optimizely [31], Analytics Toolkit [32], Booking.com [33], Expedia), the FPR for a 10% success rate is 36%. Over one third of the statistically significant results showing improvement, which we want to celebrate, are likely to be false positives! To provide intuition about why the FPR is so high when the success rate is low, we will use the data reported by Optimizely [34] of a 12% win rate across 127,000 experiments. As we will show later in the paper in Section 4.4, the estimated true success rate is 9.3%, in line with the 10% median and mode of Table 2 in Kohavi, Deng, and Vermeer [14]. Looking at Figure 1, the dot-pattern (also green if viewed in color) in the first row represents the 9.3% success rate, that is, true effects that should be statistically significant given our sample size with 80% power. Of these, 80% will be identified as statistically significant, so 80%*9.3%=7.4% are denoted by a plus in the first row. 
Of the remaining 90.7% null effects, 5% will be statistically significant and positive, so 4.5% of the A/B tests will show a statistically significant result: a false positive. These are denoted by a plus in the second row. Of the ~12% wins (7.4%+4.5% depicted by pluses), 4.5% are false positives, so 4.5%/(4.5% + 7.4%) = 37.8%. This surprisingly high false positive rate is often referred to as the base rate fallacy [35].',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
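The quoted slide material in the example sentences walks through two calculations worth checking: the n=121,600 per variant sample size (via the n ≈ 16σ²/δ² rule of thumb for 80% power at α=0.05) and the false positive risk figures of ~22% and ~36%. A quick sanity check of that arithmetic (this is about the example passages, not the model API):

```python
# Sample size per variant, rule-of-thumb n ~= 16 * sigma^2 / delta^2
# (approximation of 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2
# for alpha = 0.05 and 80% power).
p = 0.05                  # baseline conversion rate
sigma2 = p * (1 - p)      # binomial variance
delta = 0.05 * p          # 5% relative MDE, expressed as an absolute delta
n = 16 * sigma2 / delta ** 2
print(n)                  # 121600.0 per variant, 243200 for the A/B test

# False Positive Risk via Bayes Rule:
# FPR = alpha*pi / (alpha*pi + (1 - beta)*(1 - pi))
def fpr(alpha: float, beta: float, pi: float) -> float:
    return alpha * pi / (alpha * pi + (1 - beta) * (1 - pi))

print(round(fpr(0.025, 0.2, 0.9), 2))  # 0.22 (one-tailed alpha 0.025)
print(round(fpr(0.05, 0.2, 0.9), 2))   # 0.36 (one-tailed alpha 0.05)
```

Both results match the figures quoted in the sample passages.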
Information Retrieval (InformationRetrievalEvaluator)

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.5083 |
| cosine_accuracy@3 | 0.7333 |
| cosine_accuracy@5 | 0.8333 |
| cosine_accuracy@10 | 0.9 |
| cosine_precision@1 | 0.5083 |
| cosine_precision@3 | 0.325 |
| cosine_precision@5 | 0.225 |
| cosine_precision@10 | 0.1292 |
| cosine_recall@1 | 0.3471 |
| cosine_recall@3 | 0.6194 |
| cosine_recall@5 | 0.7114 |
| cosine_recall@10 | 0.8056 |
| cosine_ndcg@10 | 0.6457 |
| cosine_mrr@10 | 0.6401 |
| cosine_map@100 | 0.5788 |
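For reference, the accuracy@k, precision@k, and recall@k metrics in this table are straightforward to compute from a ranked retrieval list. A toy illustration with made-up document IDs (not the actual evaluation data):

```python
def metrics_at_k(ranked_ids, relevant_ids, k):
    """Compute IR metrics over the top-k of a ranking (descending similarity)."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc in top_k if doc in relevant_ids)
    return {
        "accuracy@k": 1.0 if hits > 0 else 0.0,   # any relevant doc in top k?
        "precision@k": hits / k,                   # fraction of top k that is relevant
        "recall@k": hits / len(relevant_ids),      # fraction of relevant docs found
    }

# Toy query: 3 relevant docs, the ranking surfaced two of them in the top 5.
m = metrics_at_k(["d7", "d2", "d9", "d1", "d4"], {"d2", "d4", "d8"}, k=5)
print(m)  # accuracy@k 1.0, precision@k 0.4, recall@k 2/3
```

The reported numbers are these per-query values averaged over all evaluation queries.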
Training dataset: columns sentence_0 and sentence_1

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| sentence_0 | sentence_1 |
|---|---|
| How do the pitfalls identified in online A/B testing, such as Simpson’s paradox and misuse of standard statistical formulas, relate to the ongoing debate between Bayesian methods and Frequentist approaches in interpreting A/B test results? | <7-hop> |
| how multiVariable testing (MVT) help speed up testing many factors at once and what experimentation infrastructure requirements make server-side assignment best for running complex MVTs on large sites? | <2-hop> |
| How Figure 4.2 help manage variant assignment and system parameters in experiment platform? | and its attributes (e.g., country, language, OS, platform), which experiment and variant combinations is that request assigned to? This assignment is based on the experiment specification and a pseudo-random hash of an ID, that is, f(ID). In most cases, to ensure the assignment is consistent for a user, a user ID is used. Variant assignment must also be independent, in that knowing the variant assignment of one user should not tell us anything about variant assignment for a different user. We discuss this in more depth in Chapter 14. In this chapter, we assume user is the randomization unit. Production code, system parameters and values: Now that you have variant assignment and definitions, how do you ensure that the user receives the appropriate experience: how do you manage different production code and which system parameters should change to what values? This interface (or interfaces) is represented as the Variant Assignment Service in Figure 4.2, and can return either just the ... |
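The last sample above describes variant assignment as a pseudo-random hash of an ID, f(ID), that must be deterministic per user and independent across users. A minimal sketch of that idea (the experiment name, bucket count, and helper are made up for illustration):

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants: tuple = ("control", "treatment")) -> str:
    # Deterministic: the same (experiment, user) pair always hashes to the
    # same bucket; salting with the experiment name decorrelates experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 1000          # 1000 buckets of 0.1% each
    return variants[bucket * len(variants) // 1000]

# Consistent assignment: the same user always sees the same variant.
assert assign_variant("user-42", "checkout-redesign") == \
       assign_variant("user-42", "checkout-redesign")
```

Because the hash is uniform, roughly half of the users land in each of the two variants, which is what makes the standard two-sample statistics in the other passages applicable.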
MatryoshkaLoss with these parameters:
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
1024,
768,
512,
256,
128
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
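Because the model was trained with MatryoshkaLoss at 1024/768/512/256/128 dimensions, an embedding can be truncated to a prefix of one of those sizes and re-normalized with only a modest quality loss (Sentence Transformers exposes this via the `truncate_dim` argument to `SentenceTransformer`). A sketch of the truncate-and-renormalize step itself, using a toy 8-dimensional vector in place of a real 1024-dimensional embedding:

```python
import math

def truncate_and_renormalize(embedding, dim):
    # Keep only the first `dim` coordinates (a Matryoshka prefix)...
    prefix = embedding[:dim]
    # ...then rescale to unit length so cosine similarity is still a dot product.
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

full = [0.3, 0.4, 0.3, 0.4, 0.5, 0.2, 0.3, 0.3]  # stand-in for a 1024-d embedding
small = truncate_and_renormalize(full, 4)
print(len(small))  # 4; a unit-length prefix embedding
```

With the real model, `SentenceTransformer(model_id, truncate_dim=256)` performs the equivalent truncation, trading some retrieval quality for a 4x smaller index.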
Non-default hyperparameters:

eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
num_train_epochs: 100
multi_dataset_batch_sampler: round_robin

All hyperparameters:

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 100
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin

Training Logs

| Epoch | Step | Training Loss | cosine_ndcg@10 |
|---|---|---|---|
| 1.0 | 27 | - | 0.4217 |
| 1.8519 | 50 | - | 0.5487 |
| 2.0 | 54 | - | 0.5525 |
| 3.0 | 81 | - | 0.5851 |
| 3.7037 | 100 | - | 0.6000 |
| 4.0 | 108 | - | 0.6019 |
| 5.0 | 135 | - | 0.6160 |
| 5.5556 | 150 | - | 0.6255 |
| 6.0 | 162 | - | 0.6513 |
| 7.0 | 189 | - | 0.6403 |
| 7.4074 | 200 | - | 0.6306 |
| 8.0 | 216 | - | 0.6450 |
| 9.0 | 243 | - | 0.6455 |
| 9.2593 | 250 | - | 0.6489 |
| 10.0 | 270 | - | 0.6355 |
| 11.0 | 297 | - | 0.6619 |
| 11.1111 | 300 | - | 0.6650 |
| 12.0 | 324 | - | 0.6636 |
| 12.9630 | 350 | - | 0.6906 |
| 13.0 | 351 | - | 0.6869 |
| 14.0 | 378 | - | 0.6771 |
| 14.8148 | 400 | - | 0.6541 |
| 15.0 | 405 | - | 0.6537 |
| 16.0 | 432 | - | 0.6485 |
| 16.6667 | 450 | - | 0.6619 |
| 17.0 | 459 | - | 0.6334 |
| 18.0 | 486 | - | 0.6698 |
| 18.5185 | 500 | 2.6848 | 0.6645 |
| 19.0 | 513 | - | 0.6580 |
| 20.0 | 540 | - | 0.6888 |
| 20.3704 | 550 | - | 0.6676 |
| 21.0 | 567 | - | 0.6591 |
| 22.0 | 594 | - | 0.6558 |
| 22.2222 | 600 | - | 0.6554 |
| 23.0 | 621 | - | 0.6476 |
| 24.0 | 648 | - | 0.6580 |
| 24.0741 | 650 | - | 0.6560 |
| 25.0 | 675 | - | 0.6488 |
| 25.9259 | 700 | - | 0.6206 |
| 26.0 | 702 | - | 0.6033 |
| 27.0 | 729 | - | 0.6471 |
| 27.7778 | 750 | - | 0.6293 |
| 28.0 | 756 | - | 0.6346 |
| 29.0 | 783 | - | 0.6406 |
| 29.6296 | 800 | - | 0.6424 |
| 30.0 | 810 | - | 0.6234 |
| 31.0 | 837 | - | 0.6765 |
| 31.4815 | 850 | - | 0.6561 |
| 32.0 | 864 | - | 0.6562 |
| 33.0 | 891 | - | 0.6539 |
| 33.3333 | 900 | - | 0.6569 |
| 34.0 | 918 | - | 0.6462 |
| 35.0 | 945 | - | 0.6724 |
| 35.1852 | 950 | - | 0.6626 |
| 36.0 | 972 | - | 0.6280 |
| 37.0 | 999 | - | 0.6561 |
| 37.0370 | 1000 | 1.0045 | 0.6534 |
| 38.0 | 1026 | - | 0.6570 |
| 38.8889 | 1050 | - | 0.6650 |
| 39.0 | 1053 | - | 0.6516 |
| 40.0 | 1080 | - | 0.6562 |
| 40.7407 | 1100 | - | 0.6778 |
| 41.0 | 1107 | - | 0.6798 |
| 42.0 | 1134 | - | 0.6922 |
| 42.5926 | 1150 | - | 0.6902 |
| 43.0 | 1161 | - | 0.6775 |
| 44.0 | 1188 | - | 0.6663 |
| 44.4444 | 1200 | - | 0.6730 |
| 45.0 | 1215 | - | 0.6807 |
| 46.0 | 1242 | - | 0.6674 |
| 46.2963 | 1250 | - | 0.6657 |
| 47.0 | 1269 | - | 0.6648 |
| 48.0 | 1296 | - | 0.6716 |
| 48.1481 | 1300 | - | 0.6817 |
| 49.0 | 1323 | - | 0.6594 |
| 50.0 | 1350 | - | 0.6611 |
| 51.0 | 1377 | - | 0.6797 |
| 51.8519 | 1400 | - | 0.6858 |
| 52.0 | 1404 | - | 0.6828 |
| 53.0 | 1431 | - | 0.6836 |
| 53.7037 | 1450 | - | 0.6710 |
| 54.0 | 1458 | - | 0.6674 |
| 55.0 | 1485 | - | 0.6598 |
| 55.5556 | 1500 | 0.8341 | 0.6619 |
| 56.0 | 1512 | - | 0.6625 |
| 57.0 | 1539 | - | 0.6686 |
| 57.4074 | 1550 | - | 0.6650 |
| 58.0 | 1566 | - | 0.6214 |
| 59.0 | 1593 | - | 0.6366 |
| 59.2593 | 1600 | - | 0.6399 |
| 60.0 | 1620 | - | 0.6493 |
| 61.0 | 1647 | - | 0.6358 |
| 61.1111 | 1650 | - | 0.6326 |
| 62.0 | 1674 | - | 0.6171 |
| 62.9630 | 1700 | - | 0.6229 |
| 63.0 | 1701 | - | 0.6242 |
| 64.0 | 1728 | - | 0.6658 |
| 64.8148 | 1750 | - | 0.6622 |
| 65.0 | 1755 | - | 0.6555 |
| 66.0 | 1782 | - | 0.6286 |
| 66.6667 | 1800 | - | 0.6524 |
| 67.0 | 1809 | - | 0.6421 |
| 68.0 | 1836 | - | 0.6324 |
| 68.5185 | 1850 | - | 0.6479 |
| 69.0 | 1863 | - | 0.6443 |
| 70.0 | 1890 | - | 0.6260 |
| 70.3704 | 1900 | - | 0.6440 |
| 71.0 | 1917 | - | 0.6390 |
| 72.0 | 1944 | - | 0.6558 |
| 72.2222 | 1950 | - | 0.6563 |
| 73.0 | 1971 | - | 0.6455 |
| 74.0 | 1998 | - | 0.6422 |
| 74.0741 | 2000 | 0.6258 | 0.6507 |
| 75.0 | 2025 | - | 0.6504 |
| 75.9259 | 2050 | - | 0.6493 |
| 76.0 | 2052 | - | 0.6493 |
| 77.0 | 2079 | - | 0.6546 |
| 77.7778 | 2100 | - | 0.6430 |
| 78.0 | 2106 | - | 0.6443 |
| 79.0 | 2133 | - | 0.6432 |
| 79.6296 | 2150 | - | 0.6427 |
| 80.0 | 2160 | - | 0.6467 |
| 81.0 | 2187 | - | 0.6567 |
| 81.4815 | 2200 | - | 0.6529 |
| 82.0 | 2214 | - | 0.6522 |
| 83.0 | 2241 | - | 0.6487 |
| 83.3333 | 2250 | - | 0.6444 |
| 84.0 | 2268 | - | 0.6374 |
| 85.0 | 2295 | - | 0.6441 |
| 85.1852 | 2300 | - | 0.6439 |
| 86.0 | 2322 | - | 0.6378 |
| 87.0 | 2349 | - | 0.6441 |
| 87.0370 | 2350 | - | 0.6439 |
| 88.0 | 2376 | - | 0.6470 |
| 88.8889 | 2400 | - | 0.6519 |
| 89.0 | 2403 | - | 0.6451 |
| 90.0 | 2430 | - | 0.6461 |
| 90.7407 | 2450 | - | 0.6464 |
| 91.0 | 2457 | - | 0.6451 |
| 92.0 | 2484 | - | 0.6396 |
| 92.5926 | 2500 | 0.5699 | 0.6425 |
| 93.0 | 2511 | - | 0.6481 |
| 94.0 | 2538 | - | 0.6449 |
| 94.4444 | 2550 | - | 0.6450 |
| 95.0 | 2565 | - | 0.6452 |
| 96.0 | 2592 | - | 0.6457 |
| 96.2963 | 2600 | - | 0.6457 |
| 97.0 | 2619 | - | 0.6457 |
| 98.0 | 2646 | - | 0.6457 |
| 98.1481 | 2650 | - | 0.6457 |
| 99.0 | 2673 | - | 0.6457 |
| 100.0 | 2700 | - | 0.6457 |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model: Snowflake/snowflake-arctic-embed-l