ngiometti committed
Commit 6734d42 · verified · 1 Parent(s): c0ee43d

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 1024,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
README.md ADDED
@@ -0,0 +1,727 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:156
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-l
widget:
- source_sentence: How is the author planning to utilize prompts in their Datasette project?
  sentences:
  - 'January


    7th: It’s OK to call it Artificial Intelligence


    9th: What I should have said about the term Artificial Intelligence


    17th: Talking about Open Source LLMs on Oxide and Friends


    26th: LLM 0.13: The annotated release notes




    February


    21st: The killer app of Gemini Pro 1.5 is video




    March


    5th: Prompt injection and jailbreaking are not the same thing


    8th: The GPT-4 barrier has finally been broken


    22nd: Claude and ChatGPT for ad-hoc sidequests


    23rd: Building and testing C extensions for SQLite with ChatGPT Code Interpreter


    26th: llm cmd undo last git commit—a new plugin for LLM




    April


    8th: Building files-to-prompt entirely using Claude 3 Opus


    10th: Three major LLM releases in 24 hours (plus weeknotes)'
  - 'Then in December, the Chatbot Arena team introduced a whole new leaderboard for this feature, driven by users building the same interactive app twice with two different models and voting on the answer. Hard to come up with a more convincing argument that this feature is now a commodity that can be effectively implemented against all of the leading models.

    I’ve been tinkering with a version of this myself for my Datasette project, with the goal of letting users use prompts to build and iterate on custom widgets and data visualizations against their own data. I also figured out a similar pattern for writing one-shot Python programs, enabled by uv.'
  - 'Another common technique is to use larger models to help create training data for their smaller, cheaper alternatives—a trick used by an increasing number of labs. DeepSeek v3 used “reasoning” data created by DeepSeek-R1. Meta’s Llama 3.3 70B fine-tuning used over 25M synthetically generated examples.

    Careful design of the training data that goes into an LLM appears to be the entire game for creating these models. The days of just grabbing a full scrape of the web and indiscriminately dumping it into a training run are long gone.

    LLMs somehow got even harder to use'
- source_sentence: What are the potential pitfalls of using LLMs as power-user tools?
  sentences:
  - 'Another common technique is to use larger models to help create training data for their smaller, cheaper alternatives—a trick used by an increasing number of labs. DeepSeek v3 used “reasoning” data created by DeepSeek-R1. Meta’s Llama 3.3 70B fine-tuning used over 25M synthetically generated examples.

    Careful design of the training data that goes into an LLM appears to be the entire game for creating these models. The days of just grabbing a full scrape of the web and indiscriminately dumping it into a training run are long gone.

    LLMs somehow got even harder to use'
  - 'A drum I’ve been banging for a while is that LLMs are power-user tools—they’re chainsaws disguised as kitchen knives. They look deceptively simple to use—how hard can it be to type messages to a chatbot?—but in reality you need a huge depth of both understanding and experience to make the most of them and avoid their many pitfalls.

    If anything, this problem got worse in 2024.

    We’ve built computer systems you can talk to in human language, that will answer your questions and usually get them right! ... depending on the question, and how you ask it, and whether it’s accurately reflected in the undocumented and secret training set.'
  - 'These abilities are just a few weeks old at this point, and I don’t think their impact has been fully felt yet. If you haven’t tried them out yet you really should.

    Both Gemini and OpenAI offer API access to these features as well. OpenAI started with a WebSocket API that was quite challenging to use, but in December they announced a new WebRTC API which is much easier to get started with. Building a web app that a user can talk to via voice is easy now!

    Prompt driven app generation is a commodity already

    This was possible with GPT-4 in 2023, but the value it provides became evident in 2024.'
- source_sentence: What challenges are associated with using LLMs in the year of slop?
  sentences:
  - 'So far, I think they’re a net positive. I’ve used them on a personal level to improve my productivity (and entertain myself) in all sorts of different ways. I think people who learn how to use them effectively can gain a significant boost to their quality of life.

    A lot of people are yet to be sold on their value! Some think their negatives outweigh their positives, some think they are all hot air, and some even think they represent an existential threat to humanity.

    They’re actually quite easy to build

    The most surprising thing we’ve learned about LLMs this year is that they’re actually quite easy to build.'
  - 'The year of slop

    Synthetic training data works great

    LLMs somehow got even harder to use

    Knowledge is incredibly unevenly distributed

    LLMs need better criticism

    Everything tagged “llms” on my blog in 2024'
  - 'Meta’s Llama 3.2 models deserve a special mention. They may not be GPT-4 class, but at 1B and 3B sizes they punch massively above their weight. I run Llama 3.2 3B on my iPhone using the free MLC Chat iOS app and it’s a shockingly capable model for its tiny (<2GB) size. Try firing it up and asking it for “a plot outline of a Netflix Christmas movie where a data journalist falls in love with a local ceramacist”. Here’s what I got, at a respectable 20 tokens per second:'
- source_sentence: What capabilities does Google’s Gemini have regarding audio input and output?
  sentences:
  - 'There’s a flipside to this too: a lot of better informed people have sworn off LLMs entirely because they can’t see how anyone could benefit from a tool with so many flaws. The key skill in getting the most out of LLMs is learning to work with tech that is both inherently unreliable and incredibly powerful at the same time. This is a decidedly non-obvious skill to acquire!

    There is so much space for helpful education content here, but we need to do do a lot better than outsourcing it all to AI grifters with bombastic Twitter threads.

    Knowledge is incredibly unevenly distributed

    Most people have heard of ChatGPT by now. How many have heard of Claude?'
  - 'There’s still plenty to worry about with respect to the environmental impact of the great AI datacenter buildout, but a lot of the concerns over the energy cost of individual prompts are no longer credible.

    Here’s a fun napkin calculation: how much would it cost to generate short descriptions of every one of the 68,000 photos in my personal photo library using Google’s Gemini 1.5 Flash 8B (released in October), their cheapest model?

    Each photo would need 260 input tokens and around 100 output tokens.

    260 * 68,000 = 17,680,000 input tokens

    17,680,000 * $0.0375/million = $0.66

    100 * 68,000 = 6,800,000 output tokens

    6,800,000 * $0.15/million = $1.02'
  - 'Your browser does not support the audio element.


    OpenAI aren’t the only group with a multi-modal audio model. Google’s Gemini also accepts audio input, and the Google Gemini apps can speak in a similar way to ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that’s meant to roll out in Q1 of 2025.

    Google’s NotebookLM, released in September, took audio output to a new level by producing spookily realistic conversations between two “podcast hosts” about anything you fed into their tool. They later added custom instructions, so naturally I turned them into pelicans:



    Your browser does not support the audio element.'
- source_sentence: What improvements were noted in the intonation of ChatGPT Advanced Voice mode during its rollout?
  sentences:
  - 'When ChatGPT Advanced Voice mode finally did roll out (a slow roll from August through September) it was spectacular. I’ve been using it extensively on walks with my dog and it’s amazing how much the improvement in intonation elevates the material. I’ve also had a lot of fun experimenting with the OpenAI audio APIs.

    Even more fun: Advanced Voice mode can do accents! Here’s what happened when I told it I need you to pretend to be a California brown pelican with a very thick Russian accent, but you talk to me exclusively in Spanish.'
  - 'When @v0 first came out we were paranoid about protecting the prompt with all kinds of pre and post processing complexity.

    We completely pivoted to let it rip. A prompt without the evals, models, and especially UX is like getting a broken ASML machine without a manual'
  - 'January


    7th: It’s OK to call it Artificial Intelligence


    9th: What I should have said about the term Artificial Intelligence


    17th: Talking about Open Source LLMs on Oxide and Friends


    26th: LLM 0.13: The annotated release notes




    February


    21st: The killer app of Gemini Pro 1.5 is video




    March


    5th: Prompt injection and jailbreaking are not the same thing


    8th: The GPT-4 barrier has finally been broken


    22nd: Claude and ChatGPT for ad-hoc sidequests


    23rd: Building and testing C extensions for SQLite with ChatGPT Code Interpreter


    26th: llm cmd undo last git commit—a new plugin for LLM




    April


    8th: Building files-to-prompt entirely using Claude 3 Opus


    10th: Three major LLM releases in 24 hours (plus weeknotes)'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: cosine_accuracy@1
      value: 0.75
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 1.0
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 1.0
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1.0
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.75
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3333333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.20000000000000004
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.10000000000000002
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.75
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 1.0
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 1.0
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1.0
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8968216255952429
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.861111111111111
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.8611111111111112
      name: Cosine Map@100
---

# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
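
For readers who want to see how this three-module stack maps onto the `sentence-transformers` API, here is a minimal sketch that assembles the same architecture by hand. Note that it starts from the base checkpoint, so it only reconstructs the module layout; to get the fine-tuned weights, load this repository directly as shown in the Usage section below.

```python
from sentence_transformers import SentenceTransformer, models

# Rebuild the Transformer -> CLS pooling -> Normalize stack described above.
transformer = models.Transformer("Snowflake/snowflake-arctic-embed-l", max_seq_length=512)
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),  # 1024
    pooling_mode="cls",                          # matches 1_Pooling/config.json
)
model = SentenceTransformer(modules=[transformer, pooling, models.Normalize()])
```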

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ngiometti/legal-ft-2")
# Run inference
sentences = [
    'What improvements were noted in the intonation of ChatGPT Advanced Voice mode during its rollout?',
    'When ChatGPT Advanced Voice mode finally did roll out (a slow roll from August through September) it was spectacular. I’ve been using it extensively on walks with my dog and it’s amazing how much the improvement in intonation elevates the material. I’ve also had a lot of fun experimenting with the OpenAI audio APIs.\nEven more fun: Advanced Voice mode can do accents! Here’s what happened when I told it I need you to pretend to be a California brown pelican with a very thick Russian accent, but you talk to me exclusively in Spanish.',
    'January\n\n7th: It’s OK to call it Artificial Intelligence\n\n9th: What I should have said about the term Artificial Intelligence\n\n17th: Talking about Open Source LLMs on Oxide and Friends\n\n26th: LLM 0.13: The annotated release notes\n\n\n\nFebruary\n\n21st: The killer app of Gemini Pro 1.5 is video\n\n\n\nMarch\n\n5th: Prompt injection and jailbreaking are not the same thing\n\n8th: The GPT-4 barrier has finally been broken\n\n22nd: Claude and ChatGPT for ad-hoc sidequests\n\n23rd: Building and testing C extensions for SQLite with ChatGPT Code Interpreter\n\n26th: llm cmd undo last git commit—a new plugin for LLM\n\n\n\nApril\n\n8th: Building files-to-prompt entirely using Claude 3 Opus\n\n10th: Three major LLM releases in 24 hours (plus weeknotes)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
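
Because the model was trained with `MatryoshkaLoss` over dimensions 768/512/256/128/64 (see Training Details below), its embeddings are intended to remain useful when truncated to those sizes. A minimal sketch, assuming the `truncate_dim` option available in recent `sentence-transformers` releases:

```python
from sentence_transformers import SentenceTransformer

# Load the model so that it emits 256-dimensional embeddings instead of 1024.
model = SentenceTransformer("ngiometti/legal-ft-2", truncate_dim=256)

embeddings = model.encode([
    "What improvements were noted in the intonation of ChatGPT Advanced Voice mode during its rollout?",
])
print(embeddings.shape)
# (1, 256)
```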

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.75       |
| cosine_accuracy@3   | 1.0        |
| cosine_accuracy@5   | 1.0        |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.75       |
| cosine_precision@3  | 0.3333     |
| cosine_precision@5  | 0.2        |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.75       |
| cosine_recall@3     | 1.0        |
| cosine_recall@5     | 1.0        |
| cosine_recall@10    | 1.0        |
| **cosine_ndcg@10**  | **0.8968** |
| cosine_mrr@10       | 0.8611     |
| cosine_map@100      | 0.8611     |
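
The metrics above come from `InformationRetrievalEvaluator`. The card does not ship the held-out query/corpus split that produced them, so the snippet below is only a sketch of how such an evaluation is wired up, with placeholder IDs and texts:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("ngiometti/legal-ft-2")

# Placeholder data: query IDs -> text, corpus IDs -> text,
# and each query ID -> the set of relevant corpus IDs.
queries = {"q1": "What is identified as the biggest unsolved problem related to LLMs?"}
corpus = {
    "d1": "Gullibility is the biggest unsolved problem ...",
    "d2": "The year of slop ...",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="demo")
results = evaluator(model)
print(results)  # dict of metrics, including an ndcg@10 entry for the "demo" run
```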

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 156 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 156 samples:
  |         | sentence_0                                                                         | sentence_1                                                                            |
  |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
  | type    | string                                                                               | string                                                                                 |
  | details | <ul><li>min: 14 tokens</li><li>mean: 20.31 tokens</li><li>max: 36 tokens</li></ul>  | <ul><li>min: 43 tokens</li><li>mean: 130.44 tokens</li><li>max: 204 tokens</li></ul>  |
* Samples:
  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | <code>What are some potential applications of Large Language Models (LLMs) mentioned in the context?</code> | <code>Large Language Models<br>They’re actually quite easy to build<br>You can run LLMs on your own devices<br>Hobbyists can build their own fine-tuned models<br>We don’t yet know how to build GPT-4<br>Vibes Based Development<br>LLMs are really smart, and also really, really dumb<br>Gullibility is the biggest unsolved problem<br>Code may be the best application<br>The ethics of this space remain diabolically complex<br>My blog in 2023</code> |
  | <code>What is identified as the biggest unsolved problem related to LLMs?</code> | <code>Large Language Models<br>They’re actually quite easy to build<br>You can run LLMs on your own devices<br>Hobbyists can build their own fine-tuned models<br>We don’t yet know how to build GPT-4<br>Vibes Based Development<br>LLMs are really smart, and also really, really dumb<br>Gullibility is the biggest unsolved problem<br>Code may be the best application<br>The ethics of this space remain diabolically complex<br>My blog in 2023</code> |
  | <code>What improvements were noted in the intonation of ChatGPT Advanced Voice mode during its rollout?</code> | <code>When ChatGPT Advanced Voice mode finally did roll out (a slow roll from August through September) it was spectacular. I’ve been using it extensively on walks with my dog and it’s amazing how much the improvement in intonation elevates the material. I’ve also had a lot of fun experimenting with the OpenAI audio APIs.<br>Even more fun: Advanced Voice mode can do accents! Here’s what happened when I told it I need you to pretend to be a California brown pelican with a very thick Russian accent, but you talk to me exclusively in Spanish.</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
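
In code, this configuration corresponds to wrapping `MultipleNegativesRankingLoss` in `MatryoshkaLoss`. A sketch of how that pairing is typically constructed; the dataset shown here is an illustrative stand-in for the 156 real (question, passage) pairs:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")

# Placeholder pairs in the sentence_0 / sentence_1 layout shown above.
train_dataset = Dataset.from_dict({
    "sentence_0": ["What is identified as the biggest unsolved problem related to LLMs?"],
    "sentence_1": ["Gullibility is the biggest unsolved problem ..."],
})

inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
```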

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>
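
Put together with the loss sketch above, a training run with these non-default hyperparameters would look roughly like the following; the output directory and the evaluator wiring are assumptions, not values taken from the card:

```python
from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/legal-ft-2",      # assumed path
    num_train_epochs=10,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    eval_strategy="steps",               # requires an eval_dataset or evaluator as well
)

trainer = SentenceTransformerTrainer(
    model=model,              # base model from the loss sketch above
    args=args,
    train_dataset=train_dataset,
    loss=loss,
    evaluator=evaluator,      # e.g. the InformationRetrievalEvaluator sketched earlier
)
trainer.train()
```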

### Training Logs
| Epoch | Step | cosine_ndcg@10 |
|:-----:|:----:|:--------------:|
| 1.0   | 16   | 0.9122         |
| 2.0   | 32   | 0.9093         |
| 3.0   | 48   | 0.8968         |
| 3.125 | 50   | 0.8968         |
| 4.0   | 64   | 0.8939         |
| 5.0   | 80   | 0.8908         |
| 6.0   | 96   | 0.8908         |
| 6.25  | 100  | 0.8908         |
| 7.0   | 112  | 0.8939         |
| 8.0   | 128  | 0.8968         |
| 9.0   | 144  | 0.8968         |
| 9.375 | 150  | 0.8968         |
| 10.0  | 160  | 0.8968         |


### Framework Versions
- Python: 3.13.1
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,25 @@
{
  "_name_or_path": "Snowflake/snowflake-arctic-embed-l",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.48.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
{
  "__version__": {
    "sentence_transformers": "3.4.1",
    "transformers": "4.48.3",
    "pytorch": "2.6.0+cu124"
  },
  "prompts": {
    "query": "Represent this sentence for searching relevant passages: "
  },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
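
The `prompts` entry carries over the query prompt from the arctic-embed base model. Since `default_prompt_name` is `null`, it is not applied automatically; a short sketch of applying it explicitly for retrieval-style encoding (queries prompted, passages not):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ngiometti/legal-ft-2")

# Encode queries with the "query" prompt defined above; passages without it.
query_embeddings = model.encode(
    ["What is identified as the biggest unsolved problem related to LLMs?"],
    prompt_name="query",
)
passage_embeddings = model.encode(
    ["Gullibility is the biggest unsolved problem ..."],
)
print(model.similarity(query_embeddings, passage_embeddings))
```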
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:757b432add6dde99d0f35e29a427f9f084f1b4a3fe85c146341ea9edd6f1d6a5
size 1336413848
modules.json ADDED
@@ -0,0 +1,20 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": false
}
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "max_length": 512,
  "model_max_length": 512,
  "pad_to_multiple_of": null,
  "pad_token": "[PAD]",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "[SEP]",
  "stride": 0,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff