Training in progress, step 305, checkpoint
base_model: microsoft/deberta-v3-small
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:32500
  - loss:GISTEmbedLoss
widget:
  - source_sentence: What was the name of Jed's nephew in The Beverly Hillbillies?
    sentences:
      - >-
        Jed Clampett - The Beverly Hillbillies Characters - ShareTV Buddy Ebsen
        began his career as a dancer in the late 1920s in a Broadway chorus. He
        later formed a vaudeville ... Character Bio Although he had received
        little formal education, Jed Clampett had a good deal of common sense. A
        good-natured man, he is the apparent head of the family. Jed's wife
        (Elly May's mother) died, but is referred to in the episode "Duke Steals
        A Wife" as Rose Ellen. Jed was shown to be an expert marksman and was
        extremely loyal to his family and kinfolk. The huge oil pool in the
        swamp he owned was the beginning of his rags-to-riches journey to
        Beverly Hills. Although he longed for the old ways back in the hills, he
        made the best of being in Beverly Hills. Whenever he had anything on his
        mind, he would sit on the curbstone of his mansion and whittle until he
        came up with the answer. Jedediah, the version of Jed's name used in the
        1993 Beverly Hillbillies theatrical movie, was never mentioned in the
        original television series (though coincidentally, on Ebsen's subsequent
        series, Barnaby Jones, Barnaby's nephew J.R. was also named Jedediah).
        In one episode Jed and Granny reminisce about seeing Buddy Ebsen and
        Vilma Ebsen—a joking reference to the Ebsens' song and dance act. Jed
        appears in all 274 episodes. Episode Screenshots
      - a stove generates heat for cooking usually
      - >-
        Miss Marple series by Agatha Christie Miss Marple series 43 works, 13
        primary works Mystery series in order of publication. Miss Marple is
        introduced in The Murder at the Vicarage but the books can be read in
        any order. Mixed short story collections are included if some are
        Marple, often have horror, supernatural, maybe detective Poirot, Pyne,
        or Quin. Note that "Nemesis" should be read AFTER "A Caribbean Holiday"
  - source_sentence: >-
      A recording of folk songs done for the Columbia society in 1942 was
      largely arranged by Pjetër Dungu .
    sentences:
      - Someone cooking drugs in a spoon over a candle
      - >-
        A recording of folk songs made for the Columbia society in 1942 was
        largely arranged by Pjetër Dungu .
      - >-
        A Murder of Crows, A Parliament of Owls What do You Call a Group of
        Birds? Do you know what a group of Ravens is called? What about a group
        of peacocks, snipe or hummingbirds? Here is a list of Bird Collectives,
        terms that you can use to describe a    group of birds. Birds in general
  - source_sentence: A person in a kitchen looking at the oven.
    sentences:
      - >-
        staying warm has a positive impact on an animal 's survival. Furry
        animals grow thicker coats to keep warm in the winter. 
         Furry animals grow thicker coats which has a positive impact on their survival. 
      - A woman In the kitchen opening her oven.
      - >-
        EE has apologised after a fault left some of its customers unable to use
        the internet on their mobile devices.
  - source_sentence: Air can be separated into several elements.
    sentences:
      - >-
        Which of the following substances can be separated into several
        elements?
      - >-
        Funny Interesting Facts Humor Strange: Carl and the Passions changed
        band name to what Carl and the Passions changed band name to what Beach
        Boys Carl and the Passions - "So Tough" is the fifteenth studio album
        released by The Beach Boys in 1972. In its initial release, it was the
        second disc of a two-album set with Pet Sounds (which The Beach Boys
        were able to license from Capitol Records). Unfortunately, due to the
        fact that Carl and the Passions - "So Tough" was a transitional album
        that saw the departure of one member and the introduction of two new
        ones, making it wildly inconsistent in terms of type of material
        present, it paled next to their 1966 classic and was seen as something
        of a disappointment in its time of release. The title of the album
        itself was a reference to an early band Carl Wilson had been in as a
        teenager (some say a possible early name for the Beach Boys). It was
        also the first album released under a new deal with Warner Bros. that
        allowed the company to distribute all future Beach Boys product in
        foreign as well as domestic markets.
      - >-
        Which statement correctly describes a relationship between two human
        body systems?
  - source_sentence: What do outdoor plants require to survive?
    sentences:
      - >-
        a plants require water for survival. If no rain or watering, the plant
        dies. 
         Outdoor plants require rain to survive.
      - >-
        (Vegan) soups are nutritious. In addition to them being easy to digest,
        most the time, soups are made from nutrient-dense ingredients like
        herbs, spices, vegetables, and beans. Because the soup is full of those
        nutrients AND that it's easy to digest, your body is able to absorb more
        of those nutrients into your system.
      - >-
        If you do the math, there are 11,238,513 possible combinations of five
        white balls (without order mattering). Multiply that by the 26 possible
        red balls, and you get 292,201,338 possible Powerball number
        combinations. At $2 per ticket, you'd need $584,402,676 to buy every
        single combination and guarantee a win.
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.12009124140478655
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.180573622028628
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.18492770691981375
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.21139381574888486
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.15529980522625675
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.18058248277838349
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.11997652374043644
            name: Pearson Dot
          - type: spearman_dot
            value: 0.18041242798509616
            name: Spearman Dot
          - type: pearson_max
            value: 0.18492770691981375
            name: Pearson Max
          - type: spearman_max
            value: 0.21139381574888486
            name: Spearman Max
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: allNLI dev
          type: allNLI-dev
        metrics:
          - type: cosine_accuracy
            value: 0.66796875
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9721524119377136
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.5029239766081871
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.821484386920929
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.33659491193737767
            name: Cosine Precision
          - type: cosine_recall
            value: 0.9942196531791907
            name: Cosine Recall
          - type: cosine_ap
            value: 0.3857994503224615
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.66796875
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 746.914794921875
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.5029239766081871
            name: Dot F1
          - type: dot_f1_threshold
            value: 631.138916015625
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.33659491193737767
            name: Dot Precision
          - type: dot_recall
            value: 0.9942196531791907
            name: Dot Recall
          - type: dot_ap
            value: 0.38572844452312516
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.666015625
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 95.24527740478516
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.5045317220543807
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 254.973388671875
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.34151329243353784
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.9653179190751445
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.39193409293721965
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.66796875
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 6.541449546813965
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.5029239766081871
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 16.558998107910156
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.33659491193737767
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.9942196531791907
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.3858031188548441
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.66796875
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 746.914794921875
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.5045317220543807
            name: Max F1
          - type: max_f1_threshold
            value: 631.138916015625
            name: Max F1 Threshold
          - type: max_precision
            value: 0.34151329243353784
            name: Max Precision
          - type: max_recall
            value: 0.9942196531791907
            name: Max Recall
          - type: max_ap
            value: 0.39193409293721965
            name: Max Ap
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Qnli dev
          type: Qnli-dev
        metrics:
          - type: cosine_accuracy
            value: 0.58203125
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9368094801902771
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6300268096514745
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.802739143371582
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.46078431372549017
            name: Cosine Precision
          - type: cosine_recall
            value: 0.9957627118644068
            name: Cosine Recall
          - type: cosine_ap
            value: 0.5484497034083067
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.58203125
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 719.7518310546875
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6300268096514745
            name: Dot F1
          - type: dot_f1_threshold
            value: 616.7227783203125
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.46078431372549017
            name: Dot Precision
          - type: dot_recall
            value: 0.9957627118644068
            name: Dot Recall
          - type: dot_ap
            value: 0.548461685358088
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.607421875
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 182.1275177001953
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6303724928366763
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 230.0565185546875
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.47619047619047616
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.9322033898305084
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.5750034744442096
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.58203125
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 9.853867530822754
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.6300268096514745
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 17.40953254699707
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.46078431372549017
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.9957627118644068
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.5484497034083067
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.607421875
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 719.7518310546875
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6303724928366763
            name: Max F1
          - type: max_f1_threshold
            value: 616.7227783203125
            name: Max F1 Threshold
          - type: max_precision
            value: 0.47619047619047616
            name: Max Precision
          - type: max_recall
            value: 0.9957627118644068
            name: Max Recall
          - type: max_ap
            value: 0.5750034744442096
            name: Max Ap

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model fine-tuned from microsoft/deberta-v3-small. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
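
The evaluation metrics in this card are reported for four scoring functions (cosine, dot, Manhattan, Euclidean). As a rough illustration of how they differ, here is a minimal pure-Python sketch on two hypothetical 2-dimensional vectors; the vectors and helper names are made up for the example and are not part of the model or the library.

```python
import math

def dot(u, v):
    """Unnormalized inner product; grows with vector length."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Inner product of the length-normalized vectors, in [-1, 1]."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def manhattan(u, v):
    """L1 distance: smaller means more similar."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """L2 distance: smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical "embeddings" that point in the same direction
# but have different lengths.
u = [3.0, 4.0]
v = [6.0, 8.0]

print(cosine(u, v))     # 1.0
print(dot(u, v))        # 50.0
print(manhattan(u, v))  # 7.0
print(euclidean(u, v))  # 5.0
```

Cosine ignores vector length while the other three do not, which is one plausible reason the cosine thresholds reported in this card lie below 1 while the dot thresholds are in the hundreds.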

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): AdvancedWeightedPooling(
    (alpha_dropout_layer): Dropout(p=0.01, inplace=False)
    (gate_dropout_layer): Dropout(p=0.05, inplace=False)
    (linear_cls_pj): Linear(in_features=768, out_features=768, bias=True)
    (linear_cls_Qpj): Linear(in_features=768, out_features=768, bias=True)
    (linear_mean_pj): Linear(in_features=768, out_features=768, bias=True)
    (linear_attnOut): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm_output): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_weightedPooing): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_pjCls): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_pjMean): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_attnOut): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)
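
To get a feel for the size of the custom pooling head, the layer shapes printed above can be turned into a rough parameter count. This is only a sketch under two assumptions that the printout does not confirm: that `mha` is a standard PyTorch `MultiheadAttention` with a packed 768 → 3×768 input projection plus bias, and that the module holds no extra bare parameters beyond the listed submodules.

```python
def linear_params(in_features, out_features, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

def layernorm_params(dim):
    """Elementwise affine LayerNorm: one scale and one shift per feature."""
    return 2 * dim

d = 768

# linear_cls_pj, linear_cls_Qpj, linear_mean_pj, linear_attnOut
linears = 4 * linear_params(d, d)

# Assumed packed q/k/v input projection (not shown in the printout) + out_proj
mha = linear_params(d, 3 * d) + linear_params(d, d)

# layernorm_output, layernorm_weightedPooing, layernorm_pjCls,
# layernorm_pjMean, layernorm_attnOut
layernorms = 5 * layernorm_params(d)

total = linears + mha + layernorms
print(f"{total:,}")  # 4,732,416
```

Under these assumptions the pooling head adds roughly 4.7 million trainable parameters on top of the DeBERTa encoder.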

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest2-step1-checkpoints-tmp")
# Run inference
sentences = [
    'What do outdoor plants require to survive?',
    'a plants require water for survival. If no rain or watering, the plant dies. \n Outdoor plants require rain to survive.',
    "(Vegan) soups are nutritious. In addition to them being easy to digest, most the time, soups are made from nutrient-dense ingredients like herbs, spices, vegetables, and beans. Because the soup is full of those nutrients AND that it's easy to digest, your body is able to absorb more of those nutrients into your system.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity (dataset: sts-test)

Metric Value
pearson_cosine 0.1201
spearman_cosine 0.1806
pearson_manhattan 0.1849
spearman_manhattan 0.2114
pearson_euclidean 0.1553
spearman_euclidean 0.1806
pearson_dot 0.12
spearman_dot 0.1804
pearson_max 0.1849
spearman_max 0.2114
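
The Pearson and Spearman rows measure different things: Pearson looks for a linear relationship between predicted similarity and gold score, while Spearman only compares rankings. The following pure-Python sketch illustrates the difference on hypothetical gold labels and predictions (both lists are invented for the example and are not taken from the sts-test data).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical gold similarity labels and model scores.
gold = [0.0, 1.0, 2.0, 3.0, 5.0]
pred = [0.10, 0.20, 0.25, 0.30, 0.99]

print(round(pearson(gold, pred), 3))   # below 1: relationship is not linear
print(round(spearman(gold, pred), 3))  # 1.0: the ordering is identical
```

A Spearman value that sits above the Pearson value, as in the table above, is what one would expect when the ordering of pairs is captured better than the exact scale of the scores.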

Binary Classification (dataset: allNLI-dev)

Metric Value
cosine_accuracy 0.668
cosine_accuracy_threshold 0.9722
cosine_f1 0.5029
cosine_f1_threshold 0.8215
cosine_precision 0.3366
cosine_recall 0.9942
cosine_ap 0.3858
dot_accuracy 0.668
dot_accuracy_threshold 746.9148
dot_f1 0.5029
dot_f1_threshold 631.1389
dot_precision 0.3366
dot_recall 0.9942
dot_ap 0.3857
manhattan_accuracy 0.666
manhattan_accuracy_threshold 95.2453
manhattan_f1 0.5045
manhattan_f1_threshold 254.9734
manhattan_precision 0.3415
manhattan_recall 0.9653
manhattan_ap 0.3919
euclidean_accuracy 0.668
euclidean_accuracy_threshold 6.5414
euclidean_f1 0.5029
euclidean_f1_threshold 16.559
euclidean_precision 0.3366
euclidean_recall 0.9942
euclidean_ap 0.3858
max_accuracy 0.668
max_accuracy_threshold 746.9148
max_f1 0.5045
max_f1_threshold 631.1389
max_precision 0.3415
max_recall 0.9942
max_ap 0.3919
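
The table reports separate thresholds for accuracy and for F1 (for cosine, 0.9722 versus 0.8215), and the F1 operating point has very high recall (0.9942) with low precision (0.3366). A simple threshold sweep shows how the two objectives can pick different cut-offs. This is a simplified sketch on invented scores and labels; it is not the evaluator's actual implementation.

```python
def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def f1(tp, fp, fn, tn):
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(scores, labels, metric):
    """Try the midpoint between each pair of adjacent sorted scores and
    return (best metric value, threshold). Pairs scoring above the
    threshold are predicted positive."""
    ranked = sorted(zip(scores, labels), reverse=True)
    best_value, best_thr = -1.0, None
    for k in range(1, len(ranked)):
        thr = (ranked[k - 1][0] + ranked[k][0]) / 2
        tp = sum(label for _, label in ranked[:k])
        fp = k - tp
        fn = sum(label for _, label in ranked[k:])
        tn = len(ranked) - k - fn
        value = metric(tp, fp, fn, tn)
        if value > best_value:
            best_value, best_thr = value, thr
    return best_value, best_thr

# Hypothetical, imbalanced toy data: 3 positives among 12 pairs.
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70,
          0.65, 0.60, 0.55, 0.50, 0.45, 0.40]
labels = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

acc, acc_thr = best_threshold(scores, labels, accuracy)
f1_val, f1_thr = best_threshold(scores, labels, f1)
print(round(acc, 3), round(acc_thr, 3))     # accuracy prefers a high cut-off
print(round(f1_val, 3), round(f1_thr, 3))   # F1 prefers a lower cut-off
```

In this toy case accuracy is maximized by predicting almost everything negative, while F1 is maximized at a lower threshold that recovers every positive at the cost of precision, the same qualitative pattern as in the reported numbers.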

Binary Classification (dataset: Qnli-dev)

Metric Value
cosine_accuracy 0.582
cosine_accuracy_threshold 0.9368
cosine_f1 0.63
cosine_f1_threshold 0.8027
cosine_precision 0.4608
cosine_recall 0.9958
cosine_ap 0.5484
dot_accuracy 0.582
dot_accuracy_threshold 719.7518
dot_f1 0.63
dot_f1_threshold 616.7228
dot_precision 0.4608
dot_recall 0.9958
dot_ap 0.5485
manhattan_accuracy 0.6074
manhattan_accuracy_threshold 182.1275
manhattan_f1 0.6304
manhattan_f1_threshold 230.0565
manhattan_precision 0.4762
manhattan_recall 0.9322
manhattan_ap 0.575
euclidean_accuracy 0.582
euclidean_accuracy_threshold 9.8539
euclidean_f1 0.63
euclidean_f1_threshold 17.4095
euclidean_precision 0.4608
euclidean_recall 0.9958
euclidean_ap 0.5484
max_accuracy 0.6074
max_accuracy_threshold 719.7518
max_f1 0.6304
max_f1_threshold 616.7228
max_precision 0.4762
max_recall 0.9958
max_ap 0.575
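
As a quick consistency check, the F1 values above can be recomputed from the reported precision and recall, and the `max_*` rows can be compared with the per-function rows. The sketch below uses only the rounded numbers from this table; the reading of `max_*` as a per-metric maximum across the four scoring functions is inferred from those numbers, not taken from the library documentation.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values copied from the table above (already rounded to four decimals).
reported = {
    "cosine":    {"accuracy": 0.582,  "f1": 0.63,   "precision": 0.4608, "recall": 0.9958, "ap": 0.5484},
    "dot":       {"accuracy": 0.582,  "f1": 0.63,   "precision": 0.4608, "recall": 0.9958, "ap": 0.5485},
    "manhattan": {"accuracy": 0.6074, "f1": 0.6304, "precision": 0.4762, "recall": 0.9322, "ap": 0.575},
    "euclidean": {"accuracy": 0.582,  "f1": 0.63,   "precision": 0.4608, "recall": 0.9958, "ap": 0.5484},
}

# F1 recomputed from precision and recall matches the reported F1.
for name, m in reported.items():
    print(name, round(f1_from(m["precision"], m["recall"]), 4), m["f1"])

# Per-metric maximum across the four scoring functions.
max_rows = {
    key: max(m[key] for m in reported.values())
    for key in ("accuracy", "f1", "precision", "recall", "ap")
}
print(max_rows)
```

Note that each `max_*` value is taken independently, so `max_precision` (from Manhattan) and `max_recall` (from cosine, dot, and Euclidean) do not describe a single operating point.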

Training Details

Training Dataset

Unnamed Dataset

  • Size: 32,500 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (type: string): min 4 tokens, mean 29.43 tokens, max 400 tokens
    • sentence2 (type: string): min 2 tokens, mean 57.02 tokens, max 389 tokens
  • Samples:
    sentence1 sentence2
    What is the chemical symbol for Silver? Chemical Elements.com - Silver (Ag) Bentor, Yinon. Chemical Element.com - Silver. http://www.chemicalelements.com/elements/ag.html. For more information about citing online sources, please visit the MLA's Website . This page was created by Yinon Bentor. Use of this web site is restricted by this site's license agreement . Copyright © 1996-2012 Yinon Bentor. All Rights Reserved.
    e. in solids the atoms are closely locked in position and can only vibrate, in liquids the atoms and molecules are more loosely connected and can collide with and move past one another, while in gases the atoms or molecules are free to move independently, colliding frequently. Within a substance, atoms that collide frequently and move independently of one another are most likely in a gas
    Keanu Neal was born in 1995 . Keanu Neal ( born July 26 , 1995 ) is an American football safety for the Atlanta Falcons of the National Football League ( NFL ) .
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
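
The idea behind a guided in-batch contrastive loss can be sketched as follows: the guide model scores every anchor/candidate pair in the batch, and any in-batch "negative" that the guide rates as more similar than the true pair is treated as a likely false negative and dropped before the softmax. The pure-Python sketch below is a simplified illustration of that idea, assuming precomputed cosine similarities and only anchor-to-positive comparisons; the function name and the toy matrices are hypothetical, and the library's `GISTEmbedLoss` may differ in its details.

```python
import math

def guided_in_batch_loss(student_sim, guide_sim, temperature=0.025):
    """student_sim[i][j] and guide_sim[i][j] hold the similarity between
    anchor i and candidate j; candidate i is the true positive of anchor i."""
    n = len(student_sim)
    total = 0.0
    for i in range(n):
        logits = []
        for j in range(n):
            # Drop in-batch negatives the guide rates above the true pair.
            if j != i and guide_sim[i][j] > guide_sim[i][i]:
                continue
            logits.append((student_sim[i][j] / temperature, j == i))
        # Cross-entropy with the positive as target (log-sum-exp for stability).
        top = max(value for value, _ in logits)
        log_denominator = top + math.log(sum(math.exp(value - top) for value, _ in logits))
        positive_logit = next(value for value, is_positive in logits if is_positive)
        total += log_denominator - positive_logit
    return total / n

# Hypothetical 2x2 similarity matrices for a batch of two pairs.
student = [[0.8, 0.7],
           [0.2, 0.9]]
guide_flags_false_negative = [[0.6, 0.9],   # guide: candidate 1 fits anchor 0 better
                              [0.1, 0.8]]
guide_keeps_all = [[1.0, 0.0],
                   [0.0, 1.0]]

masked = guided_in_batch_loss(student, guide_flags_false_negative)
unmasked = guided_in_batch_loss(student, guide_keeps_all)
print(masked < unmasked)  # True: the suspicious negative no longer contributes
```

With a temperature as low as 0.025, similarity differences are scaled by a factor of 40 before the softmax, so even small gaps between the positive and a hard negative have a large effect on the loss.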
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,664 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (type: string): min 4 tokens, mean 28.9 tokens, max 348 tokens
    • sentence2 (type: string): min 2 tokens, mean 57.31 tokens, max 450 tokens
  • Samples:
    sentence1 sentence2
    Gene expression is regulated primarily at the what level? Gene expression is regulated primarily at the transcriptional level.
    Diffusion Diffusion is a process where atoms or molecules move from areas of high concentration to areas of low concentration. Diffusion is the process in which a substance naturally moves from an area of higher to lower concentration.
    In which James Bond film did Sean Connery wear the Bell Rocket Belt (Jet Pack)? Jet Pack - James Bond Gadgets 125lbs Summary James Bond used the Jetpack in 1965's Thunderball, to escape from gunmen after killing a SPECTRE agent. The Jetpack In the 1965 movie Thunderball, James Bond (Sean Connery) uses Q's Jetpack to escape from two gunmen after killing Jacques Bouvar, SPECTRE Agent No. 6. It was also used in the Thunderball movie posters, being the "Look Up" part of the "Look Up! Look Down! Look Out!" tagline. The Jetpack returned in the 2002 movie Die Another Day, in the Q scene that showcased many other classic gadgets. The Jetpack is a very popular Bond gadget and is a favorite among many fans due to its originality and uniqueness. The Bell Rocket Belt The Jetpack is actually a Bell Rocket Belt, a fully functional rocket pack device. It was designed for use in the army, but was rejected because of its short flying time of 21-22 seconds. Powered by hydrogen peroxide, it could fly about 250m and reach a maximum altitude of 18m, going 55km/h. Despite its impracticality in the real world, the Jetpack made a spectacular debut in Thunderball. Although Sean Connery is seen in the takeoff and landings, the main flight was piloted by Gordon Yeager and Bill Suitor.
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 256
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
  • warmup_ratio: 0.33
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-toytest2-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
  • warmup_ratio: 0.33
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-toytest2-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss sts-test_spearman_cosine allNLI-dev_max_ap Qnli-dev_max_ap
0.0010 1 18.7427 - - - -
0.0020 2 11.6434 - - - -
0.0030 3 7.4859 - - - -
0.0039 4 7.3779 - - - -
0.0049 5 17.5878 - - - -
0.0059 6 8.4984 - - - -
0.0069 7 8.375 - - - -
0.0079 8 7.3241 - - - -
0.0089 9 10.3081 - - - -
0.0098 10 8.5363 - - - -
0.0108 11 17.2241 - - - -
0.0118 12 7.575 - - - -
0.0128 13 9.1905 - - - -
0.0138 14 11.7727 - - - -
0.0148 15 9.5827 - - - -
0.0157 16 7.4432 - - - -
0.0167 17 7.1573 - - - -
0.0177 18 19.8016 - - - -
0.0187 19 19.5118 - - - -
0.0197 20 7.9062 - - - -
0.0207 21 8.6791 - - - -
0.0217 22 7.7318 - - - -
0.0226 23 7.9319 - - - -
0.0236 24 7.192 - - - -
0.0246 25 15.5799 - - - -
0.0256 26 9.7859 - - - -
0.0266 27 9.9259 - - - -
0.0276 28 6.3076 - - - -
0.0285 29 7.4471 - - - -
0.0295 30 7.1246 - - - -
0.0305 31 6.5505 - - - -
0.0315 32 18.5194 - - - -
0.0325 33 7.0747 - - - -
0.0335 34 14.9456 - - - -
0.0344 35 6.608 - - - -
0.0354 36 8.4672 - - - -
0.0364 37 6.8853 - - - -
0.0374 38 13.6063 - - - -
0.0384 39 7.2625 - - - -
0.0394 40 6.2234 - - - -
0.0404 41 14.9675 - - - -
0.0413 42 6.6038 - - - -
0.0423 43 13.1173 - - - -
0.0433 44 16.6992 - - - -
0.0443 45 6.4828 - - - -
0.0453 46 5.9815 - - - -
0.0463 47 6.1738 - - - -
0.0472 48 7.134 - - - -
0.0482 49 9.3933 - - - -
0.0492 50 10.8085 - - - -
0.0502 51 11.4172 - - - -
0.0512 52 7.3397 - - - -
0.0522 53 5.8851 - - - -
0.0531 54 6.8105 - - - -
0.0541 55 5.3637 - - - -
0.0551 56 6.2628 - - - -
0.0561 57 6.0039 - - - -
0.0571 58 7.5859 - - - -
0.0581 59 6.0802 - - - -
0.0591 60 5.5822 - - - -
0.0600 61 5.8773 - - - -
0.0610 62 6.0814 - - - -
0.0620 63 5.4483 - - - -
0.0630 64 10.2506 - - - -
0.0640 65 10.5976 - - - -
0.0650 66 6.9942 - - - -
0.0659 67 5.4813 - - - -
0.0669 68 7.045 - - - -
0.0679 69 5.8549 - - - -
0.0689 70 8.8514 - - - -
0.0699 71 5.2557 - - - -
0.0709 72 5.1181 - - - -
0.0719 73 5.5331 - - - -
0.0728 74 5.5944 - - - -
0.0738 75 4.6332 - - - -
0.0748 76 4.9532 - - - -
0.0758 77 5.055 - - - -
0.0768 78 4.5005 - - - -
0.0778 79 5.1997 - - - -
0.0787 80 5.1479 - - - -
0.0797 81 5.1777 - - - -
0.0807 82 5.5565 - - - -
0.0817 83 4.6999 - - - -
0.0827 84 5.0681 - - - -
0.0837 85 5.2208 - - - -
0.0846 86 4.56 - - - -
0.0856 87 4.6793 - - - -
0.0866 88 4.4611 - - - -
0.0876 89 9.623 - - - -
0.0886 90 5.0316 - - - -
0.0896 91 4.1771 - - - -
0.0906 92 4.9652 - - - -
0.0915 93 8.7432 - - - -
0.0925 94 4.6234 - - - -
0.0935 95 4.4016 - - - -
0.0945 96 4.9903 - - - -
0.0955 97 4.5606 - - - -
0.0965 98 4.9534 - - - -
0.0974 99 8.1838 - - - -
0.0984 100 4.9736 - - - -
0.0994 101 4.4733 - - - -
0.1004 102 4.9725 - - - -
0.1014 103 4.5861 - - - -
0.1024 104 7.7634 - - - -
0.1033 105 4.9915 - - - -
0.1043 106 5.1391 - - - -
0.1053 107 5.0157 - - - -
0.1063 108 4.0982 - - - -
0.1073 109 4.2178 - - - -
0.1083 110 4.6193 - - - -
0.1093 111 4.7638 - - - -
0.1102 112 4.1207 - - - -
0.1112 113 5.2034 - - - -
0.1122 114 5.0693 - - - -
0.1132 115 4.7895 - - - -
0.1142 116 4.9486 - - - -
0.1152 117 4.6552 - - - -
0.1161 118 4.4555 - - - -
0.1171 119 4.8977 - - - -
0.1181 120 7.6836 - - - -
0.1191 121 4.8106 - - - -
0.1201 122 4.9958 - - - -
0.1211 123 4.4585 - - - -
0.1220 124 7.5559 - - - -
0.1230 125 4.2636 - - - -
0.1240 126 4.0436 - - - -
0.1250 127 4.7416 - - - -
0.1260 128 4.2215 - - - -
0.1270 129 6.3561 - - - -
0.1280 130 6.2299 - - - -
0.1289 131 4.3492 - - - -
0.1299 132 4.0216 - - - -
0.1309 133 6.963 - - - -
0.1319 134 3.9474 - - - -
0.1329 135 4.3437 - - - -
0.1339 136 3.6267 - - - -
0.1348 137 3.9896 - - - -
0.1358 138 4.8156 - - - -
0.1368 139 4.9751 - - - -
0.1378 140 4.4144 - - - -
0.1388 141 4.7213 - - - -
0.1398 142 6.6081 - - - -
0.1407 143 4.2929 - - - -
0.1417 144 4.2537 - - - -
0.1427 145 4.0647 - - - -
0.1437 146 3.937 - - - -
0.1447 147 5.6582 - - - -
0.1457 148 4.2648 - - - -
0.1467 149 4.4429 - - - -
0.1476 150 3.6197 - - - -
0.1486 151 3.7953 - - - -
0.1496 152 3.8175 - - - -
0.1506 153 4.5137 3.3210 0.1806 0.3919 0.5750
0.1516 154 4.3528 - - - -
0.1526 155 3.6573 - - - -
0.1535 156 3.5248 - - - -
0.1545 157 3.9275 - - - -
0.1555 158 7.1868 - - - -
0.1565 159 3.6294 - - - -
0.1575 160 3.6886 - - - -
0.1585 161 3.1873 - - - -
0.1594 162 6.1951 - - - -
0.1604 163 3.9747 - - - -
0.1614 164 7.004 - - - -
0.1624 165 4.3221 - - - -
0.1634 166 3.5963 - - - -
0.1644 167 3.1988 - - - -
0.1654 168 3.8236 - - - -
0.1663 169 3.5063 - - - -
0.1673 170 5.9843 - - - -
0.1683 171 5.884 - - - -
0.1693 172 4.1317 - - - -
0.1703 173 3.9255 - - - -
0.1713 174 4.1121 - - - -
0.1722 175 3.7748 - - - -
0.1732 176 5.1602 - - - -
0.1742 177 4.8807 - - - -
0.1752 178 3.4643 - - - -
0.1762 179 3.4937 - - - -
0.1772 180 5.2731 - - - -
0.1781 181 4.6416 - - - -
0.1791 182 3.5226 - - - -
0.1801 183 4.7794 - - - -
0.1811 184 3.8504 - - - -
0.1821 185 3.5391 - - - -
0.1831 186 4.0291 - - - -
0.1841 187 3.5606 - - - -
0.1850 188 3.8957 - - - -
0.1860 189 4.3657 - - - -
0.1870 190 5.0173 - - - -
0.1880 191 4.3915 - - - -
0.1890 192 3.4613 - - - -
0.1900 193 3.2005 - - - -
0.1909 194 3.3986 - - - -
0.1919 195 3.7937 - - - -
0.1929 196 3.8981 - - - -
0.1939 197 3.7051 - - - -
0.1949 198 3.8028 - - - -
0.1959 199 3.3294 - - - -
0.1969 200 4.1252 - - - -
0.1978 201 4.2564 - - - -
0.1988 202 3.8258 - - - -
0.1998 203 3.1025 - - - -
0.2008 204 3.5038 - - - -
0.2018 205 3.6021 - - - -
0.2028 206 3.7637 - - - -
0.2037 207 3.2563 - - - -
0.2047 208 3.9323 - - - -
0.2057 209 3.489 - - - -
0.2067 210 3.6549 - - - -
0.2077 211 3.1609 - - - -
0.2087 212 3.2467 - - - -
0.2096 213 3.4514 - - - -
0.2106 214 3.4945 - - - -
0.2116 215 3.5932 - - - -
0.2126 216 3.2289 - - - -
0.2136 217 3.3279 - - - -
0.2146 218 3.8141 - - - -
0.2156 219 3.1171 - - - -
0.2165 220 3.6287 - - - -
0.2175 221 3.8517 - - - -
0.2185 222 3.3836 - - - -
0.2195 223 3.425 - - - -
0.2205 224 3.6246 - - - -
0.2215 225 3.5682 - - - -
0.2224 226 3.3034 - - - -
0.2234 227 3.9251 - - - -
0.2244 228 3.146 - - - -
0.2254 229 3.8859 - - - -
0.2264 230 3.2977 - - - -
0.2274 231 3.2664 - - - -
0.2283 232 3.1275 - - - -
0.2293 233 3.2408 - - - -
0.2303 234 2.907 - - - -
0.2313 235 2.9178 - - - -
0.2323 236 3.324 - - - -
0.2333 237 2.9172 - - - -
0.2343 238 3.4324 - - - -
0.2352 239 4.0563 - - - -
0.2362 240 2.8736 - - - -
0.2372 241 4.7174 - - - -
0.2382 242 3.2025 - - - -
0.2392 243 2.7835 - - - -
0.2402 244 4.3158 - - - -
0.2411 245 2.8619 - - - -
0.2421 246 2.5156 - - - -
0.2431 247 3.2144 - - - -
0.2441 248 3.5927 - - - -
0.2451 249 2.6059 - - - -
0.2461 250 2.9758 - - - -
0.2470 251 3.9214 - - - -
0.2480 252 3.2892 - - - -
0.2490 253 2.9503 - - - -
0.2500 254 2.5969 - - - -
0.2510 255 2.9908 - - - -
0.2520 256 2.8995 - - - -
0.2530 257 3.124 - - - -
0.2539 258 3.1197 - - - -
0.2549 259 2.3073 - - - -
0.2559 260 2.8441 - - - -
0.2569 261 1.9788 - - - -
0.2579 262 2.1442 - - - -
0.2589 263 4.9015 - - - -
0.2598 264 2.7866 - - - -
0.2608 265 2.4588 - - - -
0.2618 266 2.3909 - - - -
0.2628 267 4.7394 - - - -
0.2638 268 3.1581 - - - -
0.2648 269 3.973 - - - -
0.2657 270 4.1565 - - - -
0.2667 271 2.5183 - - - -
0.2677 272 3.614 - - - -
0.2687 273 2.6858 - - - -
0.2697 274 3.1182 - - - -
0.2707 275 2.9628 - - - -
0.2717 276 2.8376 - - - -
0.2726 277 2.7858 - - - -
0.2736 278 2.1037 - - - -
0.2746 279 3.0436 - - - -
0.2756 280 3.4125 - - - -
0.2766 281 2.5027 - - - -
0.2776 282 2.7922 - - - -
0.2785 283 2.9762 - - - -
0.2795 284 2.6458 - - - -
0.2805 285 2.962 - - - -
0.2815 286 2.5439 - - - -
0.2825 287 2.8437 - - - -
0.2835 288 3.2134 - - - -
0.2844 289 2.5655 - - - -
0.2854 290 2.9465 - - - -
0.2864 291 2.4653 - - - -
0.2874 292 3.1467 - - - -
0.2884 293 2.6551 - - - -
0.2894 294 2.5098 - - - -
0.2904 295 2.5988 - - - -
0.2913 296 3.778 - - - -
0.2923 297 2.6257 - - - -
0.2933 298 2.5142 - - - -
0.2943 299 2.3182 - - - -
0.2953 300 3.3505 - - - -
0.2963 301 2.9615 - - - -
0.2972 302 2.9136 - - - -
0.2982 303 2.6192 - - - -
0.2992 304 2.3255 - - - -
0.3002 305 2.7168 - - - -
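
The per-step training loss in the log above is noisy, with occasional spikes (for example at steps 19, 32, and 44), so the downward trend is easier to see after smoothing. The following is a minimal sketch, not part of the training code: it assumes each row is whitespace-separated, that the first three fields are the epoch fraction, the step, and the training loss, and that `-` marks a step where no evaluation was run. The helper names `parse_log_row` and `trailing_mean` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed row layout: epoch, step, training loss, then evaluation columns.
LogRow = Tuple[float, int, float, List[Optional[float]]]


def parse_log_row(row: str) -> LogRow:
    """Split one log row into epoch, step, training loss, and evaluation values.

    Evaluation fields logged as "-" are returned as None.
    """
    fields = row.split()
    epoch = float(fields[0])
    step = int(fields[1])
    train_loss = float(fields[2])
    evals = [None if f == "-" else float(f) for f in fields[3:]]
    return epoch, step, train_loss, evals


def trailing_mean(values: List[float], window: int) -> List[float]:
    """Average each value with up to `window - 1` values that precede it."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    # Three rows copied from the log above.
    rows = [
        "0.1496 152 3.8175 - - - -",
        "0.1506 153 4.5137 3.3210 0.1806 0.3919 0.5750",
        "0.1516 154 4.3528 - - - -",
    ]
    parsed = [parse_log_row(r) for r in rows]
    losses = [loss for _, _, loss, _ in parsed]
    print(trailing_mean(losses, window=2))
```

A trailing window is used here so that each smoothed value depends only on steps already completed; a centered window would work equally well for plotting after the fact.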

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.1
  • Transformers: 4.44.2
  • PyTorch: 2.5.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.2
  • Tokenizers: 0.19.1
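
To compare a local environment against the versions listed above, something like the following sketch may help. It relies on the standard-library `importlib.metadata` module; the PyPI distribution names in `EXPECTED` (for example `torch` for PyTorch) are assumptions, and the helper names are hypothetical. Local build suffixes such as `+cu121` are ignored in the comparison.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions from the list above; distribution names are assumed PyPI names.
EXPECTED: Dict[str, str] = {
    "sentence-transformers": "3.2.1",
    "transformers": "4.44.2",
    "torch": "2.5.0",
    "accelerate": "0.34.2",
    "datasets": "3.0.2",
    "tokenizers": "0.19.1",
}


def installed_versions(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Look up the installed version of each package, or None if it is missing."""
    report: Dict[str, Optional[str]] = {}
    for name in expected:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


def mismatches(
    expected: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that differs."""
    diffs = {}
    for name, want in expected.items():
        have = installed.get(name)
        # Drop a local build suffix such as "+cu121" before comparing.
        public = have.split("+")[0] if have is not None else None
        if public != want:
            diffs[name] = (want, have)
    return diffs


if __name__ == "__main__":
    for name, (want, have) in mismatches(EXPECTED, installed_versions(EXPECTED)).items():
        print(f"{name}: expected {want}, found {have}")
```

A mismatch does not necessarily mean the model will fail to load; the check only flags where an environment differs from the one used for training.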

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}