bobox committed on
Commit 91d7f91
1 Parent(s): 89fdd68

Training in progress, epoch 1, checkpoint

last-checkpoint/README.md CHANGED
@@ -7,11 +7,12 @@ tags:
7
  - sentence-similarity
8
  - feature-extraction
9
  - generated_from_trainer
10
- - dataset_size:131566
11
  - loss:GISTEmbedLoss
12
  - loss:CoSENTLoss
13
  - loss:OnlineContrastiveLoss
14
  - loss:MultipleNegativesSymmetricRankingLoss
 
15
  base_model: microsoft/deberta-v3-small
16
  datasets:
17
  - sentence-transformers/all-nli
@@ -24,11 +25,21 @@ datasets:
24
  - allenai/sciq
25
  - allenai/qasc
26
  - allenai/openbookqa
27
- - sentence-transformers/msmarco-msmarco-distilbert-base-v3
28
  - sentence-transformers/natural-questions
29
  - sentence-transformers/trivia-qa
30
  - sentence-transformers/quora-duplicates
31
  - sentence-transformers/gooaq
32
  widget:
33
  - source_sentence: A man in a Santa Claus costume is sitting on a wooden chair holding
34
  a microphone and a stringed instrument.
@@ -70,11 +81,51 @@ widget:
70
  on account of his participation in same-sex union ceremonies.
71
  - Tesla was the fourth of five children.
72
  pipeline_tag: sentence-similarity
73
  ---
74
 
75
  # SentenceTransformer based on microsoft/deberta-v3-small
76
 
77
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli), [sts-label](https://huggingface.co/datasets/sentence-transformers/stsb), [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc), [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue), [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail), [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail), [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum), [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression), [sciq_pairs](https://huggingface.co/datasets/allenai/sciq), [qasc_pairs](https://huggingface.co/datasets/allenai/qasc), [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa), [msmarco_pairs](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3), [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions), [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa), [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) and [gooaq_pairs](https://huggingface.co/datasets/sentence-transformers/gooaq) datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
78
 
79
  ## Model Details
80
 
@@ -96,7 +147,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [m
96
  - [sciq_pairs](https://huggingface.co/datasets/allenai/sciq)
97
  - [qasc_pairs](https://huggingface.co/datasets/allenai/qasc)
98
  - [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa)
99
- - [msmarco_pairs](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3)
100
  - [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions)
101
  - [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa)
102
  - [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates)
@@ -175,6 +226,27 @@ You can finetune this model on your own dataset.
175
  *List how the model may foreseeably be misused and address what users ought not to do with the model.*
176
  -->
177
178
  <!--
179
  ## Bias, Risks and Limitations
180
 
@@ -194,7 +266,7 @@ You can finetune this model on your own dataset.
194
  #### nli-pairs
195
 
196
  * Dataset: [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
197
- * Size: 10,000 training samples
198
  * Columns: <code>sentence1</code> and <code>sentence2</code>
199
  * Approximate statistics based on the first 1000 samples:
200
  | | sentence1 | sentence2 |
@@ -210,7 +282,7 @@ You can finetune this model on your own dataset.
210
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
211
  ```json
212
  {'guide': SentenceTransformer(
213
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
214
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
215
  (2): Normalize()
216
  ), 'temperature': 0.05}
@@ -243,23 +315,23 @@ You can finetune this model on your own dataset.
243
  #### vitaminc-pairs
244
 
245
  * Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
246
- * Size: 4,943 training samples
247
  * Columns: <code>label</code>, <code>sentence1</code>, and <code>sentence2</code>
248
  * Approximate statistics based on the first 1000 samples:
249
  | | label | sentence1 | sentence2 |
250
  |:--------|:-----------------------------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
251
  | type | int | string | string |
252
- | details | <ul><li>1: 100.00%</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 16.05 tokens</li><li>max: 93 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 37.61 tokens</li><li>max: 502 tokens</li></ul> |
253
  * Samples:
254
- | label | sentence1 | sentence2 |
255
- |:---------------|:-------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
256
- | <code>1</code> | <code>Google used Motorola as a contract manufacturer .</code> | <code>As such , unlike the Nexus device marketed as being a Google product ; although the company used Motorola as a contract manufacturer , Google has stated that the Pixel is not based on any existing HTC device .</code> |
257
- | <code>1</code> | <code>Based on 91 reviews , the film scored above 39 % .</code> | <code>On Rotten Tomatoes , the film has a rating of 40 % , based on 91 reviews , with an average rating of 4.8/10 .</code> |
258
- | <code>1</code> | <code>Based on more than 26 reviews , the movie scored more than 24.5 %</code> | <code>On Rotten Tomatoes , the film has a rating of 25 % , based on 28 reviews , with an average rating of 3.9/10 .</code> |
259
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
260
  ```json
261
  {'guide': SentenceTransformer(
262
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
263
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
264
  (2): Normalize()
265
  ), 'temperature': 0.05}
@@ -268,41 +340,41 @@ You can finetune this model on your own dataset.
268
  #### qnli-contrastive
269
 
270
  * Dataset: [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue) at [bcdcba7](https://huggingface.co/datasets/nyu-mll/glue/tree/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
271
- * Size: 10,000 training samples
272
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
273
  * Approximate statistics based on the first 1000 samples:
274
  | | sentence1 | sentence2 | label |
275
  |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------|
276
  | type | string | string | int |
277
- | details | <ul><li>min: 6 tokens</li><li>mean: 13.82 tokens</li><li>max: 31 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 35.06 tokens</li><li>max: 201 tokens</li></ul> | <ul><li>0: 100.00%</li></ul> |
278
  * Samples:
279
- | sentence1 | sentence2 | label |
280
- |:------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
281
- | <code>What does LE stand for?</code> | <code>Life expectancy at birth</code> | <code>0</code> |
282
- | <code>For how long was the interest rate of Sumerian loans consistent?</code> | <code>They were denominated in barley or other crops and the interest rate was typically much higher than for commercial loans and could amount to 1/3 to 1/2 of the loan principal.</code> | <code>0</code> |
283
- | <code>What was John's nickname?</code> | <code>During John's early years, Henry attempted to resolve the question of his succession.</code> | <code>0</code> |
284
  * Loss: [<code>OnlineContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#onlinecontrastiveloss)
285
 
286
  #### scitail-pairs-qa
287
 
288
  * Dataset: [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
289
- * Size: 6,595 training samples
290
  * Columns: <code>sentence2</code> and <code>sentence1</code>
291
  * Approximate statistics based on the first 1000 samples:
292
  | | sentence2 | sentence1 |
293
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
294
  | type | string | string |
295
- | details | <ul><li>min: 7 tokens</li><li>mean: 15.84 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 14.79 tokens</li><li>max: 41 tokens</li></ul> |
296
  * Samples:
297
- | sentence2 | sentence1 |
298
- |:----------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------|
299
- | <code>Light with the longest wavelengths is called infrared light.</code> | <code>Light with the longest wavelengths is called what?</code> |
300
- | <code>Four valence electrons can be found in a carbon atom.</code> | <code>How many valence electrons can be found in a carbon atom?</code> |
301
- | <code>The spines of a cactus help it survive because spines protect the cactus from animals.</code> | <code>How do the spines of a cactus help it survive?</code> |
302
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
303
  ```json
304
  {'guide': SentenceTransformer(
305
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
306
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
307
  (2): Normalize()
308
  ), 'temperature': 0.05}
@@ -311,23 +383,23 @@ You can finetune this model on your own dataset.
311
  #### scitail-pairs-pos
312
 
313
  * Dataset: [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
314
- * Size: 3,405 training samples
315
  * Columns: <code>sentence1</code> and <code>sentence2</code>
316
  * Approximate statistics based on the first 1000 samples:
317
  | | sentence1 | sentence2 |
318
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
319
  | type | string | string |
320
- | details | <ul><li>min: 8 tokens</li><li>mean: 23.95 tokens</li><li>max: 61 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 15.25 tokens</li><li>max: 36 tokens</li></ul> |
321
  * Samples:
322
- | sentence1 | sentence2 |
323
- |:------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
324
- | <code>The cell cycle is composed of four stages.</code> | <code>Cells have four cycles.</code> |
325
- | <code>Plants exhale Oxygen gas in order for animals to breathe.</code> | <code>Oxygen gas is given off by plants.</code> |
326
- | <code>S-phase (synthesis phase) is the part of the cell cycle in which DNA is replicated, occurring between G1 phase and G2 phase.</code> | <code>During the synthesis phase in the cell cycle, dna replication occurs.</code> |
327
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
328
  ```json
329
  {'guide': SentenceTransformer(
330
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
331
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
332
  (2): Normalize()
333
  ), 'temperature': 0.05}
@@ -336,19 +408,19 @@ You can finetune this model on your own dataset.
336
  #### xsum-pairs
337
 
338
  * Dataset: [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum) at [788ddaf](https://huggingface.co/datasets/sentence-transformers/xsum/tree/788ddafe04e539956d56b567bc32a036ee7b9206)
339
- * Size: 10,000 training samples
340
  * Columns: <code>sentence1</code> and <code>sentence2</code>
341
  * Approximate statistics based on the first 1000 samples:
342
- | | sentence1 | sentence2 |
343
- |:--------|:-------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
344
- | type | string | string |
345
- | details | <ul><li>min: 38 tokens</li><li>mean: 356.45 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 27.24 tokens</li><li>max: 73 tokens</li></ul> |
346
  * Samples:
347
- | sentence1 | sentence2 |
348
- |:---------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------|
349
- | <code>London Mayor Sadiq Khan said the Central and Victoria lines will start on 19 August, with the Piccadilly, Jubilee and Northern lines following in the autumn.<br>The service will run through the night on Fridays and Saturdays.<br>It was due to start last September but was delayed due to disputes with unions.<br>Last year union members went on strike over the introduction of the all-night service, as well as their pay and conditions.<br>The service will run:<br>Maintenance workers belonging to the RMT union are still in dispute with Transport for London. The union agreed a deal for its drivers in March, but said the deal for maintenance staff was "inferior".<br>About 200 part-time drivers are currently taking part in a 14-week training programme for the new service.<br>RMT general secretary Mick Cash said there were "major" unresolved issues to do with conditions and pensions, and that it still had major concerns over safety.<br>"Against a background of massive cuts overshadowing TfL budgets all parties have to be clear that Night Tube, a development that RMT supports, cannot be delivered on the cheap," he said.<br>The mayor said: "The Night Tube is absolutely vital to my plans to support and grow London's night-time economy - creating more jobs and opportunities for all Londoners.<br>"The constant delays under the previous mayor let Londoners down badly. I have made getting the Night Tube up and running a priority."<br>London's transport commissioner, Mike Brown, said: "More than half a million people use the Tube after 10pm on Fridays and Saturdays, and the introduction of the Night Tube, which will support London's businesses and jobs, is a historic step in our modernisation of the Underground and our work to support London's economic growth."<br>Of all of the previous mayor Boris Johnson's schemes, the Night Tube was the most problematic to get off the ground.<br>I once called it a zombie policy - half dead wandering around, causing nothing but trouble.<br>It was meant to start in September 2015 but the transport unions were not happy about pay and changes to their work life balance.<br>The eventual solution - as well as bonuses - was to hire in part-time night drivers.<br>Now we have a start date (again) but crucially it is only for the Central and Victoria lines. That's because there are still unresolved issues with RMT staff who work mainly on the other planned night lines - the Jubilee, the Northern and the Piccadilly.<br>By going ahead with a start anyway there is a risk that antagonises the unions. It could be the first big test for the new mayor's relationship with London's transport unions.<br>Until it's running on five lines, there are probably more twists to come.</code> | <code>A date has been set for the launch of the Night Tube service - almost a year after it was first scheduled to begin.</code> |
350
- | <code>Police knew that the suspect, Jose Jorge Balderas Garza, was in a relationship with a Colombian model.<br>Reports say when a Facebook profile in her name listed a Mexico City area as her location, officers moved in.<br>Mr Balderas denies he carried out the shooting.<br>He blames one of his associates for the attack last January on Cabanas, who played for Paraguay and Mexico's Club America.<br>The football star was shot in the head in the bathroom of a bar in Mexico City on 25 January last year. He survived, but a bullet remains lodged in his head.<br>Police also accuse Mr Balderas of running a drug-trafficking ring.<br>Officers say that during their inquiries about Mr Balderas, they became aware of his romantic link to the Colombian model and participant in the Miss Antioquia 2008 competition, Juliana Sossa.<br>A profile page in Ms Sossa's name on Facebook gave her current location as Lomas de Chapultepec, Mexico City.<br>On Tuesday, police moved into the area and found Ms Sossa, 25, and Mr Balderas in the house they shared. They arrested the couple, along with five other suspects.</code> | <code>Media in Mexico say a post on the social networking site Facebook helped lead police to the main suspect in the shooting of the Paraguayan footballer Salvador Cabanas.</code> |
351
- | <code>The Home Affairs Select Committee was told about "apparent corruption right at the heart of New Scotland Yard".<br>Officers investigating Nigerian fraudster James Ibori were accused of taking cash payments for information.<br>The former inspector has denied any wrong-doing and the two serving officers have declined to comment.<br>The allegations surfaced during a parliamentary hearing on the role of private detectives.<br>During evidence from lawyer Mike Schwarz from solicitors Bindmans, MPs were told of documents which allege that private investigation firm, Risc Management Ltd, was involved in "wining and dining and paying" officers working on the James Ibori case.<br>Ibori was the former state governor of the oil-rich Delta region in Nigeria, a corrupt official who stole hundreds of millions of pounds from his homeland. He was sentenced to 13 years imprisonment last month after pleading guilty to laundering millions of pounds in the UK.<br>Mr Schwarz, representing Ibori's London lawyer who was jailed as part of the case, said: "The key culprits appear to be the key players who are the senior investigating officer, DI Gary Walters, and two of the key investigators who are DC John McDonald and DC (Peter) Clark."<br>How a thief almost became Nigeria's president<br>Ex-governor jailed for £50m fraud<br>Mr Schwarz told the committee there were records that "show about half a dozen payments totalling £20,000 over eight or nine months."<br>The allegations were originally made in an anonymous bundle of documents sent to former Metropolitan Police Commissioner Sir Paul Stephenson and the Independent Police Complaints Commission (IPCC) last summer.<br>In October last year, the IPCC instructed the Metropolitan Police's Directorate of Professional Standards (DPS) to conduct an internal investigation.<br>The paperwork included what purported to be detailed invoices and expense ledgers from Risc Management Ltd, headed at the time by two former Scotland Yard detectives, Keith Hunter and Cliff Knuckey.<br>Among the entries in the documents are details of what were said to be payments made to sources for confidential information about the on-going police investigation into Ibori.<br>One entry, dated shortly before police were due to interview James Ibori's London solicitor, reads: "Engaged with source in eliciting information re: forthcoming interviewing strategy to de (sic) deployed by police."<br>Immediately below, the entry states: "Cash payment made to above source for information provided. £5,000.00."<br>The DPS has said it has "an open mind" as to whether the documents are genuine or an elaborate forgery designed to pervert the course of justice.<br>Mr Schwarz criticised the internal inquiry into the corruption allegations as having "huge failings".<br>"Two of the key officers are still on duty on the same case and one has retired and joined Risc Management," he told the committee.<br>In a statement the Metropolitan Police said: "The MPS is investigating an allegation that illegal payments were made to police officers for information by a private investigation agency.<br>"The DPS referred the matter to the IPCC in October 2011 which agreed to supervise a DPS investigation into the allegations.<br>"This is an ongoing investigation and it would be inappropriate to comment further at this stage whilst the investigation is under way."<br>The BBC has confirmed that in the seven months since the DPS inquiry was launched, neither Risc Management, nor the law firm who hired them on behalf of James Ibori, have been contacted. No police officer has been asked about the allegations.<br>Already under fire for not properly investigating allegations of phone hacking, and with officers facing allegations they accepted cash from News International journalists, the new claims heap further pressure on the Metropolitan Police.<br>It is not possible to be certain whether the documents at the heart of the corruption allegations are genuine or elaborate fakes nor whether corrupt payments were actually made.<br>Risc Management denies it has ever paid money to any police officer.</code> | <code>Two Scotland Yard detective constables and a former detective inspector have been named as "key culprits" in bribery allegations revealed to MPs.</code> |
352
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
353
  ```json
354
  {
@@ -360,7 +432,7 @@ You can finetune this model on your own dataset.
360
  #### compression-pairs
361
 
362
  * Dataset: [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression) at [605bc91](https://huggingface.co/datasets/sentence-transformers/sentence-compression/tree/605bc91d95631895ba25b6eda51a3cb596976c90)
363
- * Size: 10,000 training samples
364
  * Columns: <code>sentence1</code> and <code>sentence2</code>
365
  * Approximate statistics based on the first 1000 samples:
366
  | | sentence1 | sentence2 |
@@ -384,7 +456,7 @@ You can finetune this model on your own dataset.
384
  #### sciq_pairs
385
 
386
  * Dataset: [sciq_pairs](https://huggingface.co/datasets/allenai/sciq) at [2c94ad3](https://huggingface.co/datasets/allenai/sciq/tree/2c94ad3e1aafab77146f384e23536f97a4849815)
387
- * Size: 10,000 training samples
388
  * Columns: <code>sentence1</code> and <code>sentence2</code>
389
  * Approximate statistics based on the first 1000 samples:
390
  | | sentence1 | sentence2 |
@@ -400,7 +472,7 @@ You can finetune this model on your own dataset.
400
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
401
  ```json
402
  {'guide': SentenceTransformer(
403
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
404
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
405
  (2): Normalize()
406
  ), 'temperature': 0.05}
@@ -425,7 +497,7 @@ You can finetune this model on your own dataset.
425
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
426
  ```json
427
  {'guide': SentenceTransformer(
428
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
429
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
430
  (2): Normalize()
431
  ), 'temperature': 0.05}
@@ -450,7 +522,7 @@ You can finetune this model on your own dataset.
450
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
451
  ```json
452
  {'guide': SentenceTransformer(
453
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
454
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
455
  (2): Normalize()
456
  ), 'temperature': 0.05}
@@ -458,33 +530,26 @@ You can finetune this model on your own dataset.
458
 
459
  #### msmarco_pairs
460
 
461
- * Dataset: [msmarco_pairs](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3) at [28ff31e](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3/tree/28ff31e4c97cddd53d298497f766e653f1e666f9)
462
- * Size: 10,000 training samples
463
- * Columns: <code>sentence1</code> and <code>sentence2</code>
464
  * Approximate statistics based on the first 1000 samples:
465
- | | sentence1 | sentence2 |
466
- |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
467
- | type | string | string |
468
- | details | <ul><li>min: 4 tokens</li><li>mean: 8.61 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 75.09 tokens</li><li>max: 206 tokens</li></ul> |
469
  * Samples:
470
- | sentence1 | sentence2 |
471
- |:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
472
- | <code>what are the liberal arts?</code> | <code>liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.</code> |
473
- | <code>what is the mechanism of action of fibrinolytic or thrombolytic drugs?</code> | <code>Baillière's Clinical Haematology. 6 Mechanism of action of the thrombolytic agents. 6 Mechanism of action of the thrombolytic agents JEFFREY I. WEITZ Fibrin formed during the haemostatic, inflammatory or tissue repair process serves a temporary role, and must be degraded to restore normal tissue function and structure.</code> |
474
- | <code>what is normal plat count</code> | <code>78 Followers. A. Platelets are the tiny blood cells that help stop bleeding by binding together to form a clump or plug at sites of injury inside blood vessels. A normal platelet count is between 150,000 and 450,000 platelets per microliter (one-millionth of a liter, abbreviated mcL).The average platelet count is 237,000 per mcL in men and 266,000 per mcL in women.8 Followers. A. Platelets are the tiny blood cells that help stop bleeding by binding together to form a clump or plug at sites of injury inside blood vessels. A normal platelet count is between 150,000 and 450,000 platelets per microliter (one-millionth of a liter, abbreviated mcL).</code> |
475
- * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
476
- ```json
477
- {'guide': SentenceTransformer(
478
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
479
- (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
480
- (2): Normalize()
481
- ), 'temperature': 0.05}
482
- ```
483
 
484
  #### nq_pairs
485
 
486
  * Dataset: [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions) at [f9e894e](https://huggingface.co/datasets/sentence-transformers/natural-questions/tree/f9e894e1081e206e577b4eaa9ee6de2b06ae6f17)
487
- * Size: 10,000 training samples
488
  * Columns: <code>sentence1</code> and <code>sentence2</code>
489
  * Approximate statistics based on the first 1000 samples:
490
  | | sentence1 | sentence2 |
@@ -500,7 +565,7 @@ You can finetune this model on your own dataset.
500
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
501
  ```json
502
  {'guide': SentenceTransformer(
503
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
504
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
505
  (2): Normalize()
506
  ), 'temperature': 0.05}
@@ -509,7 +574,7 @@ You can finetune this model on your own dataset.
509
  #### trivia_pairs
510
 
511
  * Dataset: [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa) at [a7c36e3](https://huggingface.co/datasets/sentence-transformers/trivia-qa/tree/a7c36e3c8c8c01526bc094d79bf80d4c848b0ad0)
512
- * Size: 10,000 training samples
513
  * Columns: <code>sentence1</code> and <code>sentence2</code>
514
  * Approximate statistics based on the first 1000 samples:
515
  | | sentence1 | sentence2 |
@@ -525,7 +590,7 @@ You can finetune this model on your own dataset.
525
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
526
  ```json
527
  {'guide': SentenceTransformer(
528
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
529
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
530
  (2): Normalize()
531
  ), 'temperature': 0.05}
@@ -534,7 +599,7 @@ You can finetune this model on your own dataset.
534
  #### quora_pairs
535
 
536
  * Dataset: [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) at [451a485](https://huggingface.co/datasets/sentence-transformers/quora-duplicates/tree/451a4850bd141edb44ade1b5828c259abd762cdb)
537
- * Size: 10,000 training samples
538
  * Columns: <code>sentence1</code> and <code>sentence2</code>
539
  * Approximate statistics based on the first 1000 samples:
540
  | | sentence1 | sentence2 |
@@ -558,7 +623,7 @@ You can finetune this model on your own dataset.
558
  #### gooaq_pairs
559
 
560
  * Dataset: [gooaq_pairs](https://huggingface.co/datasets/sentence-transformers/gooaq) at [b089f72](https://huggingface.co/datasets/sentence-transformers/gooaq/tree/b089f728748a068b7bc5234e5bcf5b25e3c8279c)
561
- * Size: 10,000 training samples
562
  * Columns: <code>sentence1</code> and <code>sentence2</code>
563
  * Approximate statistics based on the first 1000 samples:
564
  | | sentence1 | sentence2 |
@@ -574,7 +639,7 @@ You can finetune this model on your own dataset.
574
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
575
  ```json
576
  {'guide': SentenceTransformer(
577
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
578
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
579
  (2): Normalize()
580
  ), 'temperature': 0.05}
@@ -601,7 +666,7 @@ You can finetune this model on your own dataset.
601
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
602
  ```json
603
  {'guide': SentenceTransformer(
604
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
605
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
606
  (2): Normalize()
607
  ), 'temperature': 0.05}
@@ -626,7 +691,7 @@ You can finetune this model on your own dataset.
626
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
627
  ```json
628
  {'guide': SentenceTransformer(
629
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
630
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
631
  (2): Normalize()
632
  ), 'temperature': 0.05}
@@ -656,11 +721,11 @@ You can finetune this model on your own dataset.
656
  - `eval_strategy`: steps
657
  - `per_device_train_batch_size`: 28
658
  - `per_device_eval_batch_size`: 16
659
- - `learning_rate`: 2e-05
660
  - `weight_decay`: 1e-10
661
  - `num_train_epochs`: 2
662
  - `lr_scheduler_type`: cosine
663
- - `warmup_ratio`: 0.33
664
  - `save_safetensors`: False
665
  - `fp16`: True
666
  - `push_to_hub`: True
@@ -681,7 +746,7 @@ You can finetune this model on your own dataset.
681
  - `per_gpu_eval_batch_size`: None
682
  - `gradient_accumulation_steps`: 1
683
  - `eval_accumulation_steps`: None
684
- - `learning_rate`: 2e-05
685
  - `weight_decay`: 1e-10
686
  - `adam_beta1`: 0.9
687
  - `adam_beta2`: 0.999
@@ -691,7 +756,7 @@ You can finetune this model on your own dataset.
691
  - `max_steps`: -1
692
  - `lr_scheduler_type`: cosine
693
  - `lr_scheduler_kwargs`: {}
694
- - `warmup_ratio`: 0.33
695
  - `warmup_steps`: 0
696
  - `log_level`: passive
697
  - `log_level_replica`: warning
@@ -783,19 +848,18 @@ You can finetune this model on your own dataset.
783
  </details>
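
For reference, a minimal sketch of how the non-default values listed above might be expressed with `SentenceTransformerTrainingArguments`; the `output_dir` is a placeholder, and `learning_rate`/`warmup_ratio` are left commented out rather than guessed:

```python
from sentence_transformers import SentenceTransformerTrainingArguments

# Sketch only: mirrors the non-default hyperparameters listed above.
args = SentenceTransformerTrainingArguments(
    output_dir="last-checkpoint",      # placeholder output directory
    eval_strategy="steps",
    per_device_train_batch_size=28,
    per_device_eval_batch_size=16,
    weight_decay=1e-10,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
    # learning_rate=...,   # set to the value used for this run
    # warmup_ratio=...,    # set to the value used for this run
)
```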
784
 
785
  ### Training Logs
786
- | Epoch | Step | Training Loss | scitail-pairs-pos loss | qnli-contrastive loss | nli-pairs loss |
787
- |:-----:|:----:|:-------------:|:----------------------:|:---------------------:|:--------------:|
788
- | None | 0 | - | 3.4480 | 4.1500 | 4.2865 |
789
- | 0.1 | 471 | 4.4848 | 2.4697 | 3.3142 | 3.2277 |
790
- | 0.2 | 942 | 2.6358 | 0.9157 | 2.6632 | 1.5920 |
791
- | 0.3 | 1413 | 1.7183 | 0.7445 | 2.1308 | 1.1537 |
792
- | 0.4 | 1884 | 1.6114 | 0.6194 | 1.6952 | 0.8992 |
793
- | 0.5 | 2355 | 1.5367 | 0.6661 | 0.8698 | 0.8112 |
794
- | 0.6 | 2826 | 1.1657 | 0.5583 | 0.8415 | 0.7330 |
795
- | 0.7 | 3297 | 1.2926 | 0.5284 | 0.5240 | 0.6883 |
796
- | 0.8 | 3768 | 1.1523 | 0.4816 | 0.4342 | 0.6776 |
797
- | 0.9 | 4239 | 1.0387 | 0.4603 | 0.3022 | 0.6213 |
798
- | 1.0 | 4710 | 1.0356 | 0.5449 | 0.1294 | 0.6489 |
799
 
800
 
801
  ### Framework Versions
@@ -847,6 +911,18 @@ You can finetune this model on your own dataset.
847
  }
848
  ```
849
850
  <!--
851
  ## Glossary
852
 
 
7
  - sentence-similarity
8
  - feature-extraction
9
  - generated_from_trainer
10
+ - dataset_size:526885
11
  - loss:GISTEmbedLoss
12
  - loss:CoSENTLoss
13
  - loss:OnlineContrastiveLoss
14
  - loss:MultipleNegativesSymmetricRankingLoss
15
+ - loss:MarginMSELoss
16
  base_model: microsoft/deberta-v3-small
17
  datasets:
18
  - sentence-transformers/all-nli
 
25
  - allenai/sciq
26
  - allenai/qasc
27
  - allenai/openbookqa
 
28
  - sentence-transformers/natural-questions
29
  - sentence-transformers/trivia-qa
30
  - sentence-transformers/quora-duplicates
31
  - sentence-transformers/gooaq
32
+ metrics:
33
+ - pearson_cosine
34
+ - spearman_cosine
35
+ - pearson_manhattan
36
+ - spearman_manhattan
37
+ - pearson_euclidean
38
+ - spearman_euclidean
39
+ - pearson_dot
40
+ - spearman_dot
41
+ - pearson_max
42
+ - spearman_max
43
  widget:
44
  - source_sentence: A man in a Santa Claus costume is sitting on a wooden chair holding
45
  a microphone and a stringed instrument.
 
81
  on account of his participation in same-sex union ceremonies.
82
  - Tesla was the fourth of five children.
83
  pipeline_tag: sentence-similarity
84
+ model-index:
85
+ - name: SentenceTransformer based on microsoft/deberta-v3-small
86
+ results:
87
+ - task:
88
+ type: semantic-similarity
89
+ name: Semantic Similarity
90
+ dataset:
91
+ name: sts test
92
+ type: sts-test
93
+ metrics:
94
+ - type: pearson_cosine
95
+ value: 0.2520910673470529
96
+ name: Pearson Cosine
97
+ - type: spearman_cosine
98
+ value: 0.2588662067006675
99
+ name: Spearman Cosine
100
+ - type: pearson_manhattan
101
+ value: 0.30439718484055006
102
+ name: Pearson Manhattan
103
+ - type: spearman_manhattan
104
+ value: 0.3013780326567434
105
+ name: Spearman Manhattan
106
+ - type: pearson_euclidean
107
+ value: 0.25977707672353506
108
+ name: Pearson Euclidean
109
+ - type: spearman_euclidean
110
+ value: 0.26078444276128726
111
+ name: Spearman Euclidean
112
+ - type: pearson_dot
113
+ value: 0.08121075567918108
114
+ name: Pearson Dot
115
+ - type: spearman_dot
116
+ value: 0.0753891417253212
117
+ name: Spearman Dot
118
+ - type: pearson_max
119
+ value: 0.30439718484055006
120
+ name: Pearson Max
121
+ - type: spearman_max
122
+ value: 0.3013780326567434
123
+ name: Spearman Max
124
  ---
125
 
126
  # SentenceTransformer based on microsoft/deberta-v3-small
127
 
128
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli), [sts-label](https://huggingface.co/datasets/sentence-transformers/stsb), [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc), [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue), [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail), [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail), [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum), [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression), [sciq_pairs](https://huggingface.co/datasets/allenai/sciq), [qasc_pairs](https://huggingface.co/datasets/allenai/qasc), [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa), msmarco_pairs, [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions), [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa), [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) and [gooaq_pairs](https://huggingface.co/datasets/sentence-transformers/gooaq) datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
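+ 
+ A minimal usage sketch (the local `last-checkpoint` path is a placeholder for wherever this checkpoint is saved; swap in the Hub repository id once the model is pushed):
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+ 
+ # Placeholder path: point this at the saved checkpoint directory or the Hub repo id.
+ model = SentenceTransformer("last-checkpoint")
+ 
+ sentences = [
+     "A man in a Santa Claus costume is sitting on a wooden chair holding a microphone and a stringed instrument.",
+     "Tesla was the fourth of five children.",
+ ]
+ embeddings = model.encode(sentences)               # shape (2, 768)
+ print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity of the pair
+ ```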
129
 
130
  ## Model Details
131
 
 
147
  - [sciq_pairs](https://huggingface.co/datasets/allenai/sciq)
148
  - [qasc_pairs](https://huggingface.co/datasets/allenai/qasc)
149
  - [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa)
150
+ - msmarco_pairs
151
  - [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions)
152
  - [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa)
153
  - [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates)
 
226
  *List how the model may foreseeably be misused and address what users ought not to do with the model.*
227
  -->
228
 
229
+ ## Evaluation
230
+
231
+ ### Metrics
232
+
233
+ #### Semantic Similarity
234
+ * Dataset: `sts-test`
235
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
236
+
237
+ | Metric | Value |
238
+ |:--------------------|:-----------|
239
+ | pearson_cosine | 0.2521 |
240
+ | **spearman_cosine** | **0.2589** |
241
+ | pearson_manhattan | 0.3044 |
242
+ | spearman_manhattan | 0.3014 |
243
+ | pearson_euclidean | 0.2598 |
244
+ | spearman_euclidean | 0.2608 |
245
+ | pearson_dot | 0.0812 |
246
+ | spearman_dot | 0.0754 |
247
+ | pearson_max | 0.3044 |
248
+ | spearman_max | 0.3014 |
249
+
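+ A sketch of how the `sts-test` numbers above might be reproduced with the evaluator named in this section (the checkpoint path is a placeholder, and the exact split handling in the training script may differ):
+ 
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
+ 
+ model = SentenceTransformer("last-checkpoint")  # placeholder checkpoint path
+ 
+ # STS-benchmark test split with normalized similarity scores.
+ stsb = load_dataset("sentence-transformers/stsb", split="test")
+ evaluator = EmbeddingSimilarityEvaluator(
+     sentences1=stsb["sentence1"],
+     sentences2=stsb["sentence2"],
+     scores=stsb["score"],
+     name="sts-test",
+ )
+ # Recent sentence-transformers versions return a dict of Pearson/Spearman scores
+ # over cosine, Euclidean, Manhattan and dot-product similarities.
+ print(evaluator(model))
+ ```
+ 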
250
  <!--
251
  ## Bias, Risks and Limitations
252
 
 
266
  #### nli-pairs
267
 
268
  * Dataset: [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
269
+ * Size: 50,000 training samples
270
  * Columns: <code>sentence1</code> and <code>sentence2</code>
271
  * Approximate statistics based on the first 1000 samples:
272
  | | sentence1 | sentence2 |
 
282
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
283
  ```json
284
  {'guide': SentenceTransformer(
285
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
286
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
287
  (2): Normalize()
288
  ), 'temperature': 0.05}
 
315
  #### vitaminc-pairs
316
 
317
  * Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
318
+ * Size: 24,996 training samples
319
  * Columns: <code>label</code>, <code>sentence1</code>, and <code>sentence2</code>
320
  * Approximate statistics based on the first 1000 samples:
321
  | | label | sentence1 | sentence2 |
322
  |:--------|:-----------------------------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
323
  | type | int | string | string |
324
+ | details | <ul><li>1: 100.00%</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 17.18 tokens</li><li>max: 56 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 37.57 tokens</li><li>max: 240 tokens</li></ul> |
325
  * Samples:
326
+ | label | sentence1 | sentence2 |
327
+ |:---------------|:-----------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------|
328
+ | <code>1</code> | <code>Based on 93 reviews , the film has a 95 % approval rating</code> | <code>On review aggregation website Rotten Tomatoes , the film has an approval rating of 95 % , based on 93 reviews , with an average rating of 7.9/10 .</code> |
329
+ | <code>1</code> | <code>Bianca 's ex-husband is Gavin Ellis Ricky Butcher .</code> | <code>Whitney runs away and Bianca 's ex-husband Gavin Ellis Ricky Butcher ( Sid Owen ) finds her drunk .</code> |
330
+ | <code>1</code> | <code>Critics gave Jagga Jasoo ( film ) positive reviews .</code> | <code>The film received positive to reviews from the critics.</code> |
331
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
332
  ```json
333
  {'guide': SentenceTransformer(
334
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
335
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
336
  (2): Normalize()
337
  ), 'temperature': 0.05}
 
340
  #### qnli-contrastive
341
 
342
  * Dataset: [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue) at [bcdcba7](https://huggingface.co/datasets/nyu-mll/glue/tree/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
343
+ * Size: 50,000 training samples
344
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
345
  * Approximate statistics based on the first 1000 samples:
346
  | | sentence1 | sentence2 | label |
347
  |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------|
348
  | type | string | string | int |
349
+ | details | <ul><li>min: 7 tokens</li><li>mean: 13.99 tokens</li><li>max: 64 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 35.78 tokens</li><li>max: 151 tokens</li></ul> | <ul><li>0: 100.00%</li></ul> |
350
  * Samples:
351
+ | sentence1 | sentence2 | label |
352
+ |:-------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
353
+ | <code>How big is Midtown's population?</code> | <code>The Eastern Market farmer's distribution center is the largest open-air flowerbed market in the United States and has more than 150 foods and specialty businesses.</code> | <code>0</code> |
354
+ | <code>How many immigrants lived in these tent cities?</code> | <code>During this period, food, clothes and furniture had to be rationed in what became known as the Austerity Period.</code> | <code>0</code> |
355
+ | <code>What Iranian film festival was created in 1973?</code> | <code>Attempts to organize a film festival that had begun in 1954 within the framework of the Golrizan Festival, bore fruits in the form of the Sepas Festival in 1969.</code> | <code>0</code> |
356
  * Loss: [<code>OnlineContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#onlinecontrastiveloss)
357
 
358
  #### scitail-pairs-qa
359
 
360
  * Dataset: [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
361
+ * Size: 14,987 training samples
362
  * Columns: <code>sentence2</code> and <code>sentence1</code>
363
  * Approximate statistics based on the first 1000 samples:
364
  | | sentence2 | sentence1 |
365
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
366
  | type | string | string |
367
+ | details | <ul><li>min: 7 tokens</li><li>mean: 15.97 tokens</li><li>max: 36 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 15.01 tokens</li><li>max: 33 tokens</li></ul> |
368
  * Samples:
369
+ | sentence2 | sentence1 |
370
+ |:---------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------|
371
+ | <code>The abundance of water makes the earth habitable for humans.</code> | <code>What makes the earth habitable for humans?</code> |
372
+ | <code>Individual is the term for an organism, or single living thing.</code> | <code>What is the term for an organism, or single living thing?</code> |
373
+ | <code>Ultrasound, a diagnostic technology, uses high-frequency vibrations transmitted into any tissue in contact with the transducer.</code> | <code>What diagnostic technology uses high-frequency vibrations transmitted into any tissue in contact with the transducer?</code> |
374
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
375
  ```json
376
  {'guide': SentenceTransformer(
377
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
378
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
379
  (2): Normalize()
380
  ), 'temperature': 0.05}
 
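For readers unfamiliar with the `guide` entry printed above: GISTEmbedLoss uses a second, frozen embedding model to filter in-batch false negatives for the model being trained. The sketch below shows the wiring; the guide checkpoint name is an assumption (the card only reveals a 384-dimensional BERT encoder with CLS pooling and normalization), while `temperature=0.05` is taken from the parameters above.

```python
# Hedged sketch of the GISTEmbedLoss configuration shown above.
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("microsoft/deberta-v3-small")
# Assumed guide checkpoint: any 384-dim, CLS-pooled, normalized encoder fits
# the printed configuration; the exact guide model used here is not stated.
guide = SentenceTransformer("BAAI/bge-small-en-v1.5")

loss = losses.GISTEmbedLoss(model, guide, temperature=0.05)
```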
383
  #### scitail-pairs-pos
384
 
385
  * Dataset: [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
386
+ * Size: 8,600 training samples
387
  * Columns: <code>sentence1</code> and <code>sentence2</code>
388
  * Approximate statistics based on the first 1000 samples:
389
  | | sentence1 | sentence2 |
390
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
391
  | type | string | string |
392
+ | details | <ul><li>min: 6 tokens</li><li>mean: 23.86 tokens</li><li>max: 59 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 15.69 tokens</li><li>max: 41 tokens</li></ul> |
393
  * Samples:
394
+ | sentence1 | sentence2 |
395
+ |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------|
396
+ | <code>Frost (also called white or hoarfrost) occurs when air temperatures dip below 32F and ice crystals form on the plant leaves, injuring and sometimes killing tender plants.</code> | <code>The ice crystals that form on the ground are called frost.</code> |
397
+ | <code>They are considered micronutrients because the body needs them in relatively small amounts compared with nutrients such as carbohydrates, proteins, fats and water.</code> | <code>Micronutrients is the term for nutrients the body needs in relatively small amounts, including vitamins and minerals.</code> |
398
+ | <code>However cell division goes through a sixth phase called cytokinesis, which is the division of the cytoplasm and the formation of two new daughter cells.</code> | <code>Cytokinesis divides the cytoplasm into two distinctive cells.</code> |
399
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
400
  ```json
401
  {'guide': SentenceTransformer(
402
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
403
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
404
  (2): Normalize()
405
  ), 'temperature': 0.05}
 
408
  #### xsum-pairs
409
 
410
  * Dataset: [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum) at [788ddaf](https://huggingface.co/datasets/sentence-transformers/xsum/tree/788ddafe04e539956d56b567bc32a036ee7b9206)
411
+ * Size: 50,000 training samples
412
  * Columns: <code>sentence1</code> and <code>sentence2</code>
413
  * Approximate statistics based on the first 1000 samples:
414
+ | | sentence1 | sentence2 |
415
+ |:--------|:-------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
416
+ | type | string | string |
417
+ | details | <ul><li>min: 40 tokens</li><li>mean: 337.79 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 26.93 tokens</li><li>max: 75 tokens</li></ul> |
418
  * Samples:
419
+ | sentence1 | sentence2 |
420
+ |:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------|
421
+ | <code>A Haystack in the Evening Sun had not previously been authenticated because the work is largely unknown and the artist's signature is covered by paint.<br>However researchers at the University of Jyvaskyla in Finland uncovered the signature using a hyperspectral camera.<br>It also revealed the date of the work's creation - 1891.<br>The special camera used by researchers studied the painting's elemental composition by measuring X-ray fluorescence.<br>That allowed them to "see" below the surface, and analyse the materials used to create the work.<br>"The camera is principally operating as a scanner, which scans one line at a time," researcher Ilkka Polonen said.<br>"When the camera is moved using the scanner, an image of the whole picture can be obtained."<br>An analysis of the pigments and canvas fibres also confirmed the painting was by the Impressionist.<br>The artwork is currently owned by Finland's Serlachius Fine Arts Foundation, which acquired it in the 1950s through a London art broker.<br>The institution said the authentication means the artwork is the first Monet painting to be held in a Finnish public collection.</code> | <code>An oil painting thought to have been created by French Impressionist Claude Monet has been proven to be genuine through scientific testing.</code> |
422
+ | <code>Passengers on a British Airways flight from Prague and an Icelandair plane told of their relief after landing safely at Heathrow following the strikes on Wednesday.<br>One described "a white flash" while others said they felt a "crack" and "bang" as bolts hit the aircraft.<br>BA said planes were built to cope with lightning strikes and their jet would be inspected before resuming service.<br>Liz Dobson, a charity worker, told the Evening Standard: "It came out of the blue. There was a really loud bang and a white flash. Not really what you want on a plane.<br>"The lightning hit the wing."<br>Catherine Mayer, who is co-founder of the Women's Equality Party, was returning from Iceland.<br>She tweeted: "The plane got hit by lightning. Big flash and bang. #blimey."<br>She told the BBC how passengers sitting next to her looked distressed and frightened.<br>Icelandair confirmed that flight FI454 had been struck.<br>"The aircraft was of course inspected after landing for safety reasons, and as the lightning did not cause damage, the aircraft was returned to service later last night," said a spokesperson for the airline.<br>A spokesman for BA said: "Lightning strikes are fairly common and aircraft are designed to cope with them."<br>On average, commercial planes are struck by lightning about once a year according to Cardiff University's "lightning lab" in the UK, a recently established laboratory where Airbus conducts lightning tests.</code> | <code>Two planes have been struck by lightning over west London.</code> |
423
+ | <code>Arthur Mellar, 47, died after being seriously injured at Burghley House, on the Lincolnshire-Cambridgeshire border, on 12 July 2014.<br>Peterborough Crown Court heard the lift fell onto Mr Mellar as he tried to free a jammed item of luggage.<br>Burghley House Preservation Trust previously admitted it failed to ensure the welfare of an employee.<br>More on this and other local stories from across Lincolnshire<br>Mr Mellar got caught between the lift cage and the banister of the lift housing as he attempted to dislodge the baggage, the court heard.<br>The Health and Safety Executive, which brought the prosecution against the trust, said it was a "completely avoidable incident".<br>There were no safety measures in place to prevent it and the lift had not been inspected by an engineer since it was installed in the late 1950s, the court heard.<br>The court was also told the trust did not conduct a safety risk assessment on the lift, which was used to transport guests' luggage from different levels of the house.<br>Mr Mellar, from Barnsley, South Yorkshire, had worked at the 16th Century Burghley House for nine years.<br>Judge Sean Enright fined the trust £266,000, along with costs of nearly £17,000.<br>David Pennell, estates director at Burghley House, said: "Health and safety matters have always been paramount across all activities at Burghley and what happened to Arthur Mellar in July 2014 was a dreadful and tragic accident."<br>"Our thoughts are with Gerwin and Arthur's family at this time," he added.<br>The mansion has been used for locations in the films Pride and Prejudice and The Da Vinci Code.</code> | <code>The owners of Tudor stately home have been fined £266,000 after a butler was crushed to death by a faulty lift.</code> |
424
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
425
  ```json
426
  {
 
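The article/summary pairs above are trained with a symmetric in-batch-negatives objective. A minimal sketch follows, assuming the library's default scale since the parameter block is truncated in this diff:

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("microsoft/deberta-v3-small")

# Every other (article, summary) pair in the batch serves as a negative,
# in both the article -> summary and summary -> article directions.
loss = losses.MultipleNegativesSymmetricRankingLoss(model, scale=20.0)  # scale assumed (library default)
```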
432
  #### compression-pairs
433
 
434
  * Dataset: [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression) at [605bc91](https://huggingface.co/datasets/sentence-transformers/sentence-compression/tree/605bc91d95631895ba25b6eda51a3cb596976c90)
435
+ * Size: 50,000 training samples
436
  * Columns: <code>sentence1</code> and <code>sentence2</code>
437
  * Approximate statistics based on the first 1000 samples:
438
  | | sentence1 | sentence2 |
 
456
  #### sciq_pairs
457
 
458
  * Dataset: [sciq_pairs](https://huggingface.co/datasets/allenai/sciq) at [2c94ad3](https://huggingface.co/datasets/allenai/sciq/tree/2c94ad3e1aafab77146f384e23536f97a4849815)
459
+ * Size: 11,679 training samples
460
  * Columns: <code>sentence1</code> and <code>sentence2</code>
461
  * Approximate statistics based on the first 1000 samples:
462
  | | sentence1 | sentence2 |
 
472
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
473
  ```json
474
  {'guide': SentenceTransformer(
475
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
476
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
477
  (2): Normalize()
478
  ), 'temperature': 0.05}
 
497
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
498
  ```json
499
  {'guide': SentenceTransformer(
500
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
501
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
502
  (2): Normalize()
503
  ), 'temperature': 0.05}
 
522
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
523
  ```json
524
  {'guide': SentenceTransformer(
525
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
526
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
527
  (2): Normalize()
528
  ), 'temperature': 0.05}
 
530
 
531
  #### msmarco_pairs
532
 
533
+ * Dataset: msmarco_pairs
534
+ * Size: 50,000 training samples
535
+ * Columns: <code>query</code>, <code>positive</code>, <code>negative</code>, and <code>label</code>
536
  * Approximate statistics based on the first 1000 samples:
537
+ | | query | positive | negative | label |
538
+ |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------|
539
+ | type | string | string | string | float |
540
+ | details | <ul><li>min: 4 tokens</li><li>mean: 8.61 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 75.09 tokens</li><li>max: 206 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 72.59 tokens</li><li>max: 216 tokens</li></ul> | <ul><li>min: -0.5</li><li>mean: 0.04</li><li>max: 0.6</li></ul> |
541
  * Samples:
542
+ | query | positive | negative | label |
543
+ |:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------|
544
+ | <code>what are the liberal arts?</code> | <code>liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.</code> | <code>The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number.</code> | <code>0.12154221534729004</code> |
545
+ | <code>what is the mechanism of action of fibrinolytic or thrombolytic drugs?</code> | <code>Baillière's Clinical Haematology. 6 Mechanism of action of the thrombolytic agents. 6 Mechanism of action of the thrombolytic agents JEFFREY I. WEITZ Fibrin formed during the haemostatic, inflammatory or tissue repair process serves a temporary role, and must be degraded to restore normal tissue function and structure.</code> | <code>Fibrinolytic drug. Fibrinolytic drug, also called thrombolytic drug, any agent that is capable of stimulating the dissolution of a blood clot (thrombus). Fibrinolytic drugs work by activating the so-called fibrinolytic pathway.</code> | <code>-0.05174225568771362</code> |
546
+ | <code>what is normal plat count</code> | <code>78 Followers. A. Platelets are the tiny blood cells that help stop bleeding by binding together to form a clump or plug at sites of injury inside blood vessels. A normal platelet count is between 150,000 and 450,000 platelets per microliter (one-millionth of a liter, abbreviated mcL).The average platelet count is 237,000 per mcL in men and 266,000 per mcL in women.8 Followers. A. Platelets are the tiny blood cells that help stop bleeding by binding together to form a clump or plug at sites of injury inside blood vessels. A normal platelet count is between 150,000 and 450,000 platelets per microliter (one-millionth of a liter, abbreviated mcL).</code> | <code>Your blood test results should be written in your maternity notes. Your platelet count will look something like Plat. 160x10.9/L, which means you have a platelet count of 160, which is in the normal range.If your platelet count is low, the blood test should be done again.This will keep track of whether or not your count is dropping.our platelet count will look something like Plat. 160x10.9/L, which means you have a platelet count of 160, which is in the normal range. If your platelet count is low, the blood test should be done again. This will keep track of whether or not your count is dropping.</code> | <code>-0.037523627281188965</code> |
547
+ * Loss: [<code>MarginMSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#marginmseloss)
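For context, the `label` column above is a teacher margin: the cross-encoder score of (query, positive) minus the score of (query, negative), which is why it can be negative. Below is a hedged sketch of the data shape and loss, with an illustrative one-row dataset rather than the real MS MARCO training set.

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("microsoft/deberta-v3-small")

train_dataset = Dataset.from_dict({
    "query": ["what is normal plat count"],
    "positive": ["Platelets are the tiny blood cells that help stop bleeding ..."],
    "negative": ["Your blood test results should be written in your maternity notes ..."],
    "label": [-0.0375],  # teacher margin; negative when the 'negative' scores higher
})

# The student is trained so that its own score margin between positive and
# negative matches the teacher margin (MSE on the margins).
loss = losses.MarginMSELoss(model)
```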
 
548
 
549
  #### nq_pairs
550
 
551
  * Dataset: [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions) at [f9e894e](https://huggingface.co/datasets/sentence-transformers/natural-questions/tree/f9e894e1081e206e577b4eaa9ee6de2b06ae6f17)
552
+ * Size: 50,000 training samples
553
  * Columns: <code>sentence1</code> and <code>sentence2</code>
554
  * Approximate statistics based on the first 1000 samples:
555
  | | sentence1 | sentence2 |
 
565
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
566
  ```json
567
  {'guide': SentenceTransformer(
568
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
569
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
570
  (2): Normalize()
571
  ), 'temperature': 0.05}
 
574
  #### trivia_pairs
575
 
576
  * Dataset: [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa) at [a7c36e3](https://huggingface.co/datasets/sentence-transformers/trivia-qa/tree/a7c36e3c8c8c01526bc094d79bf80d4c848b0ad0)
577
+ * Size: 50,000 training samples
578
  * Columns: <code>sentence1</code> and <code>sentence2</code>
579
  * Approximate statistics based on the first 1000 samples:
580
  | | sentence1 | sentence2 |
 
590
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
591
  ```json
592
  {'guide': SentenceTransformer(
593
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
594
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
595
  (2): Normalize()
596
  ), 'temperature': 0.05}
 
599
  #### quora_pairs
600
 
601
  * Dataset: [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) at [451a485](https://huggingface.co/datasets/sentence-transformers/quora-duplicates/tree/451a4850bd141edb44ade1b5828c259abd762cdb)
602
+ * Size: 50,000 training samples
603
  * Columns: <code>sentence1</code> and <code>sentence2</code>
604
  * Approximate statistics based on the first 1000 samples:
605
  | | sentence1 | sentence2 |
 
623
  #### gooaq_pairs
624
 
625
  * Dataset: [gooaq_pairs](https://huggingface.co/datasets/sentence-transformers/gooaq) at [b089f72](https://huggingface.co/datasets/sentence-transformers/gooaq/tree/b089f728748a068b7bc5234e5bcf5b25e3c8279c)
626
+ * Size: 50,000 training samples
627
  * Columns: <code>sentence1</code> and <code>sentence2</code>
628
  * Approximate statistics based on the first 1000 samples:
629
  | | sentence1 | sentence2 |
 
639
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
640
  ```json
641
  {'guide': SentenceTransformer(
642
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
643
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
644
  (2): Normalize()
645
  ), 'temperature': 0.05}
 
666
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
667
  ```json
668
  {'guide': SentenceTransformer(
669
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
670
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
671
  (2): Normalize()
672
  ), 'temperature': 0.05}
 
691
  * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
692
  ```json
693
  {'guide': SentenceTransformer(
694
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
695
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
696
  (2): Normalize()
697
  ), 'temperature': 0.05}
 
721
  - `eval_strategy`: steps
722
  - `per_device_train_batch_size`: 28
723
  - `per_device_eval_batch_size`: 16
724
+ - `learning_rate`: 1e-05
725
  - `weight_decay`: 1e-10
726
  - `num_train_epochs`: 2
727
  - `lr_scheduler_type`: cosine
728
+ - `warmup_ratio`: 0.5
729
  - `save_safetensors`: False
730
  - `fp16`: True
731
  - `push_to_hub`: True
 
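A hedged sketch of how the non-default hyperparameters listed above map onto sentence-transformers v3 training. Only the values shown in this card (batch sizes, learning rate, cosine schedule, warmup ratio, fp16, two epochs) come from the source; the output directory, the toy dataset, and the single loss entry are placeholders for the full multi-dataset setup.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("microsoft/deberta-v3-small")

# Placeholder stand-in for one of the training datasets.
xsum_pairs = Dataset.from_dict({
    "sentence1": ["Long article text ..."],
    "sentence2": ["One-sentence summary."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="checkpoints",            # placeholder
    eval_strategy="steps",               # requires a recent transformers release
    per_device_train_batch_size=28,
    per_device_eval_batch_size=16,
    learning_rate=1e-5,
    weight_decay=1e-10,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.5,
    save_safetensors=False,
    fp16=True,                           # as in the card; needs a CUDA device
    # push_to_hub=True,                  # as in the card; needs a logged-in HF account
)

# With several datasets, both train_dataset and loss are dictionaries keyed by
# dataset name, so each dataset gets its own objective (GISTEmbedLoss,
# OnlineContrastiveLoss, MarginMSELoss, ...).
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset={"xsum-pairs": xsum_pairs},
    eval_dataset={"xsum-pairs": xsum_pairs},   # placeholder eval split
    loss={"xsum-pairs": losses.MultipleNegativesSymmetricRankingLoss(model)},
)
trainer.train()
```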
746
  - `per_gpu_eval_batch_size`: None
747
  - `gradient_accumulation_steps`: 1
748
  - `eval_accumulation_steps`: None
749
+ - `learning_rate`: 1e-05
750
  - `weight_decay`: 1e-10
751
  - `adam_beta1`: 0.9
752
  - `adam_beta2`: 0.999
 
756
  - `max_steps`: -1
757
  - `lr_scheduler_type`: cosine
758
  - `lr_scheduler_kwargs`: {}
759
+ - `warmup_ratio`: 0.5
760
  - `warmup_steps`: 0
761
  - `log_level`: passive
762
  - `log_level_replica`: warning
 
848
  </details>
849
 
850
  ### Training Logs
851
+ | Epoch | Step | Training Loss | nli-pairs loss | scitail-pairs-pos loss | qnli-contrastive loss | sts-test_spearman_cosine |
852
+ |:------:|:-----:|:-------------:|:--------------:|:----------------------:|:---------------------:|:------------------------:|
853
+ | 0 | 0 | - | 4.2656 | 3.4484 | 4.1500 | 0.2589 |
854
+ | 0.1000 | 1883 | 3.6326 | 2.6953 | 2.1726 | 2.7029 | - |
855
+ | 0.2001 | 3766 | 1.7665 | 1.2885 | 0.9638 | 1.7135 | - |
856
+ | 0.3001 | 5649 | 1.1522 | 0.9094 | 0.7571 | 0.9165 | - |
857
+ | 0.4001 | 7532 | 0.9533 | 0.7290 | 0.6498 | 0.4304 | - |
858
+ | 0.5002 | 9415 | 0.8013 | 0.6432 | 0.6007 | 0.2591 | - |
859
+ | 0.6002 | 11298 | 0.6568 | 0.5626 | 0.5481 | 0.1365 | - |
860
+ | 0.7002 | 13181 | 0.6095 | 0.5226 | 0.5109 | 0.1643 | - |
861
+ | 0.8003 | 15064 | 0.5694 | 0.4921 | 0.5194 | 0.0517 | - |
862
+ | 0.9003 | 16947 | 0.5375 | 0.5061 | 0.5643 | 0.0462 | - |
 
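For reference, the `sts-test_spearman_cosine` column above is the kind of metric produced by an `EmbeddingSimilarityEvaluator` run over an STS-style test split. The sketch below uses placeholder sentence pairs and gold scores, not the actual sts-test data.

```python
from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("microsoft/deberta-v3-small")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["A man is playing a guitar.", "A dog runs outside.", "Two kids are cooking."],
    sentences2=["Someone plays an instrument.", "The cat sleeps indoors.", "Children prepare food."],
    scores=[0.9, 0.1, 0.8],          # placeholder gold similarities in [0, 1]
    main_similarity=SimilarityFunction.COSINE,
    name="sts-test",
)
metrics = evaluator(model)  # includes "sts-test_spearman_cosine" in recent versions
```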
863
 
864
 
865
  ### Framework Versions
 
911
  }
912
  ```
913
 
914
+ #### MarginMSELoss
915
+ ```bibtex
916
+ @misc{hofstätter2021improving,
917
+ title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
918
+ author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
919
+ year={2021},
920
+ eprint={2010.02666},
921
+ archivePrefix={arXiv},
922
+ primaryClass={cs.IR}
923
+ }
924
+ ```
925
+
926
  <!--
927
  ## Glossary
928
 
last-checkpoint/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:2ffdc419c0955c5b768dcbbe88faad4bf22cea53605c9fe4074ca9d083adb05c
3
  size 1130520122
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9647fed89037ba3e3282c4e91d6cc40e3b6ede7cca94a3f8c8b22b2aec5e1b70
3
  size 1130520122
last-checkpoint/pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:dbe2fc34c5e1d1cc3880d11fddb3b85a8a35ab6f560ccd6126456a313344d43b
3
  size 565251810
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ea28818c6e626e44d794c42590ed98ccd08652e0026f3086a02b5ead369e633d
3
  size 565251810
last-checkpoint/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:f687a4ba9f684d4913c83ab55045603e2534ca8151cfefbdd3db16a5515199a5
3
  size 14180
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:141ecdefc1c939079bd9377367b5723d56e31424215532c67fb39a68efcee019
3
  size 14180
last-checkpoint/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:072f295cd9400d44a23f01cc82ad8c9b8b89be4ef3aba1d3b8e750e9883aec90
3
  size 1064
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:295caad4fbc2e25c07e26ab55cba43a9ec3977746a577c96911a58bfcbdf8ed4
3
  size 1064
last-checkpoint/trainer_state.json CHANGED
@@ -2,328 +2,297 @@
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
  "epoch": 1.0,
5
- "eval_steps": 471,
6
- "global_step": 4710,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.1,
13
- "grad_norm": 17.846229553222656,
14
- "learning_rate": 3.004181408813123e-06,
15
- "loss": 4.4848,
16
- "step": 471
17
  },
18
  {
19
- "epoch": 0.1,
20
- "eval_nli-pairs_loss": 3.227689504623413,
21
- "eval_nli-pairs_runtime": 23.5758,
22
- "eval_nli-pairs_samples_per_second": 288.77,
23
- "eval_nli-pairs_steps_per_second": 18.069,
24
- "step": 471
25
  },
26
  {
27
- "epoch": 0.1,
28
- "eval_scitail-pairs-pos_loss": 2.469686508178711,
29
- "eval_scitail-pairs-pos_runtime": 5.4679,
30
- "eval_scitail-pairs-pos_samples_per_second": 238.485,
31
- "eval_scitail-pairs-pos_steps_per_second": 14.997,
32
- "step": 471
33
  },
34
  {
35
- "epoch": 0.1,
36
- "eval_qnli-contrastive_loss": 3.3142430782318115,
37
- "eval_qnli-contrastive_runtime": 15.7426,
38
- "eval_qnli-contrastive_samples_per_second": 347.019,
39
- "eval_qnli-contrastive_steps_per_second": 21.724,
40
- "step": 471
41
  },
42
  {
43
- "epoch": 0.2,
44
- "grad_norm": 29.59261703491211,
45
- "learning_rate": 6.027661627532969e-06,
46
- "loss": 2.6358,
47
- "step": 942
48
  },
49
  {
50
- "epoch": 0.2,
51
- "eval_nli-pairs_loss": 1.5920209884643555,
52
- "eval_nli-pairs_runtime": 23.3765,
53
- "eval_nli-pairs_samples_per_second": 291.232,
54
- "eval_nli-pairs_steps_per_second": 18.223,
55
- "step": 942
56
  },
57
  {
58
- "epoch": 0.2,
59
- "eval_scitail-pairs-pos_loss": 0.9157330989837646,
60
- "eval_scitail-pairs-pos_runtime": 5.4478,
61
- "eval_scitail-pairs-pos_samples_per_second": 239.363,
62
- "eval_scitail-pairs-pos_steps_per_second": 15.052,
63
- "step": 942
64
  },
65
  {
66
- "epoch": 0.2,
67
- "eval_qnli-contrastive_loss": 2.663238763809204,
68
- "eval_qnli-contrastive_runtime": 15.751,
69
- "eval_qnli-contrastive_samples_per_second": 346.836,
70
- "eval_qnli-contrastive_steps_per_second": 21.713,
71
- "step": 942
72
  },
73
  {
74
- "epoch": 0.3,
75
- "grad_norm": 24.539047241210938,
76
- "learning_rate": 9.057574782888389e-06,
77
- "loss": 1.7183,
78
- "step": 1413
79
  },
80
  {
81
- "epoch": 0.3,
82
- "eval_nli-pairs_loss": 1.1536647081375122,
83
- "eval_nli-pairs_runtime": 23.6115,
84
- "eval_nli-pairs_samples_per_second": 288.335,
85
- "eval_nli-pairs_steps_per_second": 18.042,
86
- "step": 1413
87
  },
88
  {
89
- "epoch": 0.3,
90
- "eval_scitail-pairs-pos_loss": 0.7445429563522339,
91
- "eval_scitail-pairs-pos_runtime": 5.3966,
92
- "eval_scitail-pairs-pos_samples_per_second": 241.635,
93
- "eval_scitail-pairs-pos_steps_per_second": 15.195,
94
- "step": 1413
95
  },
96
  {
97
- "epoch": 0.3,
98
- "eval_qnli-contrastive_loss": 2.130812406539917,
99
- "eval_qnli-contrastive_runtime": 15.7293,
100
- "eval_qnli-contrastive_samples_per_second": 347.313,
101
- "eval_qnli-contrastive_steps_per_second": 21.743,
102
- "step": 1413
103
  },
104
  {
105
- "epoch": 0.4,
106
- "grad_norm": 139.8046875,
107
- "learning_rate": 1.208748793824381e-05,
108
- "loss": 1.6114,
109
- "step": 1884
110
  },
111
  {
112
- "epoch": 0.4,
113
- "eval_nli-pairs_loss": 0.8992123007774353,
114
- "eval_nli-pairs_runtime": 23.6196,
115
- "eval_nli-pairs_samples_per_second": 288.236,
116
- "eval_nli-pairs_steps_per_second": 18.036,
117
- "step": 1884
118
  },
119
  {
120
- "epoch": 0.4,
121
- "eval_scitail-pairs-pos_loss": 0.6193641424179077,
122
- "eval_scitail-pairs-pos_runtime": 5.4024,
123
- "eval_scitail-pairs-pos_samples_per_second": 241.376,
124
- "eval_scitail-pairs-pos_steps_per_second": 15.179,
125
- "step": 1884
126
  },
127
  {
128
- "epoch": 0.4,
129
- "eval_qnli-contrastive_loss": 1.6952241659164429,
130
- "eval_qnli-contrastive_runtime": 15.7392,
131
- "eval_qnli-contrastive_samples_per_second": 347.095,
132
- "eval_qnli-contrastive_steps_per_second": 21.729,
133
- "step": 1884
134
  },
135
  {
136
- "epoch": 0.5,
137
- "grad_norm": 2.1193487644195557,
138
- "learning_rate": 1.511740109359923e-05,
139
- "loss": 1.5367,
140
- "step": 2355
141
  },
142
  {
143
- "epoch": 0.5,
144
- "eval_nli-pairs_loss": 0.8112400770187378,
145
- "eval_nli-pairs_runtime": 23.4573,
146
- "eval_nli-pairs_samples_per_second": 290.23,
147
- "eval_nli-pairs_steps_per_second": 18.161,
148
- "step": 2355
149
  },
150
  {
151
- "epoch": 0.5,
152
- "eval_scitail-pairs-pos_loss": 0.6661093831062317,
153
- "eval_scitail-pairs-pos_runtime": 5.3621,
154
- "eval_scitail-pairs-pos_samples_per_second": 243.189,
155
- "eval_scitail-pairs-pos_steps_per_second": 15.293,
156
- "step": 2355
157
  },
158
  {
159
- "epoch": 0.5,
160
- "eval_qnli-contrastive_loss": 0.8697724938392639,
161
- "eval_qnli-contrastive_runtime": 15.7092,
162
- "eval_qnli-contrastive_samples_per_second": 347.759,
163
- "eval_qnli-contrastive_steps_per_second": 21.771,
164
- "step": 2355
165
  },
166
  {
167
- "epoch": 0.6,
168
- "grad_norm": 8.693464279174805,
169
- "learning_rate": 1.814731424895465e-05,
170
- "loss": 1.1657,
171
- "step": 2826
172
  },
173
  {
174
- "epoch": 0.6,
175
- "eval_nli-pairs_loss": 0.7330080270767212,
176
- "eval_nli-pairs_runtime": 23.359,
177
- "eval_nli-pairs_samples_per_second": 291.451,
178
- "eval_nli-pairs_steps_per_second": 18.237,
179
- "step": 2826
180
  },
181
  {
182
- "epoch": 0.6,
183
- "eval_scitail-pairs-pos_loss": 0.558278501033783,
184
- "eval_scitail-pairs-pos_runtime": 5.3162,
185
- "eval_scitail-pairs-pos_samples_per_second": 245.289,
186
- "eval_scitail-pairs-pos_steps_per_second": 15.425,
187
- "step": 2826
188
  },
189
  {
190
- "epoch": 0.6,
191
- "eval_qnli-contrastive_loss": 0.8414629101753235,
192
- "eval_qnli-contrastive_runtime": 15.5773,
193
- "eval_qnli-contrastive_samples_per_second": 350.703,
194
- "eval_qnli-contrastive_steps_per_second": 21.955,
195
- "step": 2826
196
  },
197
  {
198
- "epoch": 0.7,
199
- "grad_norm": 20.00510025024414,
200
- "learning_rate": 1.995853561663268e-05,
201
- "loss": 1.2926,
202
- "step": 3297
203
  },
204
  {
205
- "epoch": 0.7,
206
- "eval_nli-pairs_loss": 0.688292384147644,
207
- "eval_nli-pairs_runtime": 23.1585,
208
- "eval_nli-pairs_samples_per_second": 293.974,
209
- "eval_nli-pairs_steps_per_second": 18.395,
210
- "step": 3297
211
  },
212
  {
213
- "epoch": 0.7,
214
- "eval_scitail-pairs-pos_loss": 0.5283708572387695,
215
- "eval_scitail-pairs-pos_runtime": 5.3322,
216
- "eval_scitail-pairs-pos_samples_per_second": 244.552,
217
- "eval_scitail-pairs-pos_steps_per_second": 15.378,
218
- "step": 3297
219
  },
220
  {
221
- "epoch": 0.7,
222
- "eval_qnli-contrastive_loss": 0.5239661335945129,
223
- "eval_qnli-contrastive_runtime": 15.5222,
224
- "eval_qnli-contrastive_samples_per_second": 351.947,
225
- "eval_qnli-contrastive_steps_per_second": 22.033,
226
- "step": 3297
227
  },
228
  {
229
- "epoch": 0.8,
230
- "grad_norm": 20.681690216064453,
231
- "learning_rate": 1.9476312452068522e-05,
232
- "loss": 1.1523,
233
- "step": 3768
234
  },
235
  {
236
- "epoch": 0.8,
237
- "eval_nli-pairs_loss": 0.6775749325752258,
238
- "eval_nli-pairs_runtime": 23.2425,
239
- "eval_nli-pairs_samples_per_second": 292.912,
240
- "eval_nli-pairs_steps_per_second": 18.328,
241
- "step": 3768
242
  },
243
  {
244
- "epoch": 0.8,
245
- "eval_scitail-pairs-pos_loss": 0.4816366732120514,
246
- "eval_scitail-pairs-pos_runtime": 5.2694,
247
- "eval_scitail-pairs-pos_samples_per_second": 247.467,
248
- "eval_scitail-pairs-pos_steps_per_second": 15.562,
249
- "step": 3768
250
  },
251
  {
252
- "epoch": 0.8,
253
- "eval_qnli-contrastive_loss": 0.4342482388019562,
254
- "eval_qnli-contrastive_runtime": 15.5335,
255
- "eval_qnli-contrastive_samples_per_second": 351.691,
256
- "eval_qnli-contrastive_steps_per_second": 22.017,
257
- "step": 3768
258
  },
259
  {
260
- "epoch": 0.9,
261
- "grad_norm": 12.640650749206543,
262
- "learning_rate": 1.8475083492522773e-05,
263
- "loss": 1.0387,
264
- "step": 4239
265
  },
266
  {
267
- "epoch": 0.9,
268
- "eval_nli-pairs_loss": 0.6213383674621582,
269
- "eval_nli-pairs_runtime": 23.1579,
270
- "eval_nli-pairs_samples_per_second": 293.981,
271
- "eval_nli-pairs_steps_per_second": 18.395,
272
- "step": 4239
273
  },
274
  {
275
- "epoch": 0.9,
276
- "eval_scitail-pairs-pos_loss": 0.4603377878665924,
277
- "eval_scitail-pairs-pos_runtime": 5.3009,
278
- "eval_scitail-pairs-pos_samples_per_second": 245.997,
279
- "eval_scitail-pairs-pos_steps_per_second": 15.469,
280
- "step": 4239
281
  },
282
  {
283
- "epoch": 0.9,
284
- "eval_qnli-contrastive_loss": 0.3022189736366272,
285
- "eval_qnli-contrastive_runtime": 15.5459,
286
- "eval_qnli-contrastive_samples_per_second": 351.411,
287
- "eval_qnli-contrastive_steps_per_second": 21.999,
288
- "step": 4239
289
- },
290
- {
291
- "epoch": 1.0,
292
- "grad_norm": 20.227073669433594,
293
- "learning_rate": 1.701008869684049e-05,
294
- "loss": 1.0356,
295
- "step": 4710
296
- },
297
- {
298
- "epoch": 1.0,
299
- "eval_nli-pairs_loss": 0.6488831043243408,
300
- "eval_nli-pairs_runtime": 23.1759,
301
- "eval_nli-pairs_samples_per_second": 293.753,
302
- "eval_nli-pairs_steps_per_second": 18.381,
303
- "step": 4710
304
- },
305
- {
306
- "epoch": 1.0,
307
- "eval_scitail-pairs-pos_loss": 0.5449082255363464,
308
- "eval_scitail-pairs-pos_runtime": 5.3602,
309
- "eval_scitail-pairs-pos_samples_per_second": 243.276,
310
- "eval_scitail-pairs-pos_steps_per_second": 15.298,
311
- "step": 4710
312
- },
313
- {
314
- "epoch": 1.0,
315
- "eval_qnli-contrastive_loss": 0.1294127106666565,
316
- "eval_qnli-contrastive_runtime": 15.5044,
317
- "eval_qnli-contrastive_samples_per_second": 352.352,
318
- "eval_qnli-contrastive_steps_per_second": 22.058,
319
- "step": 4710
320
  }
321
  ],
322
- "logging_steps": 471,
323
- "max_steps": 9420,
324
  "num_input_tokens_seen": 0,
325
  "num_train_epochs": 2,
326
- "save_steps": 4710,
327
  "stateful_callbacks": {
328
  "TrainerControl": {
329
  "args": {
 
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
  "epoch": 1.0,
5
+ "eval_steps": 1883,
6
+ "global_step": 18824,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
+ "epoch": 0.10003187420314492,
13
+ "grad_norm": 39.029380798339844,
14
+ "learning_rate": 9.976625584360391e-07,
15
+ "loss": 3.6326,
16
+ "step": 1883
17
  },
18
  {
19
+ "epoch": 0.10003187420314492,
20
+ "eval_nli-pairs_loss": 2.6952593326568604,
21
+ "eval_nli-pairs_runtime": 25.731,
22
+ "eval_nli-pairs_samples_per_second": 264.584,
23
+ "eval_nli-pairs_steps_per_second": 16.556,
24
+ "step": 1883
25
  },
26
  {
27
+ "epoch": 0.10003187420314492,
28
+ "eval_scitail-pairs-pos_loss": 2.172569990158081,
29
+ "eval_scitail-pairs-pos_runtime": 6.2772,
30
+ "eval_scitail-pairs-pos_samples_per_second": 207.736,
31
+ "eval_scitail-pairs-pos_steps_per_second": 13.063,
32
+ "step": 1883
33
  },
34
  {
35
+ "epoch": 0.10003187420314492,
36
+ "eval_qnli-contrastive_loss": 2.702913999557495,
37
+ "eval_qnli-contrastive_runtime": 16.475,
38
+ "eval_qnli-contrastive_samples_per_second": 331.593,
39
+ "eval_qnli-contrastive_steps_per_second": 20.759,
40
+ "step": 1883
41
  },
42
  {
43
+ "epoch": 0.20006374840628985,
44
+ "grad_norm": 25.459535598754883,
45
+ "learning_rate": 1.9974500637484067e-06,
46
+ "loss": 1.7665,
47
+ "step": 3766
48
  },
49
  {
50
+ "epoch": 0.20006374840628985,
51
+ "eval_nli-pairs_loss": 1.2885302305221558,
52
+ "eval_nli-pairs_runtime": 25.4564,
53
+ "eval_nli-pairs_samples_per_second": 267.438,
54
+ "eval_nli-pairs_steps_per_second": 16.734,
55
+ "step": 3766
56
  },
57
  {
58
+ "epoch": 0.20006374840628985,
59
+ "eval_scitail-pairs-pos_loss": 0.9637606143951416,
60
+ "eval_scitail-pairs-pos_runtime": 6.1565,
61
+ "eval_scitail-pairs-pos_samples_per_second": 211.809,
62
+ "eval_scitail-pairs-pos_steps_per_second": 13.319,
63
+ "step": 3766
64
  },
65
  {
66
+ "epoch": 0.20006374840628985,
67
+ "eval_qnli-contrastive_loss": 1.713547945022583,
68
+ "eval_qnli-contrastive_runtime": 16.4307,
69
+ "eval_qnli-contrastive_samples_per_second": 332.487,
70
+ "eval_qnli-contrastive_steps_per_second": 20.815,
71
+ "step": 3766
72
  },
73
  {
74
+ "epoch": 0.3000956226094348,
75
+ "grad_norm": 0.8201059103012085,
76
+ "learning_rate": 2.9977688057798558e-06,
77
+ "loss": 1.1522,
78
+ "step": 5649
79
  },
80
  {
81
+ "epoch": 0.3000956226094348,
82
+ "eval_nli-pairs_loss": 0.9093547463417053,
83
+ "eval_nli-pairs_runtime": 25.1271,
84
+ "eval_nli-pairs_samples_per_second": 270.943,
85
+ "eval_nli-pairs_steps_per_second": 16.954,
86
+ "step": 5649
87
  },
88
  {
89
+ "epoch": 0.3000956226094348,
90
+ "eval_scitail-pairs-pos_loss": 0.7571232914924622,
91
+ "eval_scitail-pairs-pos_runtime": 5.9021,
92
+ "eval_scitail-pairs-pos_samples_per_second": 220.937,
93
+ "eval_scitail-pairs-pos_steps_per_second": 13.893,
94
+ "step": 5649
95
  },
96
  {
97
+ "epoch": 0.3000956226094348,
98
+ "eval_qnli-contrastive_loss": 0.91651451587677,
99
+ "eval_qnli-contrastive_runtime": 16.2309,
100
+ "eval_qnli-contrastive_samples_per_second": 336.579,
101
+ "eval_qnli-contrastive_steps_per_second": 21.071,
102
+ "step": 5649
103
  },
104
  {
105
+ "epoch": 0.4001274968125797,
106
+ "grad_norm": 12.970890045166016,
107
+ "learning_rate": 3.9975563110922225e-06,
108
+ "loss": 0.9533,
109
+ "step": 7532
110
  },
111
  {
112
+ "epoch": 0.4001274968125797,
113
+ "eval_nli-pairs_loss": 0.7290090322494507,
114
+ "eval_nli-pairs_runtime": 25.3154,
115
+ "eval_nli-pairs_samples_per_second": 268.928,
116
+ "eval_nli-pairs_steps_per_second": 16.828,
117
+ "step": 7532
118
  },
119
  {
120
+ "epoch": 0.4001274968125797,
121
+ "eval_scitail-pairs-pos_loss": 0.6498324275016785,
122
+ "eval_scitail-pairs-pos_runtime": 6.0764,
123
+ "eval_scitail-pairs-pos_samples_per_second": 214.6,
124
+ "eval_scitail-pairs-pos_steps_per_second": 13.495,
125
+ "step": 7532
126
  },
127
  {
128
+ "epoch": 0.4001274968125797,
129
+ "eval_qnli-contrastive_loss": 0.4303818643093109,
130
+ "eval_qnli-contrastive_runtime": 16.4463,
131
+ "eval_qnli-contrastive_samples_per_second": 332.172,
132
+ "eval_qnli-contrastive_steps_per_second": 20.795,
133
+ "step": 7532
134
  },
135
  {
136
+ "epoch": 0.5001593710157246,
137
+ "grad_norm": 10.865135192871094,
138
+ "learning_rate": 4.9973438164045905e-06,
139
+ "loss": 0.8013,
140
+ "step": 9415
141
  },
142
  {
143
+ "epoch": 0.5001593710157246,
144
+ "eval_nli-pairs_loss": 0.6431913375854492,
145
+ "eval_nli-pairs_runtime": 25.4337,
146
+ "eval_nli-pairs_samples_per_second": 267.676,
147
+ "eval_nli-pairs_steps_per_second": 16.749,
148
+ "step": 9415
149
  },
150
  {
151
+ "epoch": 0.5001593710157246,
152
+ "eval_scitail-pairs-pos_loss": 0.6006649732589722,
153
+ "eval_scitail-pairs-pos_runtime": 6.199,
154
+ "eval_scitail-pairs-pos_samples_per_second": 210.355,
155
+ "eval_scitail-pairs-pos_steps_per_second": 13.228,
156
+ "step": 9415
157
  },
158
  {
159
+ "epoch": 0.5001593710157246,
160
+ "eval_qnli-contrastive_loss": 0.25907495617866516,
161
+ "eval_qnli-contrastive_runtime": 16.4896,
162
+ "eval_qnli-contrastive_samples_per_second": 331.299,
163
+ "eval_qnli-contrastive_steps_per_second": 20.74,
164
+ "step": 9415
165
  },
166
  {
167
+ "epoch": 0.6001912452188696,
168
+ "grad_norm": 2.3549954891204834,
169
+ "learning_rate": 5.997662558436039e-06,
170
+ "loss": 0.6568,
171
+ "step": 11298
172
  },
173
  {
174
+ "epoch": 0.6001912452188696,
175
+ "eval_nli-pairs_loss": 0.5626155734062195,
176
+ "eval_nli-pairs_runtime": 25.1226,
177
+ "eval_nli-pairs_samples_per_second": 270.991,
178
+ "eval_nli-pairs_steps_per_second": 16.957,
179
+ "step": 11298
180
  },
181
  {
182
+ "epoch": 0.6001912452188696,
183
+ "eval_scitail-pairs-pos_loss": 0.5481033325195312,
184
+ "eval_scitail-pairs-pos_runtime": 6.0513,
185
+ "eval_scitail-pairs-pos_samples_per_second": 215.492,
186
+ "eval_scitail-pairs-pos_steps_per_second": 13.551,
187
+ "step": 11298
188
  },
189
  {
190
+ "epoch": 0.6001912452188696,
191
+ "eval_qnli-contrastive_loss": 0.13647136092185974,
192
+ "eval_qnli-contrastive_runtime": 16.3856,
193
+ "eval_qnli-contrastive_samples_per_second": 333.402,
194
+ "eval_qnli-contrastive_steps_per_second": 20.872,
195
+ "step": 11298
196
  },
197
  {
198
+ "epoch": 0.7002231194220144,
199
+ "grad_norm": 10.994942665100098,
200
+ "learning_rate": 6.997450063748406e-06,
201
+ "loss": 0.6095,
202
+ "step": 13181
203
  },
204
  {
205
+ "epoch": 0.7002231194220144,
206
+ "eval_nli-pairs_loss": 0.5226004719734192,
207
+ "eval_nli-pairs_runtime": 25.203,
208
+ "eval_nli-pairs_samples_per_second": 270.127,
209
+ "eval_nli-pairs_steps_per_second": 16.903,
210
+ "step": 13181
211
  },
212
  {
213
+ "epoch": 0.7002231194220144,
214
+ "eval_scitail-pairs-pos_loss": 0.5108869075775146,
215
+ "eval_scitail-pairs-pos_runtime": 6.1126,
216
+ "eval_scitail-pairs-pos_samples_per_second": 213.331,
217
+ "eval_scitail-pairs-pos_steps_per_second": 13.415,
218
+ "step": 13181
219
  },
220
  {
221
+ "epoch": 0.7002231194220144,
222
+ "eval_qnli-contrastive_loss": 0.16431590914726257,
223
+ "eval_qnli-contrastive_runtime": 16.4372,
224
+ "eval_qnli-contrastive_samples_per_second": 332.355,
225
+ "eval_qnli-contrastive_steps_per_second": 20.806,
226
+ "step": 13181
227
  },
228
  {
229
+ "epoch": 0.8002549936251594,
230
+ "grad_norm": 8.826902389526367,
231
+ "learning_rate": 7.997768805779857e-06,
232
+ "loss": 0.5694,
233
+ "step": 15064
234
  },
235
  {
236
+ "epoch": 0.8002549936251594,
237
+ "eval_nli-pairs_loss": 0.49213743209838867,
238
+ "eval_nli-pairs_runtime": 25.0892,
239
+ "eval_nli-pairs_samples_per_second": 271.352,
240
+ "eval_nli-pairs_steps_per_second": 16.979,
241
+ "step": 15064
242
  },
243
  {
244
+ "epoch": 0.8002549936251594,
245
+ "eval_scitail-pairs-pos_loss": 0.5194270610809326,
246
+ "eval_scitail-pairs-pos_runtime": 6.261,
247
+ "eval_scitail-pairs-pos_samples_per_second": 208.273,
248
+ "eval_scitail-pairs-pos_steps_per_second": 13.097,
249
+ "step": 15064
250
  },
251
  {
252
+ "epoch": 0.8002549936251594,
253
+ "eval_qnli-contrastive_loss": 0.05173656344413757,
254
+ "eval_qnli-contrastive_runtime": 16.3578,
255
+ "eval_qnli-contrastive_samples_per_second": 333.97,
256
+ "eval_qnli-contrastive_steps_per_second": 20.908,
257
+ "step": 15064
258
  },
259
  {
260
+ "epoch": 0.9002868678283042,
261
+ "grad_norm": 0.4369502067565918,
262
+ "learning_rate": 8.997556311092223e-06,
263
+ "loss": 0.5375,
264
+ "step": 16947
265
  },
266
  {
267
+ "epoch": 0.9002868678283042,
268
+ "eval_nli-pairs_loss": 0.5060996413230896,
269
+ "eval_nli-pairs_runtime": 25.3561,
270
+ "eval_nli-pairs_samples_per_second": 268.496,
271
+ "eval_nli-pairs_steps_per_second": 16.801,
272
+ "step": 16947
273
  },
274
  {
275
+ "epoch": 0.9002868678283042,
276
+ "eval_scitail-pairs-pos_loss": 0.5642966628074646,
277
+ "eval_scitail-pairs-pos_runtime": 6.1557,
278
+ "eval_scitail-pairs-pos_samples_per_second": 211.837,
279
+ "eval_scitail-pairs-pos_steps_per_second": 13.321,
280
+ "step": 16947
281
  },
282
  {
283
+ "epoch": 0.9002868678283042,
284
+ "eval_qnli-contrastive_loss": 0.046243228018283844,
285
+ "eval_qnli-contrastive_runtime": 16.4399,
286
+ "eval_qnli-contrastive_samples_per_second": 332.302,
287
+ "eval_qnli-contrastive_steps_per_second": 20.803,
288
+ "step": 16947
 
289
  }
290
  ],
291
+ "logging_steps": 1883,
292
+ "max_steps": 37648,
293
  "num_input_tokens_seen": 0,
294
  "num_train_epochs": 2,
295
+ "save_steps": 18824,
296
  "stateful_callbacks": {
297
  "TrainerControl": {
298
  "args": {
last-checkpoint/training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:f77cf769359ac9f8d174c1dacb0dddd22a816e2ab9373ddf22d72bd785967730
3
  size 5624
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:59541de2a5be81ee914802456d2cdf4f51877f8f7384f609c9fad68c9ba147bc
3
  size 5624