tomaarsen (HF staff) committed
Commit d1c1ceb
1 Parent(s): f9b57f5

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,744 @@
1
+ ---
2
+ language: []
3
+ library_name: sentence-transformers
4
+ tags:
5
+ - sentence-transformers
6
+ - sentence-similarity
7
+ - feature-extraction
8
+ - generated
9
+ base_model: sentence-transformers/stsb-distilbert-base
10
+ metrics:
11
+ - cosine_accuracy
12
+ - cosine_accuracy_threshold
13
+ - cosine_f1
14
+ - cosine_f1_threshold
15
+ - cosine_precision
16
+ - cosine_recall
17
+ - cosine_ap
18
+ - manhattan_accuracy
19
+ - manhattan_accuracy_threshold
20
+ - manhattan_f1
21
+ - manhattan_f1_threshold
22
+ - manhattan_precision
23
+ - manhattan_recall
24
+ - manhattan_ap
25
+ - euclidean_accuracy
26
+ - euclidean_accuracy_threshold
27
+ - euclidean_f1
28
+ - euclidean_f1_threshold
29
+ - euclidean_precision
30
+ - euclidean_recall
31
+ - euclidean_ap
32
+ - dot_accuracy
33
+ - dot_accuracy_threshold
34
+ - dot_f1
35
+ - dot_f1_threshold
36
+ - dot_precision
37
+ - dot_recall
38
+ - dot_ap
39
+ - max_accuracy
40
+ - max_accuracy_threshold
41
+ - max_f1
42
+ - max_f1_threshold
43
+ - max_precision
44
+ - max_recall
45
+ - max_ap
46
+ - average_precision
47
+ - f1
48
+ - precision
49
+ - recall
50
+ - threshold
51
+ - cosine_accuracy@1
52
+ - cosine_accuracy@3
53
+ - cosine_accuracy@5
54
+ - cosine_accuracy@10
55
+ - cosine_precision@1
56
+ - cosine_precision@3
57
+ - cosine_precision@5
58
+ - cosine_precision@10
59
+ - cosine_recall@1
60
+ - cosine_recall@3
61
+ - cosine_recall@5
62
+ - cosine_recall@10
63
+ - cosine_ndcg@10
64
+ - cosine_mrr@10
65
+ - cosine_map@100
66
+ - dot_accuracy@1
67
+ - dot_accuracy@3
68
+ - dot_accuracy@5
69
+ - dot_accuracy@10
70
+ - dot_precision@1
71
+ - dot_precision@3
72
+ - dot_precision@5
73
+ - dot_precision@10
74
+ - dot_recall@1
75
+ - dot_recall@3
76
+ - dot_recall@5
77
+ - dot_recall@10
78
+ - dot_ndcg@10
79
+ - dot_mrr@10
80
+ - dot_map@100
81
+ widget:
82
+ - source_sentence: How porn is made?
83
+ sentences:
84
+ - How is porn made?
85
+ - How do you study before a test?
86
+ - What is the best book for afcat?
87
+ - source_sentence: Is WW3 inevitable?
88
+ sentences:
89
+ - How close to WW3 are we?
90
+ - Is it ok not to know everything?
91
+ - How can I get good marks on my exam?
92
+ - source_sentence: How do stop smoking?
93
+ sentences:
94
+ - How did you quit/stop smoking?
95
+ - How can I gain weight naturally?
96
+ - What movie is the best movie of 2016?
97
+ - source_sentence: What is astrology?
98
+ sentences:
99
+ - What really is astrology?
100
+ - How do I control blood pressure?
101
+ - How should I reduce weight easily?
102
+ - source_sentence: What is SMS API?
103
+ sentences:
104
+ - What is an SMS API?
105
+ - How will Sound travel in SPACE?
106
+ - Do we live inside a black hole?
107
+ pipeline_tag: sentence-similarity
108
+ model-index:
109
+ - name: SentenceTransformer based on sentence-transformers/stsb-distilbert-base
110
+ results:
111
+ - task:
112
+ type: binary-classification
113
+ name: Binary Classification
114
+ dataset:
115
+ name: Unknown
116
+ type: unknown
117
+ metrics:
118
+ - type: cosine_accuracy
119
+ value: 0.770712179816613
120
+ name: Cosine Accuracy
121
+ - type: cosine_accuracy_threshold
122
+ value: 0.8169694542884827
123
+ name: Cosine Accuracy Threshold
124
+ - type: cosine_f1
125
+ value: 0.7086398522340053
126
+ name: Cosine F1
127
+ - type: cosine_f1_threshold
128
+ value: 0.7420324087142944
129
+ name: Cosine F1 Threshold
130
+ - type: cosine_precision
131
+ value: 0.6032968224704479
132
+ name: Cosine Precision
133
+ - type: cosine_recall
134
+ value: 0.8585539007639479
135
+ name: Cosine Recall
136
+ - type: cosine_ap
137
+ value: 0.7191176594498068
138
+ name: Cosine Ap
139
+ - type: manhattan_accuracy
140
+ value: 0.7729301344296882
141
+ name: Manhattan Accuracy
142
+ - type: manhattan_accuracy_threshold
143
+ value: 181.4663848876953
144
+ name: Manhattan Accuracy Threshold
145
+ - type: manhattan_f1
146
+ value: 0.7082838527457715
147
+ name: Manhattan F1
148
+ - type: manhattan_f1_threshold
149
+ value: 222.911865234375
150
+ name: Manhattan F1 Threshold
151
+ - type: manhattan_precision
152
+ value: 0.6063303659742829
153
+ name: Manhattan Precision
154
+ - type: manhattan_recall
155
+ value: 0.8514545875453353
156
+ name: Manhattan Recall
157
+ - type: manhattan_ap
158
+ value: 0.7188011305084623
159
+ name: Manhattan Ap
160
+ - type: euclidean_accuracy
161
+ value: 0.7736333883313948
162
+ name: Euclidean Accuracy
163
+ - type: euclidean_accuracy_threshold
164
+ value: 8.356552124023438
165
+ name: Euclidean Accuracy Threshold
166
+ - type: euclidean_f1
167
+ value: 0.7088200276731988
168
+ name: Euclidean F1
169
+ - type: euclidean_f1_threshold
170
+ value: 10.092880249023438
171
+ name: Euclidean F1 Threshold
172
+ - type: euclidean_precision
173
+ value: 0.6079037421348935
174
+ name: Euclidean Precision
175
+ - type: euclidean_recall
176
+ value: 0.8499112585847673
177
+ name: Euclidean Recall
178
+ - type: euclidean_ap
179
+ value: 0.719131590718056
180
+ name: Euclidean Ap
181
+ - type: dot_accuracy
182
+ value: 0.7441508209136891
183
+ name: Dot Accuracy
184
+ - type: dot_accuracy_threshold
185
+ value: 168.56625366210938
186
+ name: Dot Accuracy Threshold
187
+ - type: dot_f1
188
+ value: 0.6831510249103777
189
+ name: Dot F1
190
+ - type: dot_f1_threshold
191
+ value: 142.45849609375
192
+ name: Dot F1 Threshold
193
+ - type: dot_precision
194
+ value: 0.5665209879052749
195
+ name: Dot Precision
196
+ - type: dot_recall
197
+ value: 0.8602515626205726
198
+ name: Dot Recall
199
+ - type: dot_ap
200
+ value: 0.6693622133717865
201
+ name: Dot Ap
202
+ - type: max_accuracy
203
+ value: 0.7736333883313948
204
+ name: Max Accuracy
205
+ - type: max_accuracy_threshold
206
+ value: 181.4663848876953
207
+ name: Max Accuracy Threshold
208
+ - type: max_f1
209
+ value: 0.7088200276731988
210
+ name: Max F1
211
+ - type: max_f1_threshold
212
+ value: 222.911865234375
213
+ name: Max F1 Threshold
214
+ - type: max_precision
215
+ value: 0.6079037421348935
216
+ name: Max Precision
217
+ - type: max_recall
218
+ value: 0.8602515626205726
219
+ name: Max Recall
220
+ - type: max_ap
221
+ value: 0.719131590718056
222
+ name: Max Ap
223
+ - task:
224
+ type: paraphrase-mining
225
+ name: Paraphrase Mining
226
+ dataset:
227
+ name: dev
228
+ type: dev
229
+ metrics:
230
+ - type: average_precision
231
+ value: 0.47803306271270435
232
+ name: Average Precision
233
+ - type: f1
234
+ value: 0.5119182746878547
235
+ name: F1
236
+ - type: precision
237
+ value: 0.4683281412253375
238
+ name: Precision
239
+ - type: recall
240
+ value: 0.5644555694618273
241
+ name: Recall
242
+ - type: threshold
243
+ value: 0.8193174600601196
244
+ name: Threshold
245
+ - task:
246
+ type: information-retrieval
247
+ name: Information Retrieval
248
+ dataset:
249
+ name: Unknown
250
+ type: unknown
251
+ metrics:
252
+ - type: cosine_accuracy@1
253
+ value: 0.9654
254
+ name: Cosine Accuracy@1
255
+ - type: cosine_accuracy@3
256
+ value: 0.9904
257
+ name: Cosine Accuracy@3
258
+ - type: cosine_accuracy@5
259
+ value: 0.9948
260
+ name: Cosine Accuracy@5
261
+ - type: cosine_accuracy@10
262
+ value: 0.9974
263
+ name: Cosine Accuracy@10
264
+ - type: cosine_precision@1
265
+ value: 0.9654
266
+ name: Cosine Precision@1
267
+ - type: cosine_precision@3
268
+ value: 0.43553333333333333
269
+ name: Cosine Precision@3
270
+ - type: cosine_precision@5
271
+ value: 0.28064
272
+ name: Cosine Precision@5
273
+ - type: cosine_precision@10
274
+ value: 0.14934
275
+ name: Cosine Precision@10
276
+ - type: cosine_recall@1
277
+ value: 0.8251379240296788
278
+ name: Cosine Recall@1
279
+ - type: cosine_recall@3
280
+ value: 0.9549051140803786
281
+ name: Cosine Recall@3
282
+ - type: cosine_recall@5
283
+ value: 0.9757885342898082
284
+ name: Cosine Recall@5
285
+ - type: cosine_recall@10
286
+ value: 0.9898260744103871
287
+ name: Cosine Recall@10
288
+ - type: cosine_ndcg@10
289
+ value: 0.9786162291363164
290
+ name: Cosine Ndcg@10
291
+ - type: cosine_mrr@10
292
+ value: 0.9785615873015873
293
+ name: Cosine Mrr@10
294
+ - type: cosine_map@100
295
+ value: 0.9713888565523412
296
+ name: Cosine Map@100
297
+ - type: dot_accuracy@1
298
+ value: 0.9512
299
+ name: Dot Accuracy@1
300
+ - type: dot_accuracy@3
301
+ value: 0.985
302
+ name: Dot Accuracy@3
303
+ - type: dot_accuracy@5
304
+ value: 0.9914
305
+ name: Dot Accuracy@5
306
+ - type: dot_accuracy@10
307
+ value: 0.9964
308
+ name: Dot Accuracy@10
309
+ - type: dot_precision@1
310
+ value: 0.9512
311
+ name: Dot Precision@1
312
+ - type: dot_precision@3
313
+ value: 0.4303333333333333
314
+ name: Dot Precision@3
315
+ - type: dot_precision@5
316
+ value: 0.2788
317
+ name: Dot Precision@5
318
+ - type: dot_precision@10
319
+ value: 0.14896
320
+ name: Dot Precision@10
321
+ - type: dot_recall@1
322
+ value: 0.8119095906963455
323
+ name: Dot Recall@1
324
+ - type: dot_recall@3
325
+ value: 0.9459636855089498
326
+ name: Dot Recall@3
327
+ - type: dot_recall@5
328
+ value: 0.9708092557905298
329
+ name: Dot Recall@5
330
+ - type: dot_recall@10
331
+ value: 0.9883617291912786
332
+ name: Dot Recall@10
333
+ - type: dot_ndcg@10
334
+ value: 0.9702609044345125
335
+ name: Dot Ndcg@10
336
+ - type: dot_mrr@10
337
+ value: 0.9693138888888887
338
+ name: Dot Mrr@10
339
+ - type: dot_map@100
340
+ value: 0.9599586870108953
341
+ name: Dot Map@100
342
+ ---
343
+
344
+ # SentenceTransformer based on sentence-transformers/stsb-distilbert-base
345
+
346
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/stsb-distilbert-base](https://huggingface.co/sentence-transformers/stsb-distilbert-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
347
+
348
+ ## Model Details
349
+
350
+ ### Model Description
351
+ - **Model Type:** Sentence Transformer
352
+ - **Base model:** [sentence-transformers/stsb-distilbert-base](https://huggingface.co/sentence-transformers/stsb-distilbert-base)
353
+ - **Maximum Sequence Length:** 128 tokens
354
+ - **Output Dimensionality:** 768 dimensions
355
+ <!-- - **Training Dataset:** Unknown -->
356
+ <!-- - **Language:** Unknown -->
357
+ <!-- - **License:** Unknown -->
358
+
359
+ ### Model Sources
360
+
361
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
362
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
363
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
364
+
365
+ ### Full Model Architecture
366
+
367
+ ```
368
+ SentenceTransformer(
369
+ (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
370
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
371
+ )
372
+ ```
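+
+ The Pooling module above performs attention-mask-aware mean pooling over the token embeddings. As a rough illustration (not the library's internal code), the sketch below reproduces the same pooling with plain `transformers` and PyTorch; it assumes the checkpoint id from the usage example further down and uses placeholder sentences.
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ model_id = "tomaarsen/stsb-distilbert-base-quora-duplicate-questions"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModel.from_pretrained(model_id)
+
+ # Placeholder sentences for illustration only
+ sentences = ["How do I learn to code?", "What is the best way to learn programming?"]
+ # Same 128-token limit as the SentenceTransformer wrapper
+ inputs = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors="pt")
+
+ with torch.no_grad():
+     token_embeddings = model(**inputs).last_hidden_state  # (batch, seq_len, 768)
+
+ # Mean over real tokens only, weighted by the attention mask
+ mask = inputs["attention_mask"].unsqueeze(-1).float()
+ embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
+ print(embeddings.shape)  # torch.Size([2, 768])
+ ```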
373
+
374
+ ## Usage
375
+
376
+ ### Direct Usage (Sentence Transformers)
377
+
378
+ First install the Sentence Transformers library:
379
+
380
+ ```bash
381
+ pip install -U sentence-transformers
382
+ ```
383
+
384
+ Then you can load this model and run inference.
385
+ ```python
386
+ from sentence_transformers import SentenceTransformer
387
+
388
+ # Download from the 🤗 Hub
389
+ model = SentenceTransformer("tomaarsen/stsb-distilbert-base-quora-duplicate-questions")
390
+ # Run inference
391
+ sentences = [
392
+ "What is a fetish?",
393
+ "What's a fetish?",
394
+ "Is it good to read sex stories?",
395
+ ]
396
+ embeddings = model.encode(sentences)
397
+ print(embeddings.shape)
398
+ # (3, 768)
399
+ ```
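+
+ To score how similar the sentences are to each other, the embeddings can be compared with cosine similarity. A minimal follow-up sketch using `sentence_transformers.util.cos_sim`, reusing the sentences from the example above:
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("tomaarsen/stsb-distilbert-base-quora-duplicate-questions")
+ sentences = [
+     "What is a fetish?",
+     "What's a fetish?",
+     "Is it good to read sex stories?",
+ ]
+ # Encode to tensors so similarities can be computed directly
+ embeddings = model.encode(sentences, convert_to_tensor=True)
+
+ # Pairwise cosine similarities: a 3x3 matrix
+ scores = util.cos_sim(embeddings, embeddings)
+ print(scores)  # scores[0][1] (paraphrases) should be much higher than scores[0][2]
+ ```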
400
+
401
+ <!--
402
+ ### Direct Usage (Transformers)
403
+
404
+ <details><summary>Click to see the direct usage in Transformers</summary>
405
+
406
+ </details>
407
+ -->
408
+
409
+ <!--
410
+ ### Downstream Usage (Sentence Transformers)
411
+
412
+ You can finetune this model on your own dataset.
413
+
414
+ <details><summary>Click to expand</summary>
415
+
416
+ </details>
417
+ -->
418
+
419
+ <!--
420
+ ### Out-of-Scope Use
421
+
422
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
423
+ -->
424
+
425
+ ## Evaluation
426
+
427
+ ### Metrics
428
+
429
+ #### Binary Classification
430
+
431
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
432
+
433
+ | Metric | Value |
434
+ |:-----------------------------|:-----------|
435
+ | **cosine_accuracy** | **0.7707** |
436
+ | cosine_accuracy_threshold | 0.817 |
437
+ | cosine_f1 | 0.7086 |
438
+ | cosine_f1_threshold | 0.742 |
439
+ | cosine_precision | 0.6033 |
440
+ | cosine_recall | 0.8586 |
441
+ | cosine_ap | 0.7191 |
442
+ | manhattan_accuracy | 0.7729 |
443
+ | manhattan_accuracy_threshold | 181.4664 |
444
+ | manhattan_f1 | 0.7083 |
445
+ | manhattan_f1_threshold | 222.9119 |
446
+ | manhattan_precision | 0.6063 |
447
+ | manhattan_recall | 0.8515 |
448
+ | manhattan_ap | 0.7188 |
449
+ | euclidean_accuracy | 0.7736 |
450
+ | euclidean_accuracy_threshold | 8.3566 |
451
+ | euclidean_f1 | 0.7088 |
452
+ | euclidean_f1_threshold | 10.0929 |
453
+ | euclidean_precision | 0.6079 |
454
+ | euclidean_recall | 0.8499 |
455
+ | euclidean_ap | 0.7191 |
456
+ | dot_accuracy | 0.7442 |
457
+ | dot_accuracy_threshold | 168.5663 |
458
+ | dot_f1 | 0.6832 |
459
+ | dot_f1_threshold | 142.4585 |
460
+ | dot_precision | 0.5665 |
461
+ | dot_recall | 0.8603 |
462
+ | dot_ap | 0.6694 |
463
+ | max_accuracy | 0.7736 |
464
+ | max_accuracy_threshold | 181.4664 |
465
+ | max_f1 | 0.7088 |
466
+ | max_f1_threshold | 222.9119 |
467
+ | max_precision | 0.6079 |
468
+ | max_recall | 0.8603 |
469
+ | max_ap | 0.7191 |
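+
+ The table above was produced by the `BinaryClassificationEvaluator` linked above, which scores labeled sentence pairs (duplicate / not duplicate). Below is a minimal sketch of running the same kind of evaluation on your own pairs; the pairs and the `name` are placeholders, not the actual evaluation set:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import BinaryClassificationEvaluator
+
+ model = SentenceTransformer("tomaarsen/stsb-distilbert-base-quora-duplicate-questions")
+
+ # Placeholder data: 1 = duplicate question pair, 0 = not a duplicate
+ sentences1 = ["How do I learn Python?", "What is the capital of France?"]
+ sentences2 = ["What is the best way to learn Python?", "How tall is the Eiffel Tower?"]
+ labels = [1, 0]
+
+ evaluator = BinaryClassificationEvaluator(sentences1, sentences2, labels, name="quora-duplicates-dev")
+ result = evaluator(model)  # primary score, or a dict of metrics, depending on the library version
+ print(result)
+ ```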
470
+
471
+ #### Paraphrase Mining
472
+ * Dataset: `dev`
473
+ * Evaluated with [<code>ParaphraseMiningEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.ParaphraseMiningEvaluator)
474
+
475
+ | Metric | Value |
476
+ |:----------------------|:----------|
477
+ | **average_precision** | **0.478** |
478
+ | f1 | 0.5119 |
479
+ | precision | 0.4683 |
480
+ | recall | 0.5645 |
481
+ | threshold | 0.8193 |
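+
+ The evaluator above measures how well the model mines paraphrase pairs from the `dev` split. At inference time the same idea can be applied to any list of sentences with `sentence_transformers.util.paraphrase_mining`; a small sketch with an illustrative sentence list:
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("tomaarsen/stsb-distilbert-base-quora-duplicate-questions")
+
+ # Illustrative sentences, not the evaluation data
+ sentences = [
+     "How do I stop smoking?",
+     "How did you quit smoking?",
+     "How can I gain weight naturally?",
+ ]
+
+ # Returns [score, i, j] triples, sorted from most to least similar pair
+ pairs = util.paraphrase_mining(model, sentences)
+ for score, i, j in pairs:
+     print(f"{score:.3f}  {sentences[i]}  <->  {sentences[j]}")
+ ```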
482
+
483
+ #### Information Retrieval
484
+
485
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
486
+
487
+ | Metric | Value |
488
+ |:--------------------|:-----------|
489
+ | cosine_accuracy@1 | 0.9654 |
490
+ | cosine_accuracy@3 | 0.9904 |
491
+ | cosine_accuracy@5 | 0.9948 |
492
+ | cosine_accuracy@10 | 0.9974 |
493
+ | cosine_precision@1 | 0.9654 |
494
+ | cosine_precision@3 | 0.4355 |
495
+ | cosine_precision@5 | 0.2806 |
496
+ | cosine_precision@10 | 0.1493 |
497
+ | cosine_recall@1 | 0.8251 |
498
+ | cosine_recall@3 | 0.9549 |
499
+ | cosine_recall@5 | 0.9758 |
500
+ | cosine_recall@10 | 0.9898 |
501
+ | cosine_ndcg@10 | 0.9786 |
502
+ | cosine_mrr@10 | 0.9786 |
503
+ | **cosine_map@100** | **0.9714** |
504
+ | dot_accuracy@1 | 0.9512 |
505
+ | dot_accuracy@3 | 0.985 |
506
+ | dot_accuracy@5 | 0.9914 |
507
+ | dot_accuracy@10 | 0.9964 |
508
+ | dot_precision@1 | 0.9512 |
509
+ | dot_precision@3 | 0.4303 |
510
+ | dot_precision@5 | 0.2788 |
511
+ | dot_precision@10 | 0.149 |
512
+ | dot_recall@1 | 0.8119 |
513
+ | dot_recall@3 | 0.946 |
514
+ | dot_recall@5 | 0.9708 |
515
+ | dot_recall@10 | 0.9884 |
516
+ | dot_ndcg@10 | 0.9703 |
517
+ | dot_mrr@10 | 0.9693 |
518
+ | dot_map@100 | 0.96 |
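+
+ These retrieval metrics come from the `InformationRetrievalEvaluator` linked above, run on a query/corpus split that is not bundled with this card. To use the model as a retriever directly, `sentence_transformers.util.semantic_search` is one option; a small sketch with made-up documents:
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("tomaarsen/stsb-distilbert-base-quora-duplicate-questions")
+
+ # Made-up corpus and query for illustration only
+ corpus = [
+     "How can I improve my writing skills?",
+     "What is the best way to lose weight?",
+     "Should I switch from PHP to Node.js?",
+ ]
+ queries = ["How do I become a better writer?"]
+
+ corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
+ query_embeddings = model.encode(queries, convert_to_tensor=True)
+
+ # Top-2 most similar corpus entries per query, by cosine similarity
+ hits = util.semantic_search(query_embeddings, corpus_embeddings, top_k=2)
+ for hit in hits[0]:
+     print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
+ ```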
519
+
520
+ <!--
521
+ ## Bias, Risks and Limitations
522
+
523
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
524
+ -->
525
+
526
+ <!--
527
+ ### Recommendations
528
+
529
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
530
+ -->
531
+
532
+ ## Training Details
533
+
534
+ ### Training Dataset
535
+
536
+ #### Unnamed Dataset
537
+
538
+
539
+ * Size: 207,326 training samples
540
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
541
+ * Approximate statistics based on the first 1000 samples:
542
+ | | sentence_0 | sentence_1 | label |
543
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:------------------------------|
544
+ | type | string | string | int |
545
+ | details | <ul><li>min: 6 tokens</li><li>mean: 13.75 tokens</li><li>max: 42 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 13.74 tokens</li><li>max: 44 tokens</li></ul> | <ul><li>1: ~100.00%</li></ul> |
546
+ * Samples:
547
+ | sentence_0 | sentence_1 | label |
548
+ |:------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------|:---------------|
549
+ | <code>How do I improve writing skill by myself?</code> | <code>How can I improve writing skills?</code> | <code>1</code> |
550
+ | <code>Is it best to switch to Node.js from PHP?</code> | <code>Should I switch to Node.js or continue using PHP?</code> | <code>1</code> |
551
+ | <code>What do Hillary Clinton's supporters say when confronted with all her lies and scandals?</code> | <code>What do Clinton supporters say when confronted with her scandals such as the emails and 'Clinton Cash'?</code> | <code>1</code> |
552
+ * Loss: [<code>sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss) with these parameters:
553
+ ```json
554
+ {
555
+ "scale": 20.0,
556
+ "similarity_fct": "cos_sim"
557
+ }
558
+ ```
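+
+ As a rough illustration of how this loss is typically wired up, here is a generic sketch using the long-standing `model.fit` API and a toy dataset; it is not the exact training script behind this model:
+
+ ```python
+ from torch.utils.data import DataLoader
+ from sentence_transformers import InputExample, SentenceTransformer, losses
+
+ model = SentenceTransformer("sentence-transformers/stsb-distilbert-base")
+
+ # Toy (anchor, positive) pairs; MultipleNegativesRankingLoss treats the other
+ # in-batch positives as negatives, so no explicit negatives are required
+ train_examples = [
+     InputExample(texts=["How do I learn Python?", "What is the best way to learn Python?"]),
+     InputExample(texts=["How do I lose weight?", "What is an effective way to lose weight?"]),
+ ]
+ train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
+
+ # scale=20.0 with the default cosine similarity matches the parameters listed above
+ train_loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)
+
+ model.fit(
+     train_objectives=[(train_dataloader, train_loss)],
+     epochs=1,
+ )
+ ```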
559
+
560
+ ### Training Hyperparameters
561
+ #### Non-Default Hyperparameters
562
+
563
+ - per_device_train_batch_size: 64
564
+ - per_device_eval_batch_size: 64
565
+ - num_train_epochs: 1
566
+ - round_robin_sampler: True
567
+
568
+ #### All Hyperparameters
569
+ <details><summary>Click to expand</summary>
570
+
571
+ - overwrite_output_dir: False
572
+ - do_predict: False
573
+ - prediction_loss_only: False
574
+ - per_device_train_batch_size: 64
575
+ - per_device_eval_batch_size: 64
576
+ - per_gpu_train_batch_size: None
577
+ - per_gpu_eval_batch_size: None
578
+ - gradient_accumulation_steps: 1
579
+ - eval_accumulation_steps: None
580
+ - learning_rate: 5e-05
581
+ - weight_decay: 0.0
582
+ - adam_beta1: 0.9
583
+ - adam_beta2: 0.999
584
+ - adam_epsilon: 1e-08
585
+ - max_grad_norm: 1
586
+ - num_train_epochs: 1
587
+ - max_steps: -1
588
+ - lr_scheduler_type: linear
589
+ - lr_scheduler_kwargs: {}
590
+ - warmup_ratio: 0.0
591
+ - warmup_steps: 0
592
+ - log_level: passive
593
+ - log_level_replica: warning
594
+ - log_on_each_node: True
595
+ - logging_nan_inf_filter: True
596
+ - save_safetensors: True
597
+ - save_on_each_node: False
598
+ - save_only_model: False
599
+ - no_cuda: False
600
+ - use_cpu: False
601
+ - use_mps_device: False
602
+ - seed: 42
603
+ - data_seed: None
604
+ - jit_mode_eval: False
605
+ - use_ipex: False
606
+ - bf16: False
607
+ - fp16: False
608
+ - fp16_opt_level: O1
609
+ - half_precision_backend: auto
610
+ - bf16_full_eval: False
611
+ - fp16_full_eval: False
612
+ - tf32: None
613
+ - local_rank: 0
614
+ - ddp_backend: None
615
+ - tpu_num_cores: None
616
+ - tpu_metrics_debug: False
617
+ - debug: []
618
+ - dataloader_drop_last: False
619
+ - dataloader_num_workers: 0
620
+ - dataloader_prefetch_factor: None
621
+ - past_index: -1
622
+ - disable_tqdm: False
623
+ - remove_unused_columns: True
624
+ - label_names: None
625
+ - load_best_model_at_end: False
626
+ - ignore_data_skip: False
627
+ - fsdp: []
628
+ - fsdp_min_num_params: 0
629
+ - fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
630
+ - fsdp_transformer_layer_cls_to_wrap: None
631
+ - accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True}
632
+ - deepspeed: None
633
+ - label_smoothing_factor: 0.0
634
+ - optim: adamw_torch
635
+ - optim_args: None
636
+ - adafactor: False
637
+ - group_by_length: False
638
+ - length_column_name: length
639
+ - ddp_find_unused_parameters: None
640
+ - ddp_bucket_cap_mb: None
641
+ - ddp_broadcast_buffers: None
642
+ - dataloader_pin_memory: True
643
+ - dataloader_persistent_workers: False
644
+ - skip_memory_metrics: True
645
+ - use_legacy_prediction_loop: False
646
+ - push_to_hub: False
647
+ - resume_from_checkpoint: None
648
+ - hub_model_id: None
649
+ - hub_strategy: every_save
650
+ - hub_private_repo: False
651
+ - hub_always_push: False
652
+ - gradient_checkpointing: False
653
+ - gradient_checkpointing_kwargs: None
654
+ - include_inputs_for_metrics: False
655
+ - fp16_backend: auto
656
+ - push_to_hub_model_id: None
657
+ - push_to_hub_organization: None
658
+ - mp_parameters:
659
+ - auto_find_batch_size: False
660
+ - full_determinism: False
661
+ - torchdynamo: None
662
+ - ray_scope: last
663
+ - ddp_timeout: 1800
664
+ - torch_compile: False
665
+ - torch_compile_backend: None
666
+ - torch_compile_mode: None
667
+ - dispatch_batches: None
668
+ - split_batches: None
669
+ - include_tokens_per_second: False
670
+ - include_num_input_tokens_seen: False
671
+ - neftune_noise_alpha: None
672
+ - optim_target_modules: None
673
+ - round_robin_sampler: True
674
+
675
+ </details>
676
+
677
+ ### Training Logs
678
+ | Epoch | Step | Training Loss | cosine_accuracy | cosine_map@100 | dev_average_precision |
679
+ |:------:|:----:|:-------------:|:---------------:|:--------------:|:---------------------:|
680
+ | 0 | 0 | - | 0.7661 | 0.9371 | 0.4137 |
681
+ | 0.1543 | 500 | 0.1055 | 0.7632 | 0.9620 | 0.4731 |
682
+ | 0.3086 | 1000 | 0.0677 | 0.7608 | 0.9675 | 0.4732 |
683
+ | 0.4630 | 1500 | 0.0612 | 0.7663 | 0.9710 | 0.4856 |
684
+ | 0.6173 | 2000 | 0.0584 | 0.7719 | 0.9693 | 0.4925 |
685
+ | 0.7716 | 2500 | 0.0506 | 0.7714 | 0.9709 | 0.4808 |
686
+ | 0.9259 | 3000 | 0.0488 | 0.7708 | 0.9713 | 0.4784 |
687
+ | 1.0 | 3240 | - | 0.7707 | 0.9714 | 0.4780 |
688
+
689
+
690
+ ### Framework Versions
691
+ - Python: 3.11.6
692
+ - Sentence Transformers: 2.7.0.dev0
693
+ - Transformers: 4.39.3
694
+ - PyTorch: 2.1.0+cu121
695
+ - Accelerate: 0.26.1
696
+ - Datasets: 2.18.0
697
+ - Tokenizers: 0.15.2
698
+
699
+ ## Citation
700
+
701
+ ### BibTeX
702
+
703
+ #### Sentence Transformers
704
+ ```bibtex
705
+ @inproceedings{reimers-2019-sentence-bert,
706
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
707
+ author = "Reimers, Nils and Gurevych, Iryna",
708
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
709
+ month = "11",
710
+ year = "2019",
711
+ publisher = "Association for Computational Linguistics",
712
+ url = "https://arxiv.org/abs/1908.10084",
713
+ }
714
+ ```
715
+
716
+ #### MultipleNegativesRankingLoss
717
+ ```bibtex
718
+ @misc{henderson2017efficient,
719
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
720
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
721
+ year={2017},
722
+ eprint={1705.00652},
723
+ archivePrefix={arXiv},
724
+ primaryClass={cs.CL}
725
+ }
726
+ ```
727
+
728
+ <!--
729
+ ## Glossary
730
+
731
+ *Clearly define terms in order to be accessible across audiences.*
732
+ -->
733
+
734
+ <!--
735
+ ## Model Card Authors
736
+
737
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
738
+ -->
739
+
740
+ <!--
741
+ ## Model Card Contact
742
+
743
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
744
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "sentence-transformers/stsb-distilbert-base",
+   "activation": "gelu",
+   "architectures": [
+     "DistilBertModel"
+   ],
+   "attention_dropout": 0.1,
+   "dim": 768,
+   "dropout": 0.1,
+   "hidden_dim": 3072,
+   "initializer_range": 0.02,
+   "max_position_embeddings": 512,
+   "model_type": "distilbert",
+   "n_heads": 12,
+   "n_layers": 6,
+   "pad_token_id": 0,
+   "qa_dropout": 0.1,
+   "seq_classif_dropout": 0.2,
+   "sinusoidal_pos_embds": false,
+   "tie_weights_": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.39.3",
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.0.0",
+     "transformers": "4.7.0",
+     "pytorch": "1.9.0+cu102"
+   },
+   "prompts": {},
+   "default_prompt_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:475763bf8eb15d61532a98f946ae6b2933a661a4bf0a2bf84299ec760659ab05
+ size 265462608
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 128,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "full_tokenizer_file": null,
+   "mask_token": "[MASK]",
+   "model_max_length": 128,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "DistilBertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff