bobox committed
Commit 423799a
Parent: 2417943

continued training on 200k:400k + 100k:200k
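The `200k:400k` / `100k:200k` notation reads like Python slice indices into the training corpus. As a hedged sketch only (the dataset name below is a placeholder; the card just says the combined set has 700k pairs), such slices are typically taken with the `datasets` library:

```python
from datasets import load_dataset, concatenate_datasets

# Placeholder corpus name; the commit does not say which corpus was sliced.
corpus = load_dataset("your/denoising-corpus", split="train")

# Rows 200k:400k plus rows 100k:200k, mirroring the commit message.
continued = concatenate_datasets([
    corpus.select(range(200_000, 400_000)),
    corpus.select(range(100_000, 200_000)),
])
```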

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
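This configuration enables mean pooling only: the 768-dim token embeddings are averaged, with padding masked out, to form the sentence vector. A minimal sketch of that operation (illustrative, not the library's internal code):

```python
import torch

def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average the 768-dim token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # tokens per sentence
    return summed / counts                                          # (batch, 768)
```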
README.md ADDED
@@ -0,0 +1,642 @@
---
language: []
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:700000
- loss:DenoisingAutoEncoderLoss
base_model: intfloat/e5-base-unsupervised
datasets: []
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
widget:
- source_sentence: in Freeview no extra therefore minimal Also the is wide decent,
    plus they and.
  sentences:
  - 'Pokémon-GX (Japanese: ポケモンGX Pokémon GX), officially written Pokémon-GX, are
    a variant of Pokémon in the Pokémon Trading Card Game. They were first introduced
    in the Sun & Moon expansion (the Collection Sun and Collection Moon expansions
    in Japan). Pokémon-GX have a stylized. graphic on the card name.'
  - 'The Cape Colony (Dutch: Kaapkolonie) was a Dutch East India Company colony in
    Southern Africa, centered on the Cape of Good Hope, whence it derived its name.
    The original colony and its successive states that the colony was incorporated
    into occupied much of modern South Africa.'
  - Avtex is expensive, but you get built in Freeview, Freesat and built in DVD player,
    which means no extra boxes, and therefore minimal wiring. Also the viewing angle
    is wide and a decent picture quality, plus they are light and designed for mobile
    use.
- source_sentence: as power Yes can use transmission of power steering But, sure you
    check the manufacturer's the a
  sentences:
  - Can you use transmission fluid as a substitute for power steering fluid? Yes,
    you can use transmission fluid in place of a power steering fluid. But, make sure
    you check the car manufacturer's recommendations before using the ATF as a substitute.
  - how much kwh does an xbox one use?
  - what is the difference between demerara cane sugar and turbinado cane sugar?
- source_sentence: '(number ''Step: Ensure date to (and number is set Date 2 formula
    to add the number months start.'''
  sentences:
  - Being a medical doctor is really great. It's stimulating and interesting. Medical
    doctors have a significant degree of autonomy over their schedules and time. Medical
    doctors know that they get to help people solve problems every single day.
  - how much is an air conditioner for a house?
  - '[''=EDATE(start date, number of months)'', ''Step 1: Ensure the starting date
    is properly formatted – go to Format Cells (press Ctrl + 1) and make sure the
    number is set to Date.'', ''Step 2: Use the =EDATE(C3,C5) formula to add the number
    of specified months to the start date.'']'
- source_sentence: many days can
  sentences:
  - how many days after can you have morning after pill?
  - is gender an independent variable?
  - The current standard is about 30 days, which means that some teachers and support
    staff may be brought on board before the results of their criminal background
    check are completed. The issue, as reported in this article, is the lag time between
    state and federal background checks.
- source_sentence: ligand ion channels located?
  sentences:
  - Share on Pinterest Recent research suggests that chocolate may have some health
    benefits. Chocolate receives a lot of bad press because of its high fat and sugar
    content. Its consumption has been associated with acne, obesity, high blood pressure,
    coronary artery disease, and diabetes.
  - where are ligand gated ion channels located?
  - Duvets tend to be warm but surprisingly lightweight. The duvet cover makes it
    easier to change bedding looks and styles. You won't need to wash your duvet very
    often, just wash the cover regularly. Additionally, duvets tend to be fluffier
    than comforters, and can simplify bed making if you choose the European style.
pipeline_tag: sentence-similarity
model-index:
- name: SentenceTransformer based on intfloat/e5-base-unsupervised
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.7651793859211248
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.7524804428249002
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.7393361318996702
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.7326262473219208
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.7402295162714656
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.7335305408258518
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.5002878735642248
      name: Pearson Dot
    - type: spearman_dot
      value: 0.4986010870846151
      name: Spearman Dot
    - type: pearson_max
      value: 0.7651793859211248
      name: Pearson Max
    - type: spearman_max
      value: 0.7524804428249002
      name: Spearman Max
---

# SentenceTransformer based on intfloat/e5-base-unsupervised

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-base-unsupervised](https://huggingface.co/intfloat/e5-base-unsupervised). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [intfloat/e5-base-unsupervised](https://huggingface.co/intfloat/e5-base-unsupervised) <!-- at revision 6003a5b7ce770b0549203e41115b9fc683f16dad -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/E5-base-unsupervised-TSDAE-2")
# Run inference
sentences = [
    'ligand ion channels located?',
    'where are ligand gated ion channels located?',
    "Duvets tend to be warm but surprisingly lightweight. The duvet cover makes it easier to change bedding looks and styles. You won't need to wash your duvet very often, just wash the cover regularly. Additionally, duvets tend to be fluffier than comforters, and can simplify bed making if you choose the European style.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Semantic Similarity
* Dataset: `sts-test`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.7652     |
| **spearman_cosine** | **0.7525** |
| pearson_manhattan   | 0.7393     |
| spearman_manhattan  | 0.7326     |
| pearson_euclidean   | 0.7402     |
| spearman_euclidean  | 0.7335     |
| pearson_dot         | 0.5003     |
| spearman_dot        | 0.4986     |
| pearson_max         | 0.7652     |
| spearman_max        | 0.7525     |

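The table above comes from the linked evaluator. A hedged sketch of re-running a comparable evaluation yourself (the `sentence-transformers/stsb` dataset name and its `sentence1`/`sentence2`/`score` columns are assumptions, not something this card states):

```python
from datasets import load_dataset

from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("bobox/E5-base-unsupervised-TSDAE-2")

# Assumed: an STS-style split with 'sentence1', 'sentence2' and a 0-1 'score' column.
sts = load_dataset("sentence-transformers/stsb", split="test")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=sts["sentence1"],
    sentences2=sts["sentence2"],
    scores=sts["score"],
    main_similarity=SimilarityFunction.COSINE,
    name="sts-test",
)
print(evaluator(model))  # dict of metric name -> value, e.g. spearman_cosine
```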
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 700,000 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0                                                                         | sentence_1                                                                          |
  |:--------|:-----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
  | type    | string                                                                             | string                                                                              |
  | details | <ul><li>min: 3 tokens</li><li>mean: 15.73 tokens</li><li>max: 55 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 36.05 tokens</li><li>max: 131 tokens</li></ul> |
* Samples:
  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | <code>Quality such a has components with applicable high objective system measure component improvements</code> | <code>Quality in such a system has three components: high accuracy, compliance with applicable standards, and high customer satisfaction. The objective of the system is to measure each component and achieve improvements.</code> |
  | <code>include</code> | <code>does qbi include capital gains?</code> |
  | <code>They have a . parietal is in, as becomes and pigments after four to is believed and in circadian cycles</code> | <code>They have a third eye. The parietal eye is only visible in hatchlings, as it becomes covered in scales and pigments after four to six months. Its function is a subject of ongoing research, but it is believed to be useful in absorbing ultraviolet rays and in setting circadian and seasonal cycles.</code> |
* Loss: [<code>DenoisingAutoEncoderLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#denoisingautoencoderloss)

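As the samples show, `sentence_0` is a noise-corrupted (token-deleted) copy of `sentence_1`; `DenoisingAutoEncoderLoss` trains the encoder so that a tied decoder can reconstruct the original sentence from the embedding. A minimal sketch of that TSDAE recipe, following the library's documented setup (the corpus here is illustrative, and the actual run used the trainer hyperparameters listed below):

```python
from torch.utils.data import DataLoader

from sentence_transformers import SentenceTransformer
from sentence_transformers.datasets import DenoisingAutoEncoderDataset
from sentence_transformers.losses import DenoisingAutoEncoderLoss

model = SentenceTransformer("intfloat/e5-base-unsupervised")

# Wraps a plain corpus so each item becomes a (noised, original) pair;
# by default the noise deletes ~60% of the tokens, as in the samples above.
train_sentences = ["Quality in such a system has three components ..."]  # illustrative corpus
train_dataset = DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=16, shuffle=True)

# The decoder shares (ties) weights with the encoder and is discarded after training.
train_loss = DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=2,
    weight_decay=0,
    scheduler="constantlr",
)
```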
### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 2
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 2
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

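In sentence-transformers 3.x these settings map onto `SentenceTransformerTrainingArguments`. A sketch covering just the non-default values above (the output directory is a placeholder):

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

# Only the non-default values listed above; everything else keeps its default.
args = SentenceTransformerTrainingArguments(
    output_dir="output/e5-base-tsdae",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    multi_dataset_batch_sampler="round_robin",
)
```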
### Training Logs
<details><summary>Click to expand</summary>

| Epoch  | Step  | Training Loss | sts-test_spearman_cosine |
|:------:|:-----:|:-------------:|:------------------------:|
| 0      | 0     | -             | 0.7211                   |
| 0.0114 | 500   | 9.4957        | -                        |
| 0.0229 | 1000  | 7.4063        | -                        |
| 0.0343 | 1500  | 7.0225        | -                        |
| 0.0457 | 2000  | 6.6991        | -                        |
| 0.0571 | 2500  | 6.4054        | -                        |
| 0.0686 | 3000  | 6.1933        | -                        |
| 0.08   | 3500  | 5.999         | -                        |
| 0.0914 | 4000  | 5.8471        | -                        |
| 0.1    | 4375  | -             | 0.4610                   |
| 0.1029 | 4500  | 5.6876        | -                        |
| 0.1143 | 5000  | 5.5934        | -                        |
| 0.1257 | 5500  | 5.4877        | -                        |
| 0.1371 | 6000  | 5.4034        | -                        |
| 0.1486 | 6500  | 5.3016        | -                        |
| 0.16   | 7000  | 5.2169        | -                        |
| 0.1714 | 7500  | 5.1351        | -                        |
| 0.1829 | 8000  | 5.0605        | -                        |
| 0.1943 | 8500  | 4.9851        | -                        |
| 0.2    | 8750  | -             | 0.6490                   |
| 0.2057 | 9000  | 4.9024        | -                        |
| 0.2171 | 9500  | 4.8722        | -                        |
| 0.2286 | 10000 | 4.7955        | -                        |
| 0.24   | 10500 | 4.7435        | -                        |
| 0.2514 | 11000 | 4.6742        | -                        |
| 0.2629 | 11500 | 4.6447        | -                        |
| 0.2743 | 12000 | 4.5964        | -                        |
| 0.2857 | 12500 | 4.5186        | -                        |
| 0.2971 | 13000 | 4.5024        | -                        |
| 0.3    | 13125 | -             | 0.7121                   |
| 0.3086 | 13500 | 4.4336        | -                        |
| 0.32   | 14000 | 4.3767        | -                        |
| 0.3314 | 14500 | 4.3454        | -                        |
| 0.3429 | 15000 | 4.3067        | -                        |
| 0.3543 | 15500 | 4.2627        | -                        |
| 0.3657 | 16000 | 4.2323        | -                        |
| 0.3771 | 16500 | 4.208         | -                        |
| 0.3886 | 17000 | 4.1622        | -                        |
| 0.4    | 17500 | 4.113         | 0.7375                   |
| 0.4114 | 18000 | 4.1097        | -                        |
| 0.4229 | 18500 | 4.0666        | -                        |
| 0.4343 | 19000 | 4.0311        | -                        |
| 0.4457 | 19500 | 4.0241        | -                        |
| 0.4571 | 20000 | 3.9991        | -                        |
| 0.4686 | 20500 | 3.9873        | -                        |
| 0.48   | 21000 | 3.9439        | -                        |
| 0.4914 | 21500 | 3.9281        | -                        |
| 0.5    | 21875 | -             | 0.7502                   |
| 0.5029 | 22000 | 3.9047        | -                        |
| 0.5143 | 22500 | 3.89          | -                        |
| 0.5257 | 23000 | 3.8671        | -                        |
| 0.5371 | 23500 | 3.85          | -                        |
| 0.5486 | 24000 | 3.8336        | -                        |
| 0.56   | 24500 | 3.8081        | -                        |
| 0.5714 | 25000 | 3.8049        | -                        |
| 0.5829 | 25500 | 3.7587        | -                        |
| 0.5943 | 26000 | 3.769         | -                        |
| 0.6    | 26250 | -             | 0.7530                   |
| 0.6057 | 26500 | 3.7488        | -                        |
| 0.6171 | 27000 | 3.7218        | -                        |
| 0.6286 | 27500 | 3.7128        | -                        |
| 0.64   | 28000 | 3.7104        | -                        |
| 0.6514 | 28500 | 3.6706        | -                        |
| 0.6629 | 29000 | 3.6602        | -                        |
| 0.6743 | 29500 | 3.658         | -                        |
| 0.6857 | 30000 | 3.665         | -                        |
| 0.6971 | 30500 | 3.6439        | -                        |
| 0.7    | 30625 | -             | 0.7561                   |
| 0.7086 | 31000 | 3.6411        | -                        |
| 0.72   | 31500 | 3.6141        | -                        |
| 0.7314 | 32000 | 3.6172        | -                        |
| 0.7429 | 32500 | 3.5975        | -                        |
| 0.7543 | 33000 | 3.5827        | -                        |
| 0.7657 | 33500 | 3.5836        | -                        |
| 0.7771 | 34000 | 3.5484        | -                        |
| 0.7886 | 34500 | 3.5275        | -                        |
| 0.8    | 35000 | 3.5587        | 0.7553                   |
| 0.8114 | 35500 | 3.5371        | -                        |
| 0.8229 | 36000 | 3.5334        | -                        |
| 0.8343 | 36500 | 3.5168        | -                        |
| 0.8457 | 37000 | 3.483         | -                        |
| 0.8571 | 37500 | 3.4755        | -                        |
| 0.8686 | 38000 | 3.4943        | -                        |
| 0.88   | 38500 | 3.4699        | -                        |
| 0.8914 | 39000 | 3.4732        | -                        |
| 0.9    | 39375 | -             | 0.7560                   |
| 0.9029 | 39500 | 3.4572        | -                        |
| 0.9143 | 40000 | 3.4518        | -                        |
| 0.9257 | 40500 | 3.4298        | -                        |
| 0.9371 | 41000 | 3.4215        | -                        |
| 0.9486 | 41500 | 3.4176        | -                        |
| 0.96   | 42000 | 3.4353        | -                        |
| 0.9714 | 42500 | 3.4137        | -                        |
| 0.9829 | 43000 | 3.4037        | -                        |
| 0.9943 | 43500 | 3.4157        | -                        |
| 1.0    | 43750 | -             | 0.7554                   |
| 1.0057 | 44000 | 3.393         | -                        |
| 1.0171 | 44500 | 3.4092        | -                        |
| 1.0286 | 45000 | 3.3861        | -                        |
| 1.04   | 45500 | 3.3976        | -                        |
| 1.0514 | 46000 | 3.3769        | -                        |
| 1.0629 | 46500 | 3.3444        | -                        |
| 1.0743 | 47000 | 3.3598        | -                        |
| 1.0857 | 47500 | 3.3556        | -                        |
| 1.0971 | 48000 | 3.3548        | -                        |
| 1.1    | 48125 | -             | 0.7549                   |
| 1.1086 | 48500 | 3.3278        | -                        |
| 1.12   | 49000 | 3.3309        | -                        |
| 1.1314 | 49500 | 3.3459        | -                        |
| 1.1429 | 50000 | 3.3353        | -                        |
| 1.1543 | 50500 | 3.3192        | -                        |
| 1.1657 | 51000 | 3.3022        | -                        |
| 1.1771 | 51500 | 3.3189        | -                        |
| 1.1886 | 52000 | 3.301         | -                        |
| 1.2    | 52500 | 3.2785        | 0.7542                   |
| 1.2114 | 53000 | 3.2996        | -                        |
| 1.2229 | 53500 | 3.2863        | -                        |
| 1.2343 | 54000 | 3.2916        | -                        |
| 1.2457 | 54500 | 3.272         | -                        |
| 1.2571 | 55000 | 3.2896        | -                        |
| 1.2686 | 55500 | 3.2694        | -                        |
| 1.28   | 56000 | 3.2848        | -                        |
| 1.2914 | 56500 | 3.2528        | -                        |
| 1.3    | 56875 | -             | 0.7554                   |
| 1.3029 | 57000 | 3.2622        | -                        |
| 1.3143 | 57500 | 3.2515        | -                        |
| 1.3257 | 58000 | 3.2385        | -                        |
| 1.3371 | 58500 | 3.2341        | -                        |
| 1.3486 | 59000 | 3.2275        | -                        |
| 1.3600 | 59500 | 3.2538        | -                        |
| 1.3714 | 60000 | 3.2329        | -                        |
| 1.3829 | 60500 | 3.2322        | -                        |
| 1.3943 | 61000 | 3.2039        | -                        |
| 1.4    | 61250 | -             | 0.7530                   |
| 1.4057 | 61500 | 3.212         | -                        |
| 1.4171 | 62000 | 3.2127        | -                        |
| 1.4286 | 62500 | 3.1956        | -                        |
| 1.44   | 63000 | 3.202         | -                        |
| 1.4514 | 63500 | 3.2046        | -                        |
| 1.4629 | 64000 | 3.2105        | -                        |
| 1.4743 | 64500 | 3.1915        | -                        |
| 1.4857 | 65000 | 3.176         | -                        |
| 1.4971 | 65500 | 3.1852        | -                        |
| 1.5    | 65625 | -             | 0.7541                   |
| 1.5086 | 66000 | 3.1988        | -                        |
| 1.52   | 66500 | 3.1714        | -                        |
| 1.5314 | 67000 | 3.1816        | -                        |
| 1.5429 | 67500 | 3.1745        | -                        |
| 1.5543 | 68000 | 3.1674        | -                        |
| 1.5657 | 68500 | 3.1887        | -                        |
| 1.5771 | 69000 | 3.1567        | -                        |
| 1.5886 | 69500 | 3.1775        | -                        |
| 1.6    | 70000 | 3.1696        | 0.7535                   |
| 1.6114 | 70500 | 3.154         | -                        |
| 1.6229 | 71000 | 3.1553        | -                        |
| 1.6343 | 71500 | 3.1675        | -                        |
| 1.6457 | 72000 | 3.1516        | -                        |
| 1.6571 | 72500 | 3.1569        | -                        |
| 1.6686 | 73000 | 3.1403        | -                        |
| 1.6800 | 73500 | 3.1667        | -                        |
| 1.6914 | 74000 | 3.1545        | -                        |
| 1.7    | 74375 | -             | 0.7529                   |
| 1.7029 | 74500 | 3.1736        | -                        |
| 1.7143 | 75000 | 3.1447        | -                        |
| 1.7257 | 75500 | 3.1567        | -                        |
| 1.7371 | 76000 | 3.1682        | -                        |
| 1.7486 | 76500 | 3.149         | -                        |
| 1.76   | 77000 | 3.1522        | -                        |
| 1.7714 | 77500 | 3.1412        | -                        |
| 1.7829 | 78000 | 3.1268        | -                        |
| 1.7943 | 78500 | 3.1476        | -                        |
| 1.8    | 78750 | -             | 0.7524                   |
| 1.8057 | 79000 | 3.1669        | -                        |
| 1.8171 | 79500 | 3.1432        | -                        |
| 1.8286 | 80000 | 3.1603        | -                        |
| 1.8400 | 80500 | 3.1347        | -                        |
| 1.8514 | 81000 | 3.1209        | -                        |
| 1.8629 | 81500 | 3.1302        | -                        |
| 1.8743 | 82000 | 3.1423        | -                        |
| 1.8857 | 82500 | 3.1481        | -                        |
| 1.8971 | 83000 | 3.1262        | -                        |
| 1.9    | 83125 | -             | 0.7525                   |
| 1.9086 | 83500 | 3.1484        | -                        |
| 1.92   | 84000 | 3.1331        | -                        |
| 1.9314 | 84500 | 3.122         | -                        |
| 1.9429 | 85000 | 3.1272        | -                        |
| 1.9543 | 85500 | 3.1435        | -                        |
| 1.9657 | 86000 | 3.1431        | -                        |
| 1.9771 | 86500 | 3.1457        | -                        |
| 1.9886 | 87000 | 3.1286        | -                        |
| 2.0    | 87500 | 3.1352        | 0.7525                   |

</details>

### Framework Versions
- Python: 3.10.13
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.1.2
- Accelerate: 0.31.0
- Datasets: 2.19.2
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### DenoisingAutoEncoderLoss
```bibtex
@inproceedings{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    pages = "671--688",
    url = "https://arxiv.org/abs/2104.06979",
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,26 @@
{
  "_name_or_path": "intfloat/e5-base-unsupervised",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.41.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "3.0.1",
    "transformers": "4.41.2",
    "pytorch": "2.1.2"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}
modules.json ADDED
@@ -0,0 +1,14 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
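This file wires the two-stage pipeline: module 0 is the BERT encoder at the repository root, module 1 the mean-pooling head configured in `1_Pooling/`. A sketch of the equivalent manual construction (loading the repo id directly does the same thing):

```python
from sentence_transformers import SentenceTransformer, models

# Rebuild the pipeline that modules.json describes:
# module 0: the BERT transformer, module 1: mean pooling over token embeddings.
word_embedding = models.Transformer("intfloat/e5-base-unsupervised", max_seq_length=512)
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),  # 768
    pooling_mode_mean_tokens=True,
)
model = SentenceTransformer(modules=[word_embedding, pooling])
```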
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:181bab80dfbf321225f3ae2ad0c2bd8cc8b9734c92fba60057295f12b5269c03
size 437996134
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": false
}
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,55 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff