cassador committed
Commit 5734bb2
1 Parent(s): de1d522

Add new SentenceTransformer model.
1_Pooling/config.json ADDED
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
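This config enables mean pooling only (`pooling_mode_mean_tokens: true`): the sentence embedding is the average of the token embeddings, with padding positions excluded via the attention mask. A minimal NumPy sketch of that operation on synthetic tensors (not the model's actual weights):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over real (non-padding) tokens.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts

# Toy example: one sequence with two real tokens and one padding token, dim=2.
# The padding embedding is deliberately large to show it is ignored.
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(emb, mask))  # [[2. 3.]]
```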
README.md ADDED
---
language:
- id
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:10000
- loss:SoftmaxLoss
base_model: indobenchmark/indobert-base-p2
datasets:
- afaji/indonli
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
widget:
- source_sentence: '"Berbagai macam jenis minuman sehat untuk mengembalikan ion ataupun
    mengandung vitamin, dapat kita temui dengan mudah di sekitar."'
  sentences:
  - Moody's tidak memiliki metrik peringkat untuk penerbit sekuritas yang dikenai
    pajak.
  - Lupa olahraga adalah alasan yang selalu digunakan untuk tak berolahraga.
  - Minuman sehat sulit ditemui.
- source_sentence: Mayweather menepis anggapan bahwa McGregor yang merupakan petarung
    kidal mungkin menyebabkan masalah baginya.
  sentences:
  - Cimahi Selatan merupakan sebuah Kecamatan di Kota Cimahi.
  - Masyarakat umum dilibatkan untuk memberikan respon dalam acara dengar pendapat
    CRTC.
  - McGregor dan Mayweather pernah bertarung dengan sengit.
- source_sentence: Wonosobo adalah salah satu kabupaten yang terdapat di Provinsi
    Jawa Tengah.
  sentences:
  - Tidak terdapat kabupaten di Provinsi Jawa Tengah.
  - Nogizaka46 sekarang sudah merilis 25 singel.
  - Joko Driyono adalah Wakil Ketua Umum PSSI.
- source_sentence: Bangunan ini digunakan untuk penjualan berbagai material.
  sentences:
  - Istri bisa mengidamkan makanan yang mudah dicari.
  - Saluran telepon tidak digunakan oleh FastNet dalam menyediakan akses internet.
  - Bangunan ini digunakan untuk penjualan.
- source_sentence: Set album musik pengiring seri film Harry Potter akan dirilis dalam
    versi baru.
  sentences:
  - Seri film Harry Potter memiliki set album musik pengiring.
  - Daya tahan tubuh bayi tidak terjaga walaupun diberi ASI.
  - Laga dan kolosal adalah genre film.
pipeline_tag: sentence-similarity
model-index:
- name: SentenceTransformer based on indobenchmark/indobert-base-p2
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.3021139089985203
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.30301169986128346
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.2767840491173264
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.2725949754810958
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.3071661849384816
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.3044966278223258
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.3039090779569512
      name: Pearson Dot
    - type: spearman_dot
      value: 0.3047234168200123
      name: Spearman Dot
    - type: pearson_max
      value: 0.3071661849384816
      name: Pearson Max
    - type: spearman_max
      value: 0.3047234168200123
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.10382066164158449
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.09693567465932618
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.07492996229311771
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.07823414156216839
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.09422022261567607
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.09902189422521299
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.10695495102872325
      name: Pearson Dot
    - type: spearman_dot
      value: 0.09978448101169902
      name: Spearman Dot
    - type: pearson_max
      value: 0.10695495102872325
      name: Pearson Max
    - type: spearman_max
      value: 0.09978448101169902
      name: Spearman Max
---

# SentenceTransformer based on indobenchmark/indobert-base-p2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [indobenchmark/indobert-base-p2](https://huggingface.co/indobenchmark/indobert-base-p2) on the [afaji/indonli](https://huggingface.co/datasets/afaji/indonli) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [indobenchmark/indobert-base-p2](https://huggingface.co/indobenchmark/indobert-base-p2) <!-- at revision 94b4e0a82081fa57f227fcc2024d1ea89b57ac1f -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [afaji/indonli](https://huggingface.co/datasets/afaji/indonli)
- **Language:** id
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("cassador/indobert-base-p2-nli-v2")
# Run inference
sentences = [
    'Set album musik pengiring seri film Harry Potter akan dirilis dalam versi baru.',
    'Seri film Harry Potter memiliki set album musik pengiring.',
    'Laga dan kolosal adalah genre film.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

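`model.similarity` defaults to cosine similarity (the model's configured similarity function). A self-contained NumPy sketch of the same computation on synthetic embeddings, requiring no model download; the 3×768 and 3×3 shapes mirror the inference example:

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: normalize rows, then take dot products."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / np.clip(norms, 1e-12, None)
    return normalized @ normalized.T

# Synthetic stand-ins for three 768-dim sentence embeddings
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 768))
similarities = cosine_similarity_matrix(embeddings)
print(similarities.shape)  # (3, 3)
```

Each diagonal entry is 1 (a vector compared with itself) and the matrix is symmetric, just like the output of `model.similarity(embeddings, embeddings)`.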
<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Semantic Similarity
* Dataset: `sts-dev`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value     |
|:--------------------|:----------|
| pearson_cosine      | 0.3021    |
| **spearman_cosine** | **0.303** |
| pearson_manhattan   | 0.2768    |
| spearman_manhattan  | 0.2726    |
| pearson_euclidean   | 0.3072    |
| spearman_euclidean  | 0.3045    |
| pearson_dot         | 0.3039    |
| spearman_dot        | 0.3047    |
| pearson_max         | 0.3072    |
| spearman_max        | 0.3047    |

#### Semantic Similarity
* Dataset: `sts-test`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.1038     |
| **spearman_cosine** | **0.0969** |
| pearson_manhattan   | 0.0749     |
| spearman_manhattan  | 0.0782     |
| pearson_euclidean   | 0.0942     |
| spearman_euclidean  | 0.099      |
| pearson_dot         | 0.107      |
| spearman_dot        | 0.0998     |
| pearson_max         | 0.107      |
| spearman_max        | 0.0998     |

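The evaluator scores the model by correlating its predicted cosine similarities with gold similarity labels. A minimal NumPy-only sketch of the Pearson and Spearman statistics it reports (synthetic scores; the real implementation lives in `sentence_transformers.evaluation` and handles ties properly):

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman = Pearson computed on the ranks (ties ignored for simplicity)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))

# Synthetic model cosine similarities vs. gold similarity labels
predicted = np.array([0.9, 0.1, 0.5, 0.7])
gold      = np.array([1.0, 0.0, 0.4, 0.8])
print(round(pearson(predicted, gold), 4))   # 0.9903
print(round(spearman(predicted, gold), 4))  # 1.0 (same ordering)
```

Spearman only depends on the ranking, which is why it is the headline metric: a model can score well even if its raw similarity values are miscalibrated.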
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### afaji/indonli

* Dataset: [afaji/indonli](https://huggingface.co/datasets/afaji/indonli)
* Size: 10,000 training samples
* Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:

  |         | premise | hypothesis | label |
  |:--------|:--------|:-----------|:------|
  | type    | string  | string     | int   |
  | details | <ul><li>min: 12 tokens</li><li>mean: 29.73 tokens</li><li>max: 179 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.93 tokens</li><li>max: 35 tokens</li></ul> | <ul><li>0: ~68.60%</li><li>1: ~31.40%</li></ul> |
* Samples:

  | premise | hypothesis | label |
  |:--------|:-----------|:------|
  | <code>Presiden Joko Widodo (Jokowi) menyampaikan prediksi bahwa wabah virus Corona (COVID-19) di Indonesia akan selesai akhir tahun ini.</code> | <code>Prediksi akhir wabah tidak disampaikan Jokowi.</code> | <code>0</code> |
  | <code>Meski biasanya hanya digunakan di fasilitas kesehatan, saat ini masker dan sarung tangan sekali pakai banyak dipakai di tingkat rumah tangga.</code> | <code>Masker sekali pakai banyak dipakai di tingkat rumah tangga.</code> | <code>1</code> |
  | <code>Data dari Nielsen Music mencatat, "Joanne" telah terjual 201 ribu kopi di akhir minggu ini, seperti dilansir aceshowbiz.com.</code> | <code>Nielsen Music mencatat pada akhir minggu ini.</code> | <code>0</code> |
* Loss: [<code>SoftmaxLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#softmaxloss)

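SoftmaxLoss trains the encoder through an NLI-style classification head: for a premise embedding u and a hypothesis embedding v, it classifies the concatenation (u, v, |u − v|) with a linear layer and cross-entropy, and gradients flow back into the encoder. A NumPy sketch of that forward pass with random weights (dimensions match this card; the real loss is `sentence_transformers.losses.SoftmaxLoss`):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_loss_forward(u, v, W, labels) -> float:
    """Classify sentence pairs from (u, v, |u - v|), as SoftmaxLoss does by default."""
    features = np.concatenate([u, v, np.abs(u - v)], axis=1)  # (batch, 3*dim)
    probs = softmax(features @ W)                             # (batch, num_labels)
    # Mean cross-entropy over the gold labels
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

rng = np.random.default_rng(42)
dim, num_labels = 768, 2            # this card's dataset uses labels 0/1
u = rng.normal(size=(4, dim))       # synthetic premise embeddings
v = rng.normal(size=(4, dim))       # synthetic hypothesis embeddings
W = rng.normal(size=(3 * dim, num_labels)) * 0.01
labels = np.array([0, 1, 0, 1])
loss = softmax_loss_forward(u, v, W, labels)
print(loss > 0)  # True
```

The classifier head is discarded after training; only the encoder's embeddings are kept, which is why the model is evaluated on similarity rather than classification.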
### Evaluation Dataset

#### afaji/indonli

* Dataset: [afaji/indonli](https://huggingface.co/datasets/afaji/indonli)
* Size: 2,000 evaluation samples
* Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:

  |         | premise | hypothesis | label |
  |:--------|:--------|:-----------|:------|
  | type    | string  | string     | int   |
  | details | <ul><li>min: 9 tokens</li><li>mean: 28.09 tokens</li><li>max: 179 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 12.01 tokens</li><li>max: 24 tokens</li></ul> | <ul><li>0: ~63.00%</li><li>1: ~37.00%</li></ul> |
* Samples:

  | premise | hypothesis | label |
  |:--------|:-----------|:------|
  | <code>Manuskrip tersebut berisi tiga catatan yang menceritakan bagaimana peristiwa jatuhnya meteorit serta laporan kematian akibat kejadian tersebut seperti dilansir dari Science Alert, Sabtu (25/4/2020).</code> | <code>Manuskrip tersebut tidak mencatat laporan kematian.</code> | <code>0</code> |
  | <code>Dilansir dari Business Insider, menurut observasi dari Mauna Loa Observatory di Hawaii pada karbon dioksida (CO2) di level mencapai 410 ppm tidak langsung memberikan efek pada pernapasan, karena tubuh manusia juga masih membutuhkan CO2 dalam kadar tertentu.</code> | <code>Tidak ada observasi yang pernah dilansir oleh Business Insider.</code> | <code>0</code> |
  | <code>Perekonomian Jakarta terutama ditunjang oleh sektor perdagangan, jasa, properti, industri kreatif, dan keuangan.</code> | <code>Sektor jasa memberi pengaruh lebih besar daripada industri kreatif dalam perekonomian Jakarta.</code> | <code>0</code> |
* Loss: [<code>SoftmaxLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#softmaxloss)

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `learning_rate`: 1e-05
- `num_train_epochs`: 10
- `warmup_ratio`: 0.001
- `fp16`: True

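With `warmup_ratio: 0.001` and the default `linear` scheduler, the learning rate ramps from 0 up to 1e-05 over the first 0.1% of optimizer steps, then decays linearly back to 0. A small sketch of that schedule; the step count of 12,500 matches the final run in the training logs, and the function itself is an illustration, not the Trainer's exact implementation:

```python
def linear_schedule_lr(step: int, total_steps: int, peak_lr: float = 1e-5,
                       warmup_ratio: float = 0.001) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # warmup phase
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

total = 12500  # 10 epochs x 1250 steps (batch size 8 over 10,000 samples)
print(linear_schedule_lr(0, total))      # 0.0
print(linear_schedule_lr(12, total))     # 1e-05 (warmup done after 12 steps)
print(linear_schedule_lr(total, total))  # 0.0
```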
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 1e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.001
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
<details><summary>Click to expand</summary>

| Epoch | Step | Training Loss | loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|:-----:|:-----:|:-------------:|:------:|:-----------------------:|:------------------------:|
| 0 | 0 | - | - | 0.1928 | - |
| 0.04 | 100 | 1.1407 | - | - | - |
| 0.08 | 200 | 0.7456 | - | - | - |
| 0.12 | 300 | 0.6991 | - | - | - |
| 0.16 | 400 | 0.6653 | - | - | - |
| 0.2 | 500 | 0.6317 | - | - | - |
| 0.24 | 600 | 0.5975 | - | - | - |
| 0.28 | 700 | 0.5955 | - | - | - |
| 0.32 | 800 | 0.6168 | - | - | - |
| 0.36 | 900 | 0.5851 | - | - | - |
| 0.4 | 1000 | 0.591 | - | - | - |
| 0.44 | 1100 | 0.6063 | - | - | - |
| 0.48 | 1200 | 0.6122 | - | - | - |
| 0.52 | 1300 | 0.5881 | - | - | - |
| 0.56 | 1400 | 0.59 | - | - | - |
| 0.6 | 1500 | 0.5715 | - | - | - |
| 0.64 | 1600 | 0.5725 | - | - | - |
| 0.68 | 1700 | 0.5771 | - | - | - |
| 0.72 | 1800 | 0.5935 | - | - | - |
| 0.76 | 1900 | 0.584 | - | - | - |
| 0.8 | 2000 | 0.5829 | - | - | - |
| 0.84 | 2100 | 0.5507 | - | - | - |
| 0.88 | 2200 | 0.5447 | - | - | - |
| 0.92 | 2300 | 0.6059 | - | - | - |
| 0.96 | 2400 | 0.5389 | - | - | - |
| 1.0 | 2500 | 0.639 | 0.5432 | 0.4007 | - |
| 1.04 | 2600 | 0.463 | - | - | - |
| 1.08 | 2700 | 0.4936 | - | - | - |
| 1.12 | 2800 | 0.4966 | - | - | - |
| 1.16 | 2900 | 0.4588 | - | - | - |
| 1.2 | 3000 | 0.5148 | - | - | - |
| 1.24 | 3100 | 0.5043 | - | - | - |
| 1.28 | 3200 | 0.5048 | - | - | - |
| 1.32 | 3300 | 0.4803 | - | - | - |
| 1.36 | 3400 | 0.465 | - | - | - |
| 1.4 | 3500 | 0.5133 | - | - | - |
| 1.44 | 3600 | 0.5505 | - | - | - |
| 1.48 | 3700 | 0.4498 | - | - | - |
| 1.52 | 3800 | 0.5418 | - | - | - |
| 1.56 | 3900 | 0.5268 | - | - | - |
| 1.6 | 4000 | 0.4546 | - | - | - |
| 1.64 | 4100 | 0.5279 | - | - | - |
| 1.68 | 4200 | 0.5309 | - | - | - |
| 1.72 | 4300 | 0.487 | - | - | - |
| 1.76 | 4400 | 0.5371 | - | - | - |
| 1.8 | 4500 | 0.5097 | - | - | - |
| 1.84 | 4600 | 0.5242 | - | - | - |
| 1.88 | 4700 | 0.4583 | - | - | - |
| 1.92 | 4800 | 0.4923 | - | - | - |
| 1.96 | 4900 | 0.5028 | - | - | - |
| 2.0 | 5000 | 0.5139 | 0.6274 | 0.4335 | - |
| 2.04 | 5100 | 0.322 | - | - | - |
| 2.08 | 5200 | 0.389 | - | - | - |
| 2.12 | 5300 | 0.3633 | - | - | - |
| 2.16 | 5400 | 0.3868 | - | - | - |
| 2.2 | 5500 | 0.3798 | - | - | - |
| 2.24 | 5600 | 0.4385 | - | - | - |
| 2.28 | 5700 | 0.3965 | - | - | - |
| 2.32 | 5800 | 0.3895 | - | - | - |
| 2.36 | 5900 | 0.4484 | - | - | - |
| 2.4 | 6000 | 0.3452 | - | - | - |
| 2.44 | 6100 | 0.3905 | - | - | - |
| 2.48 | 6200 | 0.376 | - | - | - |
| 2.52 | 6300 | 0.4986 | - | - | - |
| 2.56 | 6400 | 0.3732 | - | - | - |
| 2.6 | 6500 | 0.3632 | - | - | - |
| 2.64 | 6600 | 0.3915 | - | - | - |
| 2.68 | 6700 | 0.4394 | - | - | - |
| 2.72 | 6800 | 0.3852 | - | - | - |
| 2.76 | 6900 | 0.3984 | - | - | - |
| 2.8 | 7000 | 0.426 | - | - | - |
| 2.84 | 7100 | 0.3274 | - | - | - |
| 2.88 | 7200 | 0.4673 | - | - | - |
| 2.92 | 7300 | 0.4599 | - | - | - |
| 2.96 | 7400 | 0.4304 | - | - | - |
| 3.0 | 7500 | 0.4151 | 0.8967 | 0.4007 | - |
| 3.04 | 7600 | 0.2345 | - | - | - |
| 3.08 | 7700 | 0.1807 | - | - | - |
| 3.12 | 7800 | 0.2984 | - | - | - |
| 3.16 | 7900 | 0.2357 | - | - | - |
| 3.2 | 8000 | 0.4506 | - | - | - |
| 3.24 | 8100 | 0.2178 | - | - | - |
| 3.28 | 8200 | 0.2654 | - | - | - |
| 3.32 | 8300 | 0.2863 | - | - | - |
| 3.36 | 8400 | 0.2626 | - | - | - |
| 3.4 | 8500 | 0.3281 | - | - | - |
| 3.44 | 8600 | 0.2555 | - | - | - |
| 3.48 | 8700 | 0.4245 | - | - | - |
| 3.52 | 8800 | 0.2368 | - | - | - |
| 3.56 | 8900 | 0.3288 | - | - | - |
| 3.6 | 9000 | 0.3417 | - | - | - |
| 3.64 | 9100 | 0.3249 | - | - | - |
| 3.68 | 9200 | 0.3378 | - | - | - |
| 3.72 | 9300 | 0.233 | - | - | - |
| 3.76 | 9400 | 0.3215 | - | - | - |
| 3.8 | 9500 | 0.251 | - | - | - |
| 3.84 | 9600 | 0.3138 | - | - | - |
| 3.88 | 9700 | 0.3081 | - | - | - |
| 3.92 | 9800 | 0.3875 | - | - | - |
| 3.96 | 9900 | 0.3231 | - | - | - |
| 4.0 | 10000 | 0.2119 | 1.4983 | 0.4129 | - |
| 4.04 | 10100 | 0.1323 | - | - | - |
| 4.08 | 10200 | 0.2222 | - | - | - |
| 4.12 | 10300 | 0.2005 | - | - | - |
| 4.16 | 10400 | 0.127 | - | - | - |
| 4.2 | 10500 | 0.1052 | - | - | - |
| 4.24 | 10600 | 0.1657 | - | - | - |
| 4.28 | 10700 | 0.2305 | - | - | - |
| 4.32 | 10800 | 0.1048 | - | - | - |
| 4.36 | 10900 | 0.2081 | - | - | - |
| 4.4 | 11000 | 0.201 | - | - | - |
| 4.44 | 11100 | 0.1515 | - | - | - |
| 4.48 | 11200 | 0.2112 | - | - | - |
| 4.52 | 11300 | 0.1936 | - | - | - |
| 4.56 | 11400 | 0.1578 | - | - | - |
| 4.6 | 11500 | 0.2551 | - | - | - |
| 4.64 | 11600 | 0.2888 | - | - | - |
| 4.68 | 11700 | 0.128 | - | - | - |
| 4.72 | 11800 | 0.2172 | - | - | - |
| 4.76 | 11900 | 0.114 | - | - | - |
| 4.8 | 12000 | 0.2135 | - | - | - |
| 4.84 | 12100 | 0.2421 | - | - | - |
| 4.88 | 12200 | 0.2392 | - | - | - |
| 4.92 | 12300 | 0.1478 | - | - | - |
| 4.96 | 12400 | 0.1901 | - | - | - |
| 5.0 | 12500 | 0.2219 | 1.9582 | 0.3469 | - |
| 5.04 | 12600 | 0.1586 | - | - | - |
| 5.08 | 12700 | 0.1587 | - | - | - |
| 5.12 | 12800 | 0.0663 | - | - | - |
| 5.16 | 12900 | 0.0703 | - | - | - |
| 5.2 | 13000 | 0.0783 | - | - | - |
| 5.24 | 13100 | 0.1143 | - | - | - |
| 5.28 | 13200 | 0.1155 | - | - | - |
| 5.32 | 13300 | 0.0661 | - | - | - |
| 5.36 | 13400 | 0.0935 | - | - | - |
| 5.4 | 13500 | 0.1344 | - | - | - |
| 5.44 | 13600 | 0.1031 | - | - | - |
| 5.48 | 13700 | 0.1294 | - | - | - |
| 5.52 | 13800 | 0.103 | - | - | - |
| 5.56 | 13900 | 0.0739 | - | - | - |
| 5.6 | 14000 | 0.1477 | - | - | - |
| 5.64 | 14100 | 0.1171 | - | - | - |
| 5.68 | 14200 | 0.1504 | - | - | - |
| 5.72 | 14300 | 0.1122 | - | - | - |
| 5.76 | 14400 | 0.1279 | - | - | - |
| 5.8 | 14500 | 0.0813 | - | - | - |
| 5.84 | 14600 | 0.1372 | - | - | - |
| 5.88 | 14700 | 0.1615 | - | - | - |
| 5.92 | 14800 | 0.1944 | - | - | - |
| 5.96 | 14900 | 0.0436 | - | - | - |
| 6.0 | 15000 | 0.1195 | 2.2220 | 0.3559 | - |
| 0.08 | 100 | 0.0844 | - | - | - |
| 0.16 | 200 | 0.1357 | - | - | - |
| 0.24 | 300 | 0.1382 | - | - | - |
| 0.32 | 400 | 0.2091 | - | - | - |
| 0.4 | 500 | 0.2351 | - | - | - |
| 0.48 | 600 | 0.2976 | - | - | - |
| 0.56 | 700 | 0.3408 | - | - | - |
| 0.64 | 800 | 0.2656 | - | - | - |
| 0.72 | 900 | 0.3183 | - | - | - |
| 0.8 | 1000 | 0.2513 | - | - | - |
| 0.88 | 1100 | 0.2293 | - | - | - |
| 0.96 | 1200 | 0.3241 | - | - | - |
| 1.0 | 1250 | - | 1.1813 | 0.3495 | - |
| 0.3195 | 100 | 0.6132 | - | - | - |
| 0.6390 | 200 | 0.1554 | - | - | - |
| 0.9585 | 300 | 0.1366 | - | - | - |
| 1.0 | 313 | - | 1.2867 | 0.3839 | - |
| 0.08 | 100 | 0.2713 | - | - | - |
| 0.16 | 200 | 0.1273 | - | - | - |
| 0.24 | 300 | 0.0883 | - | - | - |
| 0.32 | 400 | 0.0749 | - | - | - |
| 0.08 | 100 | 0.0653 | - | - | - |
| 0.16 | 200 | 0.0311 | - | - | - |
| 0.24 | 300 | 0.0368 | - | - | - |
| 0.32 | 400 | 0.0259 | - | - | - |
| 0.4 | 500 | 0.059 | - | - | - |
| 0.48 | 600 | 0.046 | - | - | - |
| 0.56 | 700 | 0.1266 | - | - | - |
| 0.64 | 800 | 0.0661 | - | - | - |
| 0.72 | 900 | 0.0676 | - | - | - |
| 0.8 | 1000 | 0.0759 | - | - | - |
| 0.88 | 1100 | 0.0527 | - | - | - |
| 0.96 | 1200 | 0.1038 | - | - | - |
| 1.0 | 1250 | - | 2.2411 | 0.3892 | - |
| 1.04 | 1300 | 0.0456 | - | - | - |
| 1.12 | 1400 | 0.1363 | - | - | - |
| 1.2 | 1500 | 0.1398 | - | - | - |
| 1.28 | 1600 | 0.1237 | - | - | - |
| 1.36 | 1700 | 0.123 | - | - | - |
| 1.44 | 1800 | 0.1893 | - | - | - |
| 1.52 | 1900 | 0.1192 | - | - | - |
| 1.6 | 2000 | 0.1347 | - | - | - |
| 1.68 | 2100 | 0.0937 | - | - | - |
| 1.76 | 2200 | 0.1506 | - | - | - |
| 1.84 | 2300 | 0.1366 | - | - | - |
| 1.92 | 2400 | 0.1194 | - | - | - |
| 2.0 | 2500 | 0.1485 | 2.1340 | 0.3245 | - |
| 2.08 | 2600 | 0.0485 | - | - | - |
| 2.16 | 2700 | 0.0579 | - | - | - |
| 2.24 | 2800 | 0.0932 | - | - | - |
| 2.32 | 2900 | 0.0743 | - | - | - |
| 2.4 | 3000 | 0.0783 | - | - | - |
| 2.48 | 3100 | 0.0918 | - | - | - |
| 2.56 | 3200 | 0.0973 | - | - | - |
| 2.64 | 3300 | 0.0623 | - | - | - |
| 2.72 | 3400 | 0.1284 | - | - | - |
| 2.8 | 3500 | 0.1247 | - | - | - |
| 2.88 | 3600 | 0.0648 | - | - | - |
| 2.96 | 3700 | 0.0921 | - | - | - |
| 3.0 | 3750 | - | 2.4354 | 0.2824 | - |
| 3.04 | 3800 | 0.04 | - | - | - |
| 3.12 | 3900 | 0.0417 | - | - | - |
| 3.2 | 4000 | 0.0414 | - | - | - |
| 3.28 | 4100 | 0.0485 | - | - | - |
| 3.36 | 4200 | 0.0255 | - | - | - |
| 3.44 | 4300 | 0.0688 | - | - | - |
| 3.52 | 4400 | 0.0574 | - | - | - |
| 3.6 | 4500 | 0.0766 | - | - | - |
| 3.68 | 4600 | 0.0481 | - | - | - |
| 3.76 | 4700 | 0.06 | - | - | - |
| 3.84 | 4800 | 0.0528 | - | - | - |
| 3.92 | 4900 | 0.0426 | - | - | - |
| 4.0 | 5000 | 0.092 | 2.5427 | 0.3284 | - |
| 4.08 | 5100 | 0.0349 | - | - | - |
| 4.16 | 5200 | 0.0107 | - | - | - |
| 4.24 | 5300 | 0.0608 | - | - | - |
| 4.32 | 5400 | 0.0473 | - | - | - |
| 4.4 | 5500 | 0.0452 | - | - | - |
| 4.48 | 5600 | 0.0316 | - | - | - |
| 4.56 | 5700 | 0.0096 | - | - | - |
| 4.64 | 5800 | 0.0511 | - | - | - |
| 4.72 | 5900 | 0.0207 | - | - | - |
| 4.8 | 6000 | 0.0061 | - | - | - |
| 4.88 | 6100 | 0.0381 | - | - | - |
| 4.96 | 6200 | 0.0378 | - | - | - |
| 5.0 | 6250 | - | 2.6061 | 0.3061 | - |
| 5.04 | 6300 | 0.0326 | - | - | - |
| 5.12 | 6400 | 0.0349 | - | - | - |
| 5.2 | 6500 | 0.0128 | - | - | - |
| 5.28 | 6600 | 0.0185 | - | - | - |
| 5.36 | 6700 | 0.0145 | - | - | - |
| 5.44 | 6800 | 0.0521 | - | - | - |
| 5.52 | 6900 | 0.0427 | - | - | - |
| 5.6 | 7000 | 0.0215 | - | - | - |
| 5.68 | 7100 | 0.0195 | - | - | - |
| 5.76 | 7200 | 0.0426 | - | - | - |
| 5.84 | 7300 | 0.057 | - | - | - |
| 5.92 | 7400 | 0.0106 | - | - | - |
| 6.0 | 7500 | 0.0284 | 2.8348 | 0.3291 | - |
| 6.08 | 7600 | 0.0286 | - | - | - |
| 6.16 | 7700 | 0.018 | - | - | - |
| 6.24 | 7800 | 0.0224 | - | - | - |
| 6.32 | 7900 | 0.0102 | - | - | - |
| 6.4 | 8000 | 0.0287 | - | - | - |
| 6.48 | 8100 | 0.0078 | - | - | - |
| 6.56 | 8200 | 0.0237 | - | - | - |
| 6.64 | 8300 | 0.0148 | - | - | - |
| 6.72 | 8400 | 0.0271 | - | - | - |
| 6.8 | 8500 | 0.015 | - | - | - |
| 6.88 | 8600 | 0.0278 | - | - | - |
| 6.96 | 8700 | 0.0237 | - | - | - |
| 7.0 | 8750 | - | 2.8785 | 0.3188 | - |
| 7.04 | 8800 | 0.0203 | - | - | - |
| 7.12 | 8900 | 0.0089 | - | - | - |
| 7.2 | 9000 | 0.0121 | - | - | - |
| 7.28 | 9100 | 0.0185 | - | - | - |
| 7.36 | 9200 | 0.0127 | - | - | - |
| 7.44 | 9300 | 0.017 | - | - | - |
| 7.52 | 9400 | 0.0117 | - | - | - |
| 7.6 | 9500 | 0.006 | - | - | - |
| 7.68 | 9600 | 0.0061 | - | - | - |
| 7.76 | 9700 | 0.0141 | - | - | - |
| 7.84 | 9800 | 0.0091 | - | - | - |
| 7.92 | 9900 | 0.0164 | - | - | - |
| 8.0 | 10000 | 0.0244 | 2.8054 | 0.3040 | - |
| 8.08 | 10100 | 0.0001 | - | - | - |
| 8.16 | 10200 | 0.0187 | - | - | - |
| 8.24 | 10300 | 0.0098 | - | - | - |
| 8.32 | 10400 | 0.0114 | - | - | - |
| 8.4 | 10500 | 0.004 | - | - | - |
| 8.48 | 10600 | 0.0017 | - | - | - |
| 8.56 | 10700 | 0.0018 | - | - | - |
| 8.64 | 10800 | 0.009 | - | - | - |
| 8.72 | 10900 | 0.0047 | - | - | - |
| 8.8 | 11000 | 0.0014 | - | - | - |
| 8.88 | 11100 | 0.0049 | - | - | - |
| 8.96 | 11200 | 0.006 | - | - | - |
| 9.0 | 11250 | - | 2.9460 | 0.2967 | - |
| 9.04 | 11300 | 0.0057 | - | - | - |
| 9.12 | 11400 | 0.0051 | - | - | - |
| 9.2 | 11500 | 0.0067 | - | - | - |
| 9.28 | 11600 | 0.0009 | - | - | - |
| 9.36 | 11700 | 0.0046 | - | - | - |
| 9.44 | 11800 | 0.0138 | - | - | - |
| 9.52 | 11900 | 0.0067 | - | - | - |
| 9.6 | 12000 | 0.0043 | - | - | - |
| 9.68 | 12100 | 0.001 | - | - | - |
| 9.76 | 12200 | 0.0004 | - | - | - |
| 9.84 | 12300 | 0.0044 | - | - | - |
| 9.92 | 12400 | 0.003 | - | - | - |
| 10.0 | 12500 | 0.0055 | 2.9714 | 0.3030 | 0.0969 |
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.41.2
+ - PyTorch: 2.3.0+cu121
+ - Accelerate: 0.31.0
+ - Datasets: 2.20.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers and SoftmaxLoss
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,47 @@
+ {
+ "_name_or_path": "indobenchmark/indobert-base-p2",
+ "_num_labels": 5,
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "directionality": "bidi",
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1",
+ "2": "LABEL_2",
+ "3": "LABEL_3",
+ "4": "LABEL_4"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1,
+ "LABEL_2": 2,
+ "LABEL_3": 3,
+ "LABEL_4": 4
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "output_past": true,
+ "pad_token_id": 0,
+ "pooler_fc_size": 768,
+ "pooler_num_attention_heads": 12,
+ "pooler_num_fc_layers": 3,
+ "pooler_size_per_head": 128,
+ "pooler_type": "first_token_transform",
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.41.2",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 50000
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.0.1",
+ "transformers": "4.41.2",
+ "pytorch": "2.3.0+cu121"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": null
+ }
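
In `config_sentence_transformers.json` above, `similarity_fn_name` is `null`; Sentence Transformers 3.x falls back to cosine similarity when this field is unset. As a minimal sketch (plain NumPy, independent of this model; the function name is ours, not the library's), the comparison it performs between two sentence embeddings is:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors: 1.0 for
    identical directions, 0.0 for orthogonal ones."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Parallel vectors score ~1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # ≈ 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

Because cosine similarity normalizes by vector length, it depends only on direction, which is why it is the usual default for comparing sentence embeddings.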
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2cc3c7ac40603a75fb27bcfd87ece8d8ec7f611341c69c226a9164dcc7f4881c
+ size 497787752
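
The LFS pointer records a 497,787,752-byte payload. At `torch_dtype: float32` (4 bytes per weight) that works out to roughly 124.4M parameters, which is consistent with a BERT-base encoder (12 layers, hidden size 768) with this model's 50k-token vocabulary. A quick back-of-envelope check:

```python
# Sanity-check the checkpoint size against the architecture in config.json.
SIZE_BYTES = 497_787_752   # "size" field of the LFS pointer above
BYTES_PER_PARAM = 4        # float32

params = SIZE_BYTES / BYTES_PER_PARAM
print(f"{params / 1e6:.1f}M parameters")  # → 124.4M parameters
```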
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ }
+ ]
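
`modules.json` chains a Transformer encoder with a Pooling head, and the `1_Pooling/config.json` in this commit enables `pooling_mode_mean_tokens`. A hedged NumPy stand-in for what that pooling step computes (shapes and the function name are illustrative, not the library's internals): token embeddings are averaged over the sequence, with padding positions masked out so they do not dilute the sentence vector.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Masked mean over the sequence axis.

    token_embeddings: (seq_len, dim) per-token outputs of the encoder.
    attention_mask:   (seq_len,) with 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)                 # (dim,)
    count = np.clip(mask.sum(), 1e-9, None)                        # avoid divide-by-zero
    return summed / count

# Two real tokens plus one padding token; the pad row is ignored.
emb = np.array([[1.0, 3.0], [3.0, 5.0], [99.0, 99.0]])
mask = np.array([1, 1, 0])
print(mean_pool(emb, mask))  # → [2. 4.]
```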
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
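
`max_seq_length: 512` here matches `max_position_embeddings: 512` in `config.json`: inputs longer than 512 tokens are truncated before encoding, since the absolute position embeddings only cover that many positions. A toy sketch of the truncation behavior (illustrative only, not the library's code path):

```python
MAX_SEQ_LENGTH = 512  # from sentence_bert_config.json

def truncate(token_ids: list[int], max_seq_length: int = MAX_SEQ_LENGTH) -> list[int]:
    """Keep at most max_seq_length token ids; anything beyond the
    position-embedding range is dropped."""
    return token_ids[:max_seq_length]

print(len(truncate(list(range(1000)))))  # → 512
```

In practice this means text past roughly the first 512 tokens contributes nothing to the sentence embedding.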
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "4": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "mask_token": "[MASK]",
+ "model_max_length": 1000000000000000019884624838656,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff