cassador committed
Commit
14e8d5c
1 Parent(s): 335764e

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,542 @@
+ ---
+ language:
+ - id
+ library_name: sentence-transformers
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:10000
+ - loss:SoftmaxLoss
+ base_model: indobenchmark/indobert-base-p2
+ datasets:
+ - afaji/indonli
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ widget:
+ - source_sentence: Dengan meniupnya, perawat bisa segera mengerti bahwa ia dipanggil
+     dan akan segera datang menolong.
+   sentences:
+   - 38% pemilih tidak mendukung meninggalkan Uni Eropa.
+   - Perawat mengerti bahwa ia dipanggil dan akan segera datang menolong.
+   - Dari fakta-fakta tersebut dapat diindikasikan pembakaran gereja dilakukan secara
+     sengaja.
+ - source_sentence: Kebudayaan jawa lainnya adalah Sintren, Sintren adalan kesenian
+     tradisional masyarakat Jawa, khususnya Pekalongan.
+   sentences:
+   - Sintren merupakan kesenian tradisional masyarakat Jawa yang ada sejak zaman kerajaan.
+   - Klinik ini melarang pasiennya menghisap ganja.
+   - Perubahan dunia saat itu dipengaruhi oleh Krisis Suez.
+ - source_sentence: Saat ini, sudah empat wanita yang mengaku dilecehkan. Yang terakhir
+     ialah aktris Rose McGowan, dengan tuntutan pemerkosaan.
+   sentences:
+   - Di Maroko Tenggara tidak pernah ada fosil vertebrata.
+   - Tidak ada yang dilecehkan.
+   - Ganja tidak boleh diberikan kepada pasien penyakit apapun.
+ - source_sentence: Peperangan di tanah berubah dari lini depan statis Perang Dunia
+     I menjadi peningkatan mobilitas dan persenjataan gabungan.
+   sentences:
+   - Peperangan di tanah awalnya berbentuk lini depan statis Perang Dunia I.
+   - Ia berdarah keturunan India.
+   - Kesultanan Yogyakarta berasal dari Kerajaan Mataram.
+ - source_sentence: Bahan dasar Dalgona Coffee hanya tiga jenis yaitu bubuk kopi, gula,
+     dan air. Banyak resep beredar dengan komposisi dua sendok bubuk kopi, dua sendok
+     gula, dan dua sendok air panas.
+   sentences:
+   - Semua orang di dunia menyukai air putih.
+   - Jutting berada di Pengadilan Tinggi Hongkong 5 tahun kemudian.
+   - Resep komposisi Dalgona Coffee adalah 2 sendok bubuk kopi.
+ pipeline_tag: sentence-similarity
+ model-index:
+ - name: SentenceTransformer based on indobenchmark/indobert-base-p2
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev
+       type: sts-dev
+     metrics:
+     - type: pearson_cosine
+       value: -0.4766226820019628
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: -0.4665046363205431
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: -0.46278474137062864
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: -0.46103038796182516
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: -0.4732431317820645
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: -0.4673139200425683
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: -0.4679129419420587
+       name: Pearson Dot
+     - type: spearman_dot
+       value: -0.4577457216480116
+       name: Spearman Dot
+     - type: pearson_max
+       value: -0.46278474137062864
+       name: Pearson Max
+     - type: spearman_max
+       value: -0.4577457216480116
+       name: Spearman Max
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test
+       type: sts-test
+     metrics:
+     - type: pearson_cosine
+       value: -0.20358655624514646
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: -0.20098073423584242
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: -0.16857445418120778
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: -0.18417229002858432
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: -0.17954736289799147
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: -0.1907831094006202
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: -0.2158654981443921
+       name: Pearson Dot
+     - type: spearman_dot
+       value: -0.2141585054513143
+       name: Spearman Dot
+     - type: pearson_max
+       value: -0.16857445418120778
+       name: Pearson Max
+     - type: spearman_max
+       value: -0.18417229002858432
+       name: Spearman Max
+ ---
+ 
+ # SentenceTransformer based on indobenchmark/indobert-base-p2
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [indobenchmark/indobert-base-p2](https://huggingface.co/indobenchmark/indobert-base-p2) on the [afaji/indonli](https://huggingface.co/datasets/afaji/indonli) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [indobenchmark/indobert-base-p2](https://huggingface.co/indobenchmark/indobert-base-p2) <!-- at revision 94b4e0a82081fa57f227fcc2024d1ea89b57ac1f -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - [afaji/indonli](https://huggingface.co/datasets/afaji/indonli)
+ - **Language:** id
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
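+ 
+ The Pooling module produces the sentence embedding by averaging the token embeddings, with padding positions masked out (`pooling_mode_mean_tokens: True`). A minimal sketch of that mean-pooling step in plain PyTorch; the tensor names are illustrative, not part of this repository:
+ 
+ ```python
+ import torch
+ 
+ def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
+     # token_embeddings: (batch, seq_len, 768); attention_mask: (batch, seq_len) of 0/1
+     mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
+     summed = (token_embeddings * mask).sum(dim=1)  # zero out padding, sum real tokens
+     counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per sentence
+     return summed / counts                         # (batch, 768) sentence embeddings
+ ```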
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference:
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("cassador/indobert-base-p2-nli-v1")
+ # Run inference
+ sentences = [
+     'Bahan dasar Dalgona Coffee hanya tiga jenis yaitu bubuk kopi, gula, dan air. Banyak resep beredar dengan komposisi dua sendok bubuk kopi, dua sendok gula, dan dua sendok air panas.',
+     'Resep komposisi Dalgona Coffee adalah 2 sendok bubuk kopi.',
+     'Jutting berada di Pengadilan Tinggi Hongkong 5 tahun kemudian.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
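+ 
+ The same embeddings support the other use cases listed above, e.g. semantic search over a small corpus. A short sketch continuing from the snippet above; the corpus and query strings are made-up examples:
+ 
+ ```python
+ # Embed a corpus once, then rank it against a query by cosine similarity
+ corpus = [
+     'Sintren adalah kesenian tradisional masyarakat Jawa.',
+     'Perekonomian Jakarta ditunjang oleh sektor perdagangan dan jasa.',
+ ]
+ corpus_embeddings = model.encode(corpus)
+ query_embedding = model.encode('Kesenian tradisional dari Jawa')
+ 
+ scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 2]
+ print(corpus[scores.argmax().item()])  # best-matching corpus sentence
+ ```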
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ 
+ #### Semantic Similarity
+ 
+ * Dataset: `sts-dev`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+ 
+ | Metric              | Value       |
+ |:--------------------|:------------|
+ | pearson_cosine      | -0.4766     |
+ | **spearman_cosine** | **-0.4665** |
+ | pearson_manhattan   | -0.4628     |
+ | spearman_manhattan  | -0.4610     |
+ | pearson_euclidean   | -0.4732     |
+ | spearman_euclidean  | -0.4673     |
+ | pearson_dot         | -0.4679     |
+ | spearman_dot        | -0.4577     |
+ | pearson_max         | -0.4628     |
+ | spearman_max        | -0.4577     |
+ 
+ #### Semantic Similarity
+ 
+ * Dataset: `sts-test`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+ 
+ | Metric              | Value       |
+ |:--------------------|:------------|
+ | pearson_cosine      | -0.2036     |
+ | **spearman_cosine** | **-0.2010** |
+ | pearson_manhattan   | -0.1686     |
+ | spearman_manhattan  | -0.1842     |
+ | pearson_euclidean   | -0.1795     |
+ | spearman_euclidean  | -0.1908     |
+ | pearson_dot         | -0.2159     |
+ | spearman_dot        | -0.2142     |
+ | pearson_max         | -0.1686     |
+ | spearman_max        | -0.1842     |
+ 
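+ An evaluation like this could be reproduced along the following lines, assuming the IndoNLI validation split with the integer NLI labels (0 = entailment, 1 = neutral, 2 = contradiction) used directly as gold "similarity" scores. That label mapping is an assumption, but it would also explain the negative correlations above, since higher labels mark less similar pairs:
+ 
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
+ 
+ # Hypothetical reconstruction: 1,000 validation pairs scored by their NLI label
+ dev = load_dataset("afaji/indonli", split="validation").select(range(1000))
+ evaluator = EmbeddingSimilarityEvaluator(
+     sentences1=dev["premise"],
+     sentences2=dev["hypothesis"],
+     scores=[float(label) for label in dev["label"]],
+     name="sts-dev",
+ )
+ print(evaluator(model))  # dict of Pearson/Spearman correlations per similarity function
+ ```
+ 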
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### afaji/indonli
+ 
+ * Dataset: [afaji/indonli](https://huggingface.co/datasets/afaji/indonli)
+ * Size: 10,000 training samples
+ * Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | premise | hypothesis | label |
+   |:--------|:--------|:-----------|:------|
+   | type    | string  | string     | int   |
+   | details | <ul><li>min: 12 tokens</li><li>mean: 29.73 tokens</li><li>max: 179 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.93 tokens</li><li>max: 35 tokens</li></ul> | <ul><li>0: ~31.40%</li><li>1: ~34.60%</li><li>2: ~34.00%</li></ul> |
+ * Samples:
+   | premise | hypothesis | label |
+   |:--------|:-----------|:------|
+   | <code>Presiden Joko Widodo (Jokowi) menyampaikan prediksi bahwa wabah virus Corona (COVID-19) di Indonesia akan selesai akhir tahun ini.</code> | <code>Prediksi akhir wabah tidak disampaikan Jokowi.</code> | <code>2</code> |
+   | <code>Meski biasanya hanya digunakan di fasilitas kesehatan, saat ini masker dan sarung tangan sekali pakai banyak dipakai di tingkat rumah tangga.</code> | <code>Masker sekali pakai banyak dipakai di tingkat rumah tangga.</code> | <code>0</code> |
+   | <code>Data dari Nielsen Music mencatat, "Joanne" telah terjual 201 ribu kopi di akhir minggu ini, seperti dilansir aceshowbiz.com.</code> | <code>Nielsen Music mencatat pada akhir minggu ini.</code> | <code>1</code> |
+ * Loss: [<code>SoftmaxLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#softmaxloss)
+ 
+ ### Evaluation Dataset
+ 
+ #### afaji/indonli
+ 
+ * Dataset: [afaji/indonli](https://huggingface.co/datasets/afaji/indonli)
+ * Size: 1,000 evaluation samples
+ * Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | premise | hypothesis | label |
+   |:--------|:--------|:-----------|:------|
+   | type    | string  | string     | int   |
+   | details | <ul><li>min: 9 tokens</li><li>mean: 28.09 tokens</li><li>max: 179 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 12.01 tokens</li><li>max: 24 tokens</li></ul> | <ul><li>0: ~37.00%</li><li>1: ~29.20%</li><li>2: ~33.80%</li></ul> |
+ * Samples:
+   | premise | hypothesis | label |
+   |:--------|:-----------|:------|
+   | <code>Manuskrip tersebut berisi tiga catatan yang menceritakan bagaimana peristiwa jatuhnya meteorit serta laporan kematian akibat kejadian tersebut seperti dilansir dari Science Alert, Sabtu (25/4/2020).</code> | <code>Manuskrip tersebut tidak mencatat laporan kematian.</code> | <code>2</code> |
+   | <code>Dilansir dari Business Insider, menurut observasi dari Mauna Loa Observatory di Hawaii pada karbon dioksida (CO2) di level mencapai 410 ppm tidak langsung memberikan efek pada pernapasan, karena tubuh manusia juga masih membutuhkan CO2 dalam kadar tertentu.</code> | <code>Tidak ada observasi yang pernah dilansir oleh Business Insider.</code> | <code>2</code> |
+   | <code>Perekonomian Jakarta terutama ditunjang oleh sektor perdagangan, jasa, properti, industri kreatif, dan keuangan.</code> | <code>Sektor jasa memberi pengaruh lebih besar daripada industri kreatif dalam perekonomian Jakarta.</code> | <code>1</code> |
+ * Loss: [<code>SoftmaxLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#softmaxloss)
+ 
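+ A minimal sketch of how this data feeds `SoftmaxLoss` (standard Sentence Transformers v3 APIs; the exact subset selection is an assumption based on the sizes listed above):
+ 
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer, losses
+ 
+ model = SentenceTransformer("indobenchmark/indobert-base-p2")
+ 
+ # (premise, hypothesis, label) rows; 10,000 for training as reported above
+ train_dataset = load_dataset("afaji/indonli", split="train").select(range(10_000))
+ 
+ # SoftmaxLoss embeds both sentences, concatenates (u, v, |u - v|),
+ # and trains a 3-way NLI classification head on top
+ loss = losses.SoftmaxLoss(
+     model=model,
+     sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
+     num_labels=3,
+ )
+ ```
+ 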
+ ### Training Hyperparameters
+ 
+ #### Non-Default Hyperparameters
+ 
+ - `eval_strategy`: epoch
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 4
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ 
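+ Continuing the sketch above, these non-default values map directly onto `SentenceTransformerTrainingArguments` (the output directory is hypothetical; everything else keeps the defaults in the full list below):
+ 
+ ```python
+ from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
+ 
+ args = SentenceTransformerTrainingArguments(
+     output_dir="indobert-base-p2-nli-v1",  # hypothetical output path
+     eval_strategy="epoch",
+     learning_rate=2e-5,
+     num_train_epochs=4,
+     warmup_ratio=0.1,
+     fp16=True,
+ )
+ trainer = SentenceTransformerTrainer(
+     model=model,
+     args=args,
+     train_dataset=train_dataset,
+     loss=loss,
+ )
+ trainer.train()
+ ```
+ 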
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+ 
+ </details>
+ 
+ ### Training Logs
+ 
+ | Epoch | Step | Training Loss | Validation Loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
+ |:-----:|:----:|:-------------:|:---------------:|:-----------------------:|:------------------------:|
+ | 0     | 0    | -             | -               | -0.0893                 | -                        |
+ | 0.08  | 100  | 1.0851        | -               | -                       | -                        |
+ | 0.16  | 200  | 1.0163        | -               | -                       | -                        |
+ | 0.24  | 300  | 0.9524        | -               | -                       | -                        |
+ | 0.32  | 400  | 0.9257        | -               | -                       | -                        |
+ | 0.40  | 500  | 0.9397        | -               | -                       | -                        |
+ | 0.48  | 600  | 0.9125        | -               | -                       | -                        |
+ | 0.56  | 700  | 0.9130        | -               | -                       | -                        |
+ | 0.64  | 800  | 0.8792        | -               | -                       | -                        |
+ | 0.72  | 900  | 0.9320        | -               | -                       | -                        |
+ | 0.80  | 1000 | 0.9112        | -               | -                       | -                        |
+ | 0.88  | 1100 | 0.8809        | -               | -                       | -                        |
+ | 0.96  | 1200 | 0.8567        | -               | -                       | -                        |
+ | 1.00  | 1250 | -             | 0.8585          | -0.4868                 | -                        |
+ | 1.04  | 1300 | 0.8482        | -               | -                       | -                        |
+ | 1.12  | 1400 | 0.7235        | -               | -                       | -                        |
+ | 1.20  | 1500 | 0.7140        | -               | -                       | -                        |
+ | 1.28  | 1600 | 0.7053        | -               | -                       | -                        |
+ | 1.36  | 1700 | 0.7205        | -               | -                       | -                        |
+ | 1.44  | 1800 | 0.7203        | -               | -                       | -                        |
+ | 1.52  | 1900 | 0.6957        | -               | -                       | -                        |
+ | 1.60  | 2000 | 0.7271        | -               | -                       | -                        |
+ | 1.68  | 2100 | 0.7302        | -               | -                       | -                        |
+ | 1.76  | 2200 | 0.7054        | -               | -                       | -                        |
+ | 1.84  | 2300 | 0.7134        | -               | -                       | -                        |
+ | 1.92  | 2400 | 0.6919        | -               | -                       | -                        |
+ | 2.00  | 2500 | 0.7416        | 0.8465          | -0.4085                 | -                        |
+ | 2.08  | 2600 | 0.4955        | -               | -                       | -                        |
+ | 2.16  | 2700 | 0.4484        | -               | -                       | -                        |
+ | 2.24  | 2800 | 0.4413        | -               | -                       | -                        |
+ | 2.32  | 2900 | 0.4567        | -               | -                       | -                        |
+ | 2.40  | 3000 | 0.4889        | -               | -                       | -                        |
+ | 2.48  | 3100 | 0.4284        | -               | -                       | -                        |
+ | 2.56  | 3200 | 0.5041        | -               | -                       | -                        |
+ | 2.64  | 3300 | 0.4755        | -               | -                       | -                        |
+ | 2.72  | 3400 | 0.4726        | -               | -                       | -                        |
+ | 2.80  | 3500 | 0.4656        | -               | -                       | -                        |
+ | 2.88  | 3600 | 0.4389        | -               | -                       | -                        |
+ | 2.96  | 3700 | 0.4789        | -               | -                       | -                        |
+ | 3.00  | 3750 | -             | 1.0011          | -0.4586                 | -                        |
+ | 3.04  | 3800 | 0.3492        | -               | -                       | -                        |
+ | 3.12  | 3900 | 0.2477        | -               | -                       | -                        |
+ | 3.20  | 4000 | 0.2556        | -               | -                       | -                        |
+ | 3.28  | 4100 | 0.2531        | -               | -                       | -                        |
+ | 3.36  | 4200 | 0.2767        | -               | -                       | -                        |
+ | 3.44  | 4300 | 0.2665        | -               | -                       | -                        |
+ | 3.52  | 4400 | 0.2493        | -               | -                       | -                        |
+ | 3.60  | 4500 | 0.2757        | -               | -                       | -                        |
+ | 3.68  | 4600 | 0.2662        | -               | -                       | -                        |
+ | 3.76  | 4700 | 0.2666        | -               | -                       | -                        |
+ | 3.84  | 4800 | 0.2748        | -               | -                       | -                        |
+ | 3.92  | 4900 | 0.2460        | -               | -                       | -                        |
+ | 4.00  | 5000 | 0.2411        | 1.2455          | -0.4665                 | -0.2010                  |
+ 
+ ### Framework Versions
+ 
+ - Python: 3.10.12
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.41.2
+ - PyTorch: 2.3.0+cu121
+ - Accelerate: 0.31.0
+ - Datasets: 2.20.0
+ - Tokenizers: 0.19.1
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers and SoftmaxLoss
+ 
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,47 @@
+ {
+   "_name_or_path": "indobenchmark/indobert-base-p2",
+   "_num_labels": 5,
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "directionality": "bidi",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0",
+     "1": "LABEL_1",
+     "2": "LABEL_2",
+     "3": "LABEL_3",
+     "4": "LABEL_4"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1,
+     "LABEL_2": 2,
+     "LABEL_3": 3,
+     "LABEL_4": 4
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "output_past": true,
+   "pad_token_id": 0,
+   "pooler_fc_size": 768,
+   "pooler_num_attention_heads": 12,
+   "pooler_num_fc_layers": 3,
+   "pooler_size_per_head": 128,
+   "pooler_type": "first_token_transform",
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 50000
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.3.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f42f57e88ea8f2adb08afb1719f322198619cb35455a7888a0078d1431265c68
+ size 497787752
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff