Commit 5bd4b0b by CocoRoF (verified) · Parent(s): bf077c5
Files changed (1): README.md added (+113 lines)

---
library_name: transformers
license: apache-2.0
base_model: x2bee/KoModernBERT-base-v02
tags:
- generated_from_trainer
model-index:
- name: ModernBERT_SimCSE_v02
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# ModernBERT_SimCSE_v02

This model is a fine-tuned version of [x2bee/KoModernBERT-base-v02](https://huggingface.co/x2bee/KoModernBERT-base-v02) on an unknown dataset.
It achieves the following results on the evaluation set (a usage sketch for computing such similarity scores follows the list):
- Loss: 0.0370
- Pearson Cosine: 0.7760
- Spearman Cosine: 0.7753
- Pearson Manhattan: 0.7337
- Spearman Manhattan: 0.7389
- Pearson Euclidean: 0.7316
- Spearman Euclidean: 0.7371
- Pearson Dot: 0.7343
- Spearman Dot: 0.7356

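These Pearson/Spearman values are the correlations typical of an STS-style embedding-similarity evaluation: sentence pairs are embedded, pair similarities are computed under the named distance (cosine, Manhattan, Euclidean, dot), and the correlation with gold similarity scores is reported. The card does not include a usage recipe, so the following is only a minimal sketch of how such similarity scores can be computed with plain `transformers`; the repository id and the [CLS] pooling choice are assumptions (SimCSE commonly uses [CLS] pooling, but mean pooling is also possible), and the sentence pair is illustrative.

```python
# Minimal sketch (not from the model card): load the checkpoint with plain
# transformers and compare two Korean sentences by cosine similarity.
# Assumptions: the repo id "x2bee/ModernBERT_SimCSE_v02" is hypothetical and
# may differ, and [CLS] pooling is used (common for SimCSE; mean pooling is
# an alternative if training used it).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "x2bee/ModernBERT_SimCSE_v02"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

sentences = ["오늘 날씨가 정말 좋다.", "날씨가 매우 화창하다."]  # illustrative pair
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# [CLS] pooling: take the first token's hidden state as the sentence embedding.
embeddings = outputs.last_hidden_state[:, 0]

# Cosine similarity between the two sentence embeddings; the "Cosine" metrics
# above are Pearson/Spearman correlations of such scores with gold labels.
score = F.cosine_similarity(embeddings[0:1], embeddings[1:2]).item()
print(f"cosine similarity: {score:.4f}")
```

If the checkpoint is also published in Sentence Transformers format, `SentenceTransformer(model_id).encode(sentences)` would handle pooling automatically.
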
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `TrainingArguments` sketch matching these values follows the list):
- learning_rate: 1e-05
- train_batch_size: 2 (per device)
- eval_batch_size: 1 (per device)
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 256
- total_eval_batch_size: 8
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0

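For orientation, the per-device batch size, device count, and gradient accumulation multiply out to the reported totals: 2 × 8 × 16 = 256 for training and 1 × 8 = 8 for evaluation. Below is a minimal, hedged sketch of `transformers.TrainingArguments` matching the listed values; `output_dir` is a placeholder, the 8-GPU layout comes from the launcher (e.g. `torchrun`) rather than from these arguments, and nothing beyond the listed hyperparameters is taken from the card.

```python
# Hedged sketch of TrainingArguments matching the hyperparameters listed above.
# Effective train batch size = 2 per device * 8 GPUs * 16 accumulation = 256.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./ModernBERT_SimCSE_v02",  # placeholder, not from the card
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=10.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```
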
### Training results

| Training Loss | Epoch | Step | Validation Loss | Pearson Cosine | Spearman Cosine | Pearson Manhattan | Spearman Manhattan | Pearson Euclidean | Spearman Euclidean | Pearson Dot | Spearman Dot |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:---------------:|:-----------------:|:------------------:|:-----------------:|:------------------:|:-----------:|:------------:|
| 0.3877 | 0.2343 | 250 | 0.1542 | 0.7471 | 0.7499 | 0.7393 | 0.7393 | 0.7395 | 0.7397 | 0.6414 | 0.6347 |
| 0.2805 | 0.4686 | 500 | 0.1142 | 0.7578 | 0.7643 | 0.7619 | 0.7652 | 0.7619 | 0.7654 | 0.6366 | 0.6341 |
| 0.2331 | 0.7029 | 750 | 0.0950 | 0.7674 | 0.7772 | 0.7685 | 0.7747 | 0.7682 | 0.7741 | 0.6584 | 0.6570 |
| 0.2455 | 0.9372 | 1000 | 0.0924 | 0.7677 | 0.7781 | 0.7714 | 0.7778 | 0.7712 | 0.7776 | 0.6569 | 0.6558 |
| 0.1933 | 1.1715 | 1250 | 0.0802 | 0.7704 | 0.7790 | 0.7678 | 0.7742 | 0.7676 | 0.7740 | 0.6808 | 0.6797 |
| 0.1872 | 1.4058 | 1500 | 0.0790 | 0.7685 | 0.7777 | 0.7693 | 0.7755 | 0.7690 | 0.7752 | 0.6580 | 0.6569 |
| 0.1628 | 1.6401 | 1750 | 0.0719 | 0.7652 | 0.7734 | 0.7619 | 0.7685 | 0.7616 | 0.7679 | 0.6584 | 0.6574 |
| 0.1983 | 1.8744 | 2000 | 0.0737 | 0.7772 | 0.7864 | 0.7654 | 0.7748 | 0.7649 | 0.7741 | 0.6604 | 0.6608 |
| 0.1448 | 2.1087 | 2250 | 0.0637 | 0.7666 | 0.7737 | 0.7644 | 0.7706 | 0.7639 | 0.7702 | 0.6530 | 0.6506 |
| 0.1449 | 2.3430 | 2500 | 0.0579 | 0.7641 | 0.7698 | 0.7590 | 0.7654 | 0.7584 | 0.7652 | 0.6659 | 0.6637 |
| 0.1443 | 2.5773 | 2750 | 0.0596 | 0.7583 | 0.7659 | 0.7599 | 0.7656 | 0.7594 | 0.7652 | 0.6585 | 0.6551 |
| 0.1363 | 2.8116 | 3000 | 0.0575 | 0.7671 | 0.7727 | 0.7570 | 0.7629 | 0.7564 | 0.7624 | 0.6769 | 0.6756 |
| 0.1227 | 3.0459 | 3250 | 0.0517 | 0.7637 | 0.7670 | 0.7567 | 0.7616 | 0.7560 | 0.7612 | 0.6736 | 0.6714 |
| 0.103 | 3.2802 | 3500 | 0.0464 | 0.7603 | 0.7643 | 0.7484 | 0.7535 | 0.7475 | 0.7527 | 0.6813 | 0.6796 |
| 0.0982 | 3.5145 | 3750 | 0.0451 | 0.7657 | 0.7695 | 0.7452 | 0.7527 | 0.7441 | 0.7516 | 0.6821 | 0.6822 |
| 0.0987 | 3.7488 | 4000 | 0.0467 | 0.7577 | 0.7607 | 0.7397 | 0.7446 | 0.7385 | 0.7434 | 0.6644 | 0.6623 |
| 0.1111 | 3.9831 | 4250 | 0.0406 | 0.7691 | 0.7703 | 0.7471 | 0.7525 | 0.7457 | 0.7510 | 0.6998 | 0.7006 |
| 0.0888 | 4.2174 | 4500 | 0.0421 | 0.7580 | 0.7598 | 0.7412 | 0.7468 | 0.7401 | 0.7457 | 0.6874 | 0.6866 |
| 0.0756 | 4.4517 | 4750 | 0.0395 | 0.7664 | 0.7674 | 0.7432 | 0.7480 | 0.7419 | 0.7465 | 0.7008 | 0.7012 |
| 0.0871 | 4.6860 | 5000 | 0.0411 | 0.7588 | 0.7604 | 0.7405 | 0.7456 | 0.7389 | 0.7441 | 0.6872 | 0.6867 |
| 0.0839 | 4.9203 | 5250 | 0.0400 | 0.7643 | 0.7659 | 0.7311 | 0.7367 | 0.7297 | 0.7351 | 0.6955 | 0.6969 |
| 0.0499 | 5.1546 | 5500 | 0.0392 | 0.7609 | 0.7616 | 0.7335 | 0.7393 | 0.7321 | 0.7379 | 0.6993 | 0.6999 |
| 0.0542 | 5.3889 | 5750 | 0.0385 | 0.7664 | 0.7669 | 0.7399 | 0.7454 | 0.7386 | 0.7445 | 0.7061 | 0.7065 |
| 0.0555 | 5.6232 | 6000 | 0.0396 | 0.7571 | 0.7579 | 0.7293 | 0.7344 | 0.7279 | 0.7331 | 0.7004 | 0.6993 |
| 0.0547 | 5.8575 | 6250 | 0.0384 | 0.7664 | 0.7667 | 0.7382 | 0.7432 | 0.7370 | 0.7420 | 0.7110 | 0.7119 |
| 0.0476 | 6.0918 | 6500 | 0.0388 | 0.7638 | 0.7642 | 0.7338 | 0.7392 | 0.7323 | 0.7378 | 0.7008 | 0.7013 |
| 0.043 | 6.3261 | 6750 | 0.0376 | 0.7692 | 0.7696 | 0.7357 | 0.7409 | 0.7343 | 0.7396 | 0.7138 | 0.7152 |
| 0.0436 | 6.5604 | 7000 | 0.0381 | 0.7662 | 0.7662 | 0.7351 | 0.7398 | 0.7334 | 0.7384 | 0.7105 | 0.7116 |
| 0.032 | 6.7948 | 7250 | 0.0377 | 0.7692 | 0.7695 | 0.7333 | 0.7375 | 0.7316 | 0.7357 | 0.7224 | 0.7242 |
| 0.0342 | 7.0291 | 7500 | 0.0378 | 0.7685 | 0.7678 | 0.7333 | 0.7376 | 0.7320 | 0.7365 | 0.7184 | 0.7187 |
| 0.0341 | 7.2634 | 7750 | 0.0377 | 0.7699 | 0.7695 | 0.7336 | 0.7378 | 0.7317 | 0.7362 | 0.7237 | 0.7244 |
| 0.0329 | 7.4977 | 8000 | 0.0375 | 0.7706 | 0.7697 | 0.7364 | 0.7409 | 0.7346 | 0.7395 | 0.7248 | 0.7250 |
| 0.035 | 7.7320 | 8250 | 0.0380 | 0.7700 | 0.7691 | 0.7308 | 0.7352 | 0.7288 | 0.7335 | 0.7271 | 0.7276 |
| 0.0361 | 7.9663 | 8500 | 0.0377 | 0.7717 | 0.7709 | 0.7276 | 0.7318 | 0.7254 | 0.7297 | 0.7309 | 0.7317 |
| 0.0224 | 8.2006 | 8750 | 0.0377 | 0.7711 | 0.7703 | 0.7328 | 0.7369 | 0.7310 | 0.7356 | 0.7244 | 0.7254 |
| 0.0256 | 8.4349 | 9000 | 0.0386 | 0.7652 | 0.7647 | 0.7274 | 0.7319 | 0.7254 | 0.7303 | 0.7186 | 0.7191 |
| 0.0283 | 8.6692 | 9250 | 0.0370 | 0.7740 | 0.7732 | 0.7294 | 0.7331 | 0.7272 | 0.7312 | 0.7285 | 0.7298 |
| 0.0274 | 8.9035 | 9500 | 0.0372 | 0.7742 | 0.7739 | 0.7288 | 0.7346 | 0.7266 | 0.7328 | 0.7298 | 0.7317 |
| 0.025 | 9.1378 | 9750 | 0.0377 | 0.7719 | 0.7718 | 0.7334 | 0.7389 | 0.7313 | 0.7372 | 0.7295 | 0.7309 |
| 0.031 | 9.3721 | 10000 | 0.0372 | 0.7734 | 0.7735 | 0.7373 | 0.7421 | 0.7357 | 0.7407 | 0.7253 | 0.7266 |
| 0.0243 | 9.6064 | 10250 | 0.0374 | 0.7731 | 0.7727 | 0.7321 | 0.7364 | 0.7300 | 0.7346 | 0.7303 | 0.7306 |
| 0.0233 | 9.8407 | 10500 | 0.0370 | 0.7760 | 0.7753 | 0.7337 | 0.7389 | 0.7316 | 0.7371 | 0.7343 | 0.7356 |

### Framework versions

- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
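
A minimal sketch (not part of the original card) for checking that a local environment matches the versions listed above:

```python
# Compare installed library versions against the framework versions reported
# in this model card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.48.3",
    "torch": "2.5.1+cu124",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}

installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}

for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"differs (card used {want})"
    print(f"{name}: {have} -> {status}")
```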