NibiruTwin
committed on
Update README.md
README.md CHANGED
@@ -69,19 +69,59 @@ https://huggingface.co/NibiruTwin/llm-jp-3-13b-c_it
 we used the datasets above.
 
 Even in the A100 environment, running DPO kept running out of GPU memory and capped the number of epochs we could run, so
 
-```
-
-
-```
-
-we narrowed it down to at most 100 examples,
-
-```
-
-```
-
-and trained with that.
+```
+num_train_epochs = 3,
+```
+
+and trained with that setting.
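For reference, a minimal sketch of a DPO run consistent with the setting above and the log below, using Unsloth together with TRL's `DPOTrainer`. The epoch count (3), per-device batch size (2), gradient accumulation (4), and the 100-example subset come from the README and the log; the base model name, dataset file and columns, LoRA rank, learning rate, and `beta` are illustrative assumptions rather than the author's actual configuration.

```python
# Hedged sketch of a DPO run matching the log below; hyperparameters not
# shown in the README (learning rate, beta, LoRA rank, ...) are guesses.
from unsloth import FastLanguageModel, PatchDPOTrainer

PatchDPOTrainer()  # patch TRL's DPOTrainer with Unsloth's memory optimizations

from datasets import load_dataset
from transformers import TrainingArguments
from trl import DPOTrainer

# Load the base model in 4-bit so DPO fits on a single A100
# (assumption: llm-jp/llm-jp-3-13b is the starting point).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="llm-jp/llm-jp-3-13b",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Train LoRA adapters only; the rank here is a guess -- the log's
# 125,173,760 trainable parameters suggests the actual rank was larger.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical preference data with prompt/chosen/rejected columns;
# narrow it down to 100 examples, matching "Num examples = 100" in the log.
dataset = load_dataset("json", data_files="dpo_pairs.jsonl", split="train")
dataset = dataset.select(range(100))

trainer = DPOTrainer(
    model=model,
    ref_model=None,            # with PEFT, the frozen base acts as the reference
    tokenizer=tokenizer,       # older TRL API; newer TRL moves beta into DPOConfig
    train_dataset=dataset,
    beta=0.1,                  # assumed DPO temperature
    args=TrainingArguments(
        per_device_train_batch_size=2,   # from the log
        gradient_accumulation_steps=4,   # from the log (total batch size 8)
        num_train_epochs=3,              # the setting quoted above
        learning_rate=5e-6,              # assumed
        logging_steps=1,
        optim="adamw_8bit",
        output_dir="outputs",
        report_to="none",
    ),
)
trainer.train()
```

Patching the trainer and loading the base model in 4-bit with LoRA adapters is the usual way to squeeze a 13B DPO run into a single A100, which matches the memory pressure described above.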
+
+```
+==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
+   \\   /|    Num examples = 100 | Num Epochs = 3
+O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
+\        /    Total batch size = 8 | Total steps = 36
+ "-____-"     Number of trainable parameters = 125,173,760
+
+[36/36 01:54, Epoch 2/3]
+
+Step Training Loss rewards/chosen rewards/rejected rewards/accuracies rewards/margins logps/rejected logps/chosen logits/rejected logits/chosen
+1 0.000100 3.348554 -6.467402 1.000000 9.815956 -169.229355 -175.684265 0.611595 0.875623
+2 0.000100 2.975397 -6.360792 1.000000 9.336189 -154.576660 -196.990601 0.632885 0.986017
+3 0.000100 4.033119 -4.941322 1.000000 8.974442 -127.297821 -175.932999 0.575199 1.004188
+4 0.000200 3.079573 -5.701199 1.000000 8.780772 -139.620758 -173.067078 -0.431688 0.508375
+5 0.000200 3.642621 -5.261364 1.000000 8.903986 -121.130615 -171.747650 0.855356 0.792505
+6 0.000300 3.081389 -5.276991 1.000000 8.358380 -131.040268 -180.695892 1.087221 1.099403
+7 0.000400 4.341475 -4.463219 1.000000 8.804693 -115.383461 -138.774704 -0.299891 0.583815
+8 0.001200 2.155223 -5.431589 1.000000 7.586812 -133.833633 -142.437195 0.526511 0.799039
+9 0.000500 2.844069 -4.996197 1.000000 7.840266 -150.136200 -176.394653 0.631835 0.720139
+10 0.001800 3.158137 -3.853688 1.000000 7.011826 -108.524597 -141.101532 0.738414 0.989277
+11 0.001300 3.399171 -3.538917 1.000000 6.938087 -78.884750 -146.172821 0.182737 0.624548
+12 0.003200 2.742315 -4.005011 1.000000 6.747325 -95.816872 -137.083588 -0.016122 0.276348
+13 0.000300 2.219271 -6.403042 1.000000 8.622313 -166.142792 -178.155762 0.178710 0.761686
+14 0.000300 2.699187 -6.124379 1.000000 8.823566 -131.572098 -140.794952 0.002769 0.447839
+15 0.000500 4.734462 -4.763044 1.000000 9.497507 -111.550545 -181.736755 1.172358 1.424275
+16 0.000100 3.982580 -5.619477 1.000000 9.602057 -129.902466 -197.776779 0.071804 0.588056
+17 0.000100 4.331498 -6.222175 1.000000 10.553673 -165.588058 -164.591766 1.094546 0.889692
+18 0.000100 4.991781 -4.481319 1.000000 9.473101 -91.877243 -150.005219 -0.047461 0.751593
+19 0.000200 3.501364 -6.373612 1.000000 9.874977 -140.134232 -170.552658 0.189669 0.601683
+20 0.000200 3.605657 -5.074142 1.000000 8.679799 -117.568741 -153.246170 -0.309885 0.501098
+21 0.000200 3.203712 -6.348371 1.000000 9.552082 -156.897690 -186.776581 1.007442 1.271394
+22 0.000200 3.929119 -5.364758 1.000000 9.293877 -112.621918 -93.523651 -0.448841 0.339729
+23 0.000200 4.845633 -4.518156 1.000000 9.363789 -96.659676 -147.822693 -0.233996 0.634066
+24 0.000200 3.045211 -6.681721 1.000000 9.726932 -144.385818 -143.605927 0.440999 0.664074
+25 0.000200 2.850045 -6.654698 1.000000 9.504744 -190.274475 -173.323746 1.471910 0.817418
+26 0.000100 3.326446 -6.145639 1.000000 9.472086 -139.847061 -194.209137 1.316685 1.462867
+27 0.000100 3.676937 -6.083375 1.000000 9.760312 -129.533386 -160.582367 -0.027238 0.892122
+28 0.000100 4.144113 -5.807096 1.000000 9.951208 -145.089432 -207.662384 1.121619 1.289729
+29 0.000100 3.373916 -6.345547 1.000000 9.719462 -183.863174 -159.793610 0.931905 0.620221
+30 0.000200 3.944859 -5.516920 1.000000 9.461779 -128.763199 -124.433006 0.235643 0.282284
+31 0.000200 3.264518 -6.059732 1.000000 9.324250 -132.112762 -134.414032 0.035966 0.557026
+32 0.000100 3.494095 -6.447097 1.000000 9.941193 -130.592957 -129.188766 -0.017757 0.388424
+33 0.000200 3.858253 -5.999524 1.000000 9.857778 -144.017731 -133.201035 0.412320 0.247053
+34 0.000200 4.195073 -5.941508 1.000000 10.136581 -144.858795 -162.841766 -0.020949 0.564567
+35 0.000100 5.392914 -4.364581 1.000000 9.757494 -97.780762 -143.621002 0.270843 0.839165
+36 0.000100 2.788383 -7.393952 1.000000 10.182335 -154.236618 -184.300690 0.392709 0.757870
+
+TrainOutput(global_step=36, training_loss=0.00038064550871139445, metrics={'train_runtime': 118.2651, 'train_samples_per_second': 2.537, 'train_steps_per_second': 0.304, 'total_flos': 0.0, 'train_loss': 0.00038064550871139445, 'epoch': 2.88})
+```
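As a sanity check on those numbers: 100 examples at a per-device batch size of 2 give 50 micro-batches per epoch; with gradient accumulation of 4 that is 50 // 4 = 12 optimizer steps per epoch, so 3 epochs yield the 36 total steps shown. The final epoch reads 2.88 rather than 3 because 36 steps consume 36 × 8 = 288 examples, i.e. 288 / 100 = 2.88 passes over the data.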
 
 As a result, on the ogiri (improv comedy) prompts the answers came out like the following.
 