Update README.md
README.md CHANGED
@@ -28,6 +28,7 @@ Method by method comparison, initial evaluation loss on Cosmopedia data:

* Full tuning (aka continued pretraining), batch 8: 1.615
* LISA fine-tuning, 4 layers switching every 10 steps, batch 8: 1.217
+ * QLoRA with DoRA (otherwise like below): 1.105
* QLoRA fine-tuning, rank 256, scale factor 1, batch 8: 1.102
* GaLore tuning, rank 256, scale factor 1, batch 8: 1.182
* This Model Stock merge of all 4 training methods: 1.038
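For reference, here is a minimal sketch of what the two QLoRA rows above could look like in code, using Hugging Face PEFT and bitsandbytes. The base model name is a placeholder (the README never names it), and "scale factor 1" is assumed to mean `lora_alpha` equal to the rank, since the LoRA scale is `alpha / r`.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Placeholder base model; the README does not say which one was used.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,                  # the "Q" in QLoRA: 4-bit base weights
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

peft_config = LoraConfig(
    r=256,                   # high rank, matching the runs above
    lora_alpha=256,          # alpha / r = 1, i.e. "scale factor 1" (assumed meaning)
    target_modules="all-linear",
    use_dora=True,           # set to False for the plain QLoRA run
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```

DoRA decomposes each adapted weight into a magnitude and a direction and applies the low-rank update to the direction, which is why the two runs can be configured identically apart from the `use_dora` flag.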
@@ -42,6 +43,7 @@ Training set validation results:

* LISA Loss: 0.2534
* GaLore Loss: 0.2426
* QLoRA Loss: 0.2078
+ * QLoRA with DoRA Loss: 0.2055 (almost identical training graph)
* Full Tune Loss: 0.2049

Overall, I'm not sure what to make of this, beyond that high-rank QLoRA is doing something particularly impressive while using only about 6 GB of VRAM.
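The LISA run above ("4 layers switching every 10 steps") can be sketched as a simple freeze/unfreeze schedule: all transformer layers are frozen except a randomly sampled handful, and the sample is redrawn every few optimizer steps while the embeddings and LM head stay trainable. The `model.model.layers` attribute path below is a Llama-style assumption, not something the README specifies.

```python
import random

def lisa_train(model, dataloader, optimizer, n_active=4, interval=10):
    layers = model.model.layers  # assumed Llama-style ModuleList of decoder layers
    for step, batch in enumerate(dataloader):
        if step % interval == 0:
            # Resample which 4 layers are trainable; everything else is frozen.
            active = set(random.sample(range(len(layers)), n_active))
            for i, layer in enumerate(layers):
                for p in layer.parameters():
                    p.requires_grad = i in active
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```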
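The GaLore run (rank 256, scale factor 1) is an optimizer-side method rather than an adapter: gradients of the 2-D weight matrices are projected into a low-rank subspace before the AdamW update. A sketch using the galore-torch package follows; `update_proj_gap`, the learning rate, and the "gpt2" placeholder model are guesses, not values from the README.

```python
from galore_torch import GaLoreAdamW
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

# GaLore only applies to 2-D weight matrices; other params get plain AdamW.
matrix_params = [p for _, p in model.named_parameters() if p.dim() == 2]
other_params = [p for _, p in model.named_parameters() if p.dim() != 2]

optimizer = GaLoreAdamW(
    [
        {"params": matrix_params, "rank": 256, "scale": 1.0,
         "update_proj_gap": 200, "proj_type": "std"},
        {"params": other_params},
    ],
    lr=1e-5,
)
```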
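Finally, the Model Stock merge of the four runs. As I read the Model Stock paper, the per-tensor rule interpolates the average of the fine-tuned weights toward the base model, with a ratio derived from how similar the fine-tune deltas are to each other: t = k·cos / ((k-1)·cos + 1) for k models, where cos is the mean pairwise cosine similarity of the deltas. A hedged sketch of that rule is below; in practice a tool such as mergekit's `model_stock` merge method handles this per layer.

```python
import torch
import torch.nn.functional as F

def model_stock_layer(base: torch.Tensor, tuned: list) -> torch.Tensor:
    """Merge one tensor from k fine-tuned checkpoints with its base tensor."""
    deltas = [w - base for w in tuned]
    k = len(deltas)
    # Mean pairwise cosine similarity between the fine-tune deltas.
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    mean_cos = torch.stack([
        F.cosine_similarity(deltas[i].flatten(), deltas[j].flatten(), dim=0)
        for i, j in pairs
    ]).mean()
    # Interpolation ratio from the Model Stock paper (my reading of it).
    t = k * mean_cos / ((k - 1) * mean_cos + 1)
    avg = torch.stack(tuned).mean(dim=0)
    return t * avg + (1 - t) * base
```

The intuition is that the more the four fine-tunes agree on a direction, the further the merge moves from the base toward their average, which fits the observation above that the merge (1.038) beats every individual run.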