Lambent committed
Commit 2abd74f
1 Parent(s): 3ca31ed

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -28,6 +28,7 @@ Method by method comparison, initial evaluation loss on Cosmopedia data:
 
 * Full tuning (aka continued pretraining), batch 8: 1.615
 * LISA fine-tuning, 4 layers switching every 10 steps, batch 8: 1.217
+ * QLoRA with Dora (otherwise like below): 1.105
 * Qlora fine-tuning, rank 256, scale factor 1, batch 8: 1.102
 * Galore tuning, rank 256, scale factor 1, batch 8: 1.182
 * This Model Stock merge of all 4 training methods: 1.038
@@ -42,6 +43,7 @@ Training set validation results:
 * LISA Loss: 0.2534
 * GaLore Loss: 0.2426
 * QLoRA Loss: 0.2078
+ * QLoRA with Dora Loss: 0.2055 (almost identical training graph)
 * Full Tune Loss: 0.2049
 
 Overall ... not sure what to make of this, beyond that high-rank QLoRA is doing something particularly impressive while using only like 6GB of vRAM.
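
As a point of reference for the configurations named above, here is a minimal sketch of what the "QLoRA with Dora, rank 256, scale factor 1" setup could look like with transformers + peft + bitsandbytes. The base model id, target modules, and quantization details are placeholders and assumptions for illustration, not taken from this repo's training scripts.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit (NF4) quantized base weights -- the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "base-model-id",  # placeholder: the base model is not named in this diff
    quantization_config=bnb_config,
    device_map="auto",
)

# Rank 256 with lora_alpha == r, i.e. alpha/r = 1 ("scale factor 1" above),
# and use_dora=True for the DoRA variant. target_modules is an assumption.
lora_config = LoraConfig(
    r=256,
    lora_alpha=256,
    use_dora=True,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The "Model Stock merge of all 4 training methods" line refers to a weight-space merge of the four fine-tuned checkpoints against the base model; mergekit's `model_stock` merge method is one way to produce such a merge, though the exact recipe used here is not shown in this diff.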