FPHam committed
Commit 9d83d48
1 Parent(s): 034eb5c

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -29,7 +29,7 @@ Edit: It could prevent overfitting though and hence help with generalization. It

 - size of dataset matters when you are finetuning on a base model, but matters less when finetuning on a well-finetuned model - in fact, sometimes less is better in that case, or you may be ruining a good previous finetuning.

-- alpha = 2x rank seems like something that came from the old times when people had potato VRAM at most. I really don't feel like it makes much sense - it multiplies the weights and that's it. (check the PEFT code) Making things louder also makes the noise louder.
+- alpha = 2x rank seems like something that came from the old times when people had potato VRAM at most and wanted to get there fast. I really don't feel like it makes much sense - it multiplies the weights and that's it. Making things louder also makes the noise louder.

 - my favorite scheduler is warmup, hold for 1 epoch, then cosine down for the next 1-x epochs.
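
On the alpha bullet above: in PEFT, the LoRA update is multiplied by a constant `lora_alpha / r` before it is added to the frozen base weights - that constant scaling is the whole effect of alpha. A minimal sketch of that logic (a simplified stand-in for PEFT's `LoraLayer`, not the actual class):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a LoRA-wrapped linear layer; mirrors PEFT's alpha scaling."""

    def __init__(self, base: nn.Linear, r: int = 8, lora_alpha: int = 16):
        super().__init__()
        self.base = base                                         # frozen weights
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # update starts as a no-op
        self.scaling = lora_alpha / r       # alpha = 2x rank -> scaling = 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # alpha only rescales B(A(x)) uniformly - signal and noise alike.
        return self.base(x) + self.lora_B(self.lora_A(x)) * self.scaling
```

So doubling alpha at a fixed rank just doubles the whole learned update; it adds no capacity, which is the "making things louder makes the noise louder" point.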
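
The scheduler in the last bullet - linear warmup, hold at the peak LR for one full epoch, then cosine decay over the remaining epochs - is easy to express with PyTorch's `LambdaLR`. A sketch, with illustrative step counts (the specific numbers are assumptions, not values from the README):

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

def warmup_hold_cosine(optimizer, warmup_steps: int, steps_per_epoch: int,
                       total_steps: int) -> LambdaLR:
    """Linear warmup -> hold peak LR for one epoch -> cosine decay to zero."""
    hold_end = warmup_steps + steps_per_epoch  # peak LR held for one epoch

    def lr_lambda(step: int) -> float:
        if step < warmup_steps:
            return step / max(1, warmup_steps)  # ramp 0 -> peak
        if step < hold_end:
            return 1.0                          # hold at peak
        # cosine down across whatever epochs remain
        progress = min(1.0, (step - hold_end) / max(1, total_steps - hold_end))
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return LambdaLR(optimizer, lr_lambda)

# Illustrative usage: 3 epochs of 1000 steps each, 100 warmup steps.
opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=2e-4)
sched = warmup_hold_cosine(opt, warmup_steps=100, steps_per_epoch=1000,
                           total_steps=3000)
```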