FPHam committed
Commit 034eb5c
1 Parent(s): 39bc73e

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -33,8 +33,8 @@ Edit: It could prevent overfitting though and hence help with generalization. It
 
 - my favorite scheduler is warmup, hold for 1 epoch, then cosine down for the next 1-x epochs.
 
-- rank is literally how many trainable parameters you get - you don't have to try to find some other meaning (style vs knowledge). It's like an image taken with 1Mpixel vs 16Mpixel. You always get the whole image, but on 1Mpixel the details are very mushy.
-the problem of course is - do you have enough diverse training data to fill those parameters with? If not, you'd be creating very specific model that would have hard time to generalize. Lowring rank will help with generalizations, but also the mundane details will be lost.
+- rank is literally how many trainable parameters you get - you don't have to try to find some other meaning (style vs knowledge). It's like an image taken with 1Mpixel vs 16Mpixel. You always get the whole image, but on 1Mpixel the details are very mushy - while you can still see the big subject, you'd better not expect the details to be fine.
+The problem, of course, is: do you have enough diverse training data to fill those parameters? If not, you'd be creating a very specific model that would have a hard time generalizing. Lowering the rank will help with generalization, but the mundane details will also be lost.
 
 **Anything else?**
 
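
The scheduler mentioned in the context line of the diff (warmup, hold for one epoch, then cosine decay) is easy to sketch as a step-based learning-rate multiplier. A minimal sketch follows; the function name `warmup_hold_cosine` and the step-count parameters are illustrative assumptions, not anything defined in the commit:

```python
import math

def warmup_hold_cosine(step, warmup_steps, hold_steps, total_steps):
    """LR multiplier in [0, 1]: linear warmup, flat hold, cosine decay."""
    if step < warmup_steps:                # linear warmup to the peak LR
        return step / max(1, warmup_steps)
    if step < warmup_steps + hold_steps:   # hold at the peak (~1 epoch)
        return 1.0
    # cosine decay over whatever steps remain (the "next 1-x epochs")
    span = max(1, total_steps - warmup_steps - hold_steps)
    progress = (step - warmup_steps - hold_steps) / span
    return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
```

With PyTorch, a multiplier like this can be wired into `torch.optim.lr_scheduler.LambdaLR`, which scales the optimizer's base learning rate by the returned factor at each step.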
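
The rank bullet can be made concrete with arithmetic: a LoRA adapter on a weight matrix of shape d_out × d_in adds two trainable matrices, B (d_out × r) and A (r × d_in), so the trainable-parameter count grows linearly with the rank r. The 4096 hidden width, the two adapted projections per layer, and the 32-layer count below are illustrative assumptions, not values from the commit:

```python
def lora_params(d_in, d_out, rank):
    # B is (d_out x rank), A is (rank x d_in): rank * (d_in + d_out) params
    return rank * (d_in + d_out)

hidden = 4096                                   # e.g. a 7B-class model width
per_layer = 2 * lora_params(hidden, hidden, 8)  # q_proj + v_proj at rank 8
print(per_layer)       # 131072 trainable params per layer at r=8
print(per_layer * 32)  # ~4.2M across 32 layers; doubling r doubles this
```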