breakcore2 committed on
Commit 56ddf82 · 1 Parent(s): 6fe7f1a

Update d-adaptation/notes.md

Files changed (1)
  1. d-adaptation/notes.md +1 -0
d-adaptation/notes.md CHANGED
@@ -13,6 +13,7 @@ UMP redone at dim 8 alpha 8 showed recognizable character but still significantl
After redoing UMP at dim 8 alpha 8 with fewer cosine restarts (16 -> 9), the results are much better.
Cosine restarts would likely affect how much time we spend at a high learning rate, which could be the reason for blowing the model apart.
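A minimal sketch of why the restart count matters, assuming the hard-restart cosine shape that trainers usually expose as `cosine_with_restarts` (transformers' `get_cosine_with_hard_restarts_schedule_with_warmup`); the step count here is hypothetical. Every restart snaps the LR back to the full base LR, so 16 cycles hit the weights with the peak LR almost twice as often as 9:

```python
import math

def cosine_with_hard_restarts(step, total_steps, cycles):
    """LR multiplier in [0, 1] at `step`; no warmup, no per-cycle peak decay."""
    progress = step / total_steps
    cycle_progress = (progress * cycles) % 1.0  # position within the current cycle
    return 0.5 * (1.0 + math.cos(math.pi * cycle_progress))

total_steps = 1800  # hypothetical run length
for cycles in (9, 16):
    trace = [cosine_with_hard_restarts(s, total_steps, cycles) for s in range(total_steps)]
    # count cycle starts, i.e. steps where the LR has snapped back to ~the base LR
    peaks = sum(
        1 for s in range(total_steps)
        if trace[s] > 0.99 and (s == 0 or trace[s] > trace[s - 1])
    )
    print(f"{cycles} cycles: weights see ~peak LR {peaks} times")
```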
dim 8 alpha 1 retrained at lower cosine restarts succeeded as well. Supposedly alpha scales the gradient down, which causes the LR to go up, but the relationship is obviously not linear, since 1/8x alpha did not turn the results to garbage here. So results are far more sensitive to the base LR than to the alpha choice.
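For reference, the standard alpha/rank scaling, sketched with illustrative shapes and init (this is the common LoRA formulation, not necessarily the exact module the trainer builds): the adapter output is multiplied by alpha/dim, so at dim 8 the gradients into the LoRA factors shrink 8x when alpha drops from 8 to 1.

```python
import torch

def lora_forward(x, W, A, B, alpha):
    rank = A.shape[0]     # A: (rank, in_features), B: (out_features, rank)
    scale = alpha / rank  # the alpha/rank scaling
    return x @ W.T + scale * (x @ A.T @ B.T)

x = torch.randn(4, 64)
W = torch.randn(128, 64)        # frozen base weight
A = torch.randn(8, 64) * 0.01   # rank-8 down projection
B = torch.full((128, 8), 0.01)  # up projection (nonzero so A gets a gradient)
for alpha in (8, 1):
    A_ = A.clone().requires_grad_(True)
    B_ = B.clone().requires_grad_(True)
    lora_forward(x, W, A_, B_, alpha).sum().backward()
    print(alpha, A_.grad.norm().item())  # grad norm shrinks 8x from alpha 8 to 1
```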
+ The LR sensitivity was further confirmed by running dim 8 alpha 1 with a constant learning rate scheduler: the results were similar to the high-restart-count cosine runs.
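The two schedules being compared, sketched with diffusers' `get_scheduler` helper (optimizer and step counts are placeholders). A constant schedule is effectively the degenerate limit of the restart story: the model sits at the peak LR for the entire run.

```python
import torch
from diffusers.optimization import get_scheduler

def make(name, **kwargs):
    opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1e-4)
    return opt, get_scheduler(name, optimizer=opt, **kwargs)

# constant: every step runs at the full base LR
opt_c, sched_c = make("constant")
# cosine with restarts: LR cycles from the base LR down to 0, 16 times
opt_r, sched_r = make("cosine_with_restarts",
                      num_warmup_steps=0, num_training_steps=1800,
                      num_cycles=16)
```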

## Dim
128 dim shows some local noisy patterns. Re-ranking the model to a lower dim from 128 doesn't get rid of them. Converting the weights of the last up block in the unet does, but it also causes a noticeable change in the generated character. Obviously you could reduce the last up block by a smaller amount.
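For context, a minimal sketch of what re-ranking a single module to a lower dim typically looks like: SVD truncation of the module's full update (the usual LoRA resize scripts do a per-module version of this; alpha bookkeeping is omitted and shapes are illustrative).

```python
import torch

def resize_lora_module(A, B, new_rank):
    """A: (rank, in_features), B: (out_features, rank) -> truncated to `new_rank`."""
    delta = B @ A  # the full weight update this module encodes
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    U, S, Vh = U[:, :new_rank], S[:new_rank], Vh[:new_rank, :]
    # split the kept singular values evenly between the two factors
    B_new = U * S.sqrt()
    A_new = S.sqrt().unsqueeze(1) * Vh
    return A_new, B_new

A = torch.randn(128, 320)  # a dim-128 module, e.g. one unet projection
B = torch.randn(320, 128)
A8, B8 = resize_lora_module(A, B, 8)
print((B8 @ A8 - B @ A).norm() / (B @ A).norm())  # relative reconstruction error
```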