Commit 56ddf82
Parent(s): 6fe7f1a
Update d-adaptation/notes.md

d-adaptation/notes.md: changed, +1 -0
@@ -13,6 +13,7 @@ UMP redone at dim 8 alpha 8 showed recognizable character but still significantl
After redoing UMP at dim 8 alpha 8 with fewer cosine restarts (16->9), the results are much better.
Cosine restarts would likely affect how much time we spend at a high learning rate, which could be what was blowing the model apart.
dim 8 alpha 1 retrained with fewer cosine restarts succeeded as well. Supposedly alpha scales the gradient down, which causes the LR to go up, but the relationship is clearly not linear if 1/8x alpha did not turn the results into garbage here. So the base LR is far more sensitive than the alpha choice.
+ This was further confirmed by running dim 8 alpha 1 with a constant learning rate scheduler. The results were similar to those with the high restart count on cosine.

## Dim
128 dim shows some local noisy patterns. Reranking the model from 128 to a lower dim doesn't get rid of them. Converting the weights of the last up block in the UNet does, but it also causes a noticeable change in the generated character. Obviously you could reduce the last up block by a smaller amount.
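The restart-count hypothesis can be inspected numerically. This is a minimal sketch, assuming the scheduler is a hard-restart cosine schedule of the kind implemented by Hugging Face transformers' `get_cosine_with_hard_restarts_schedule_with_warmup` (re-implemented here with no warmup so it has no dependencies). The 1600-step run length is a hypothetical example, not the length of the runs in these notes.

```python
import math


def cosine_hard_restarts(step: int, total_steps: int, num_cycles: int) -> float:
    """LR multiplier in [0, 1] for cosine annealing with hard restarts (no warmup).

    Same shape as transformers' get_cosine_with_hard_restarts_schedule_with_warmup
    with num_warmup_steps=0, re-implemented so the sketch is self-contained.
    """
    progress = step / max(1, total_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle restarts at the peak (1.0) and decays towards 0.
    return 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))


def schedule(total_steps: int, num_cycles: int) -> list[float]:
    """Per-step multipliers for a whole run."""
    return [cosine_hard_restarts(s, total_steps, num_cycles) for s in range(total_steps)]


def restarts(multipliers: list[float]) -> int:
    """Count the steps where the multiplier jumps back up, i.e. the hard restarts."""
    return sum(1 for prev, cur in zip(multipliers, multipliers[1:]) if cur > prev)


def steps_per_cycle(total_steps: int, num_cycles: int) -> float:
    """Length of each decay stretch, i.e. how quickly the LR falls from its peak."""
    return total_steps / num_cycles


if __name__ == "__main__":
    TOTAL = 1600  # hypothetical run length
    for cycles in (16, 9):
        lrs = schedule(TOTAL, cycles)
        print(cycles, "cycles:", restarts(lrs), "restarts,",
              steps_per_cycle(TOTAL, cycles), "steps per cycle")
```

With the hypothetical 1600 steps, going from 9 to 16 cycles shortens each decay stretch from about 178 to 100 steps and raises the number of jumps back to the peak from 8 to 15; plugging in the real step count shows what the 16->9 change did to the actual runs.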
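The remark that alpha "scales the gradient down" can be checked on a toy layer. A minimal sketch, assuming the standard LoRA parameterisation in which the low-rank update B·A is multiplied by alpha / rank (dim); the scalar layer and all numbers are hypothetical. With B initialised to zero, as LoRA usually does, the layer output does not depend on alpha, so the gradient on B scales exactly with alpha / rank: alpha 1 at dim 8 gives 1/8 of the gradient that alpha 8 gives.

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier applied to the LoRA update B @ A (standard LoRA: alpha / rank)."""
    return alpha / rank


def grad_wrt_b(w: float, a: float, b: float, x: float, target: float,
               alpha: float, rank: int) -> float:
    """d/db of 0.5 * (y - target)**2 for the toy layer y = w*x + (alpha/rank) * b*a*x."""
    s = lora_scale(alpha, rank)
    y = w * x + s * b * a * x
    return (y - target) * s * a * x


if __name__ == "__main__":
    # b = 0 mirrors the usual LoRA init, so y is identical for both alphas.
    g_alpha8 = grad_wrt_b(w=0.5, a=0.3, b=0.0, x=2.0, target=0.0, alpha=8, rank=8)
    g_alpha1 = grad_wrt_b(w=0.5, a=0.3, b=0.0, x=2.0, target=0.0, alpha=1, rank=8)
    print(g_alpha1 / g_alpha8)
```

The toy confirms only the 1/8 factor on the gradient; how the adaptive optimizer turns that smaller gradient into a learning rate is not modeled here.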
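Reducing the last up block "by a smaller amount" amounts to multiplying that block's LoRA delta by a factor between 0 and 1. A minimal sketch, assuming a kohya-style state dict where each module stores a `lora_up` and a `lora_down` weight and the applied delta is `lora_up @ lora_down`. The key prefix is a hypothetical example (for an SD 1.x UNet the last up block is `up_blocks.3` in diffusers naming), and nested lists stand in for real tensors. Only `lora_up` is scaled, because scaling both factors would shrink the delta by factor squared.

```python
Matrix = list[list[float]]


def scale_block(state_dict: dict[str, Matrix], prefix: str, factor: float) -> dict[str, Matrix]:
    """Return a copy of a LoRA state dict whose modules under `prefix` have their
    delta (lora_up @ lora_down) multiplied by `factor`.

    Only the lora_up matrix is scaled; scaling lora_down as well would multiply
    the delta by factor**2.
    """
    scaled = {}
    for key, weight in state_dict.items():
        if key.startswith(prefix) and ".lora_up." in key:
            scaled[key] = [[v * factor for v in row] for row in weight]
        else:
            scaled[key] = weight
    return scaled


def delta(up: Matrix, down: Matrix) -> Matrix:
    """Plain matrix product up @ down, the weight change a LoRA module applies."""
    return [[sum(u * d for u, d in zip(row, col)) for col in zip(*down)] for row in up]


if __name__ == "__main__":
    # Hypothetical keys loosely following kohya's naming; real files hold tensors.
    up_key = "lora_unet_up_blocks_3_attentions_0_proj_in.lora_up.weight"
    down_key = "lora_unet_up_blocks_3_attentions_0_proj_in.lora_down.weight"
    sd = {
        up_key: [[1.0], [2.0]],      # 2x1
        down_key: [[3.0, 4.0]],      # 1x2
        "lora_unet_down_blocks_0_attentions_0_proj_in.lora_up.weight": [[1.0]],
    }
    half = scale_block(sd, "lora_unet_up_blocks_3_", 0.5)
    print(delta(half[up_key], half[down_key]))
```

A factor of 0.0 removes the block's contribution entirely; values in between give the partial reduction the note suggests.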