Tags: Text-to-Image, Not-For-All-Audiences
drhead committed on
Commit cc83568 (1 parent: f32015c)

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -13,8 +13,8 @@ A finetune resumed from Fluffyrock Unleashed v1.0, with the following changes:

  ### Technical changes:
  - Adaptive timestep weighting: Timesteps are weighted using a similar method to what the EDM2 paper used, according to the homoscedastic uncertainty of MSE loss on each timestep, thereby equalizing the contribution of each timestep. Loss weight was also conditioned on resolution in order to equalize the contribution of each resolution group. The overall effect of this is that the model is now very good at both high- and low-frequency details, and is not as biased towards blurry backgrounds.
- - EMA weights were assembled post-hoc using the method described in the EDM2 paper. In our evaluations, we found sigma=0.225 to produce the highest quality images.
- - Cross-attention masking was applied to extra completely empty blocks of CLIP token embeddings. Previously, if an image had a short caption, it would be fed in similarly to if you had added `BREAK BREAK BREAK` to the prompt in A1111, which caused the model to depend on those extra blocks and made it produce better images with 225 tokens of input. The model is no longer dependent on this.
+ - EMA weights were assembled post-hoc using the method described in the EDM2 paper. The checkpoint shipped uses an EMA length sigma of 0.225.
+ - Cross-attention masking was applied to extra completely empty blocks of CLIP token embeddings, making the model work better with short prompts. Previously, if an image had a short caption, it would be fed in similarly to if you had added `BREAK BREAK BREAK` to the prompt in A1111, which caused the model to depend on those extra blocks and made it produce better images with 225 tokens of input. The model is no longer dependent on this.
  - Optimizer replaced with schedule-free AdamW, and weight decay was turned off in bias layers, which has greatly stabilized training.

  ### Data input changes:
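The adaptive timestep weighting described in the README follows the EDM2 idea of learning a log-variance u per noise level, scaling each loss term by exp(-u), and adding u as a penalty. At the optimum, u equals the log of the expected loss, so every timestep (and, here, every resolution group) contributes about equally. The pure-Python sketch below only illustrates that mechanism under stated assumptions: it uses a hypothetical lookup table keyed by (timestep bucket, resolution group) and a plain SGD step on u, whereas EDM2 learns u with a small network over the noise level, and the commit does not show this model's trainer. The class name, bucket count, and step size are all hypothetical.

```python
import math


class UncertaintyLossWeighting:
    """Hypothetical per-(timestep bucket, resolution group) uncertainty weighting.

    Each key owns a learned log-variance u. The weighted loss is
    mse * exp(-u) + u, which is minimized over u at u = log(E[mse]),
    so after training every key's weighted MSE term settles near 1.
    """

    def __init__(self, num_buckets: int = 10, num_train_timesteps: int = 1000, lr: float = 0.05):
        self.num_buckets = num_buckets
        self.num_train_timesteps = num_train_timesteps
        self.lr = lr  # assumed SGD step size for u; a real trainer would use its main optimizer
        self.log_var = {}  # (bucket, resolution_group) -> u

    def bucket(self, timestep: int) -> int:
        # Map a discrete training timestep to one of num_buckets equal-width buckets.
        return min(timestep * self.num_buckets // self.num_train_timesteps, self.num_buckets - 1)

    def __call__(self, mse: float, timestep: int, resolution_group: str) -> float:
        key = (self.bucket(timestep), resolution_group)
        u = self.log_var.get(key, 0.0)
        loss = mse * math.exp(-u) + u
        # d(loss)/du = 1 - mse * exp(-u); take one SGD step on u for this key.
        self.log_var[key] = u - self.lr * (1.0 - mse * math.exp(-u))
        return loss

    def weight(self, timestep: int, resolution_group: str) -> float:
        # Current multiplier exp(-u) applied to the raw MSE for this key.
        key = (self.bucket(timestep), resolution_group)
        return math.exp(-self.log_var.get(key, 0.0))


if __name__ == "__main__":
    # Hypothetical mean MSE per (timestep, resolution group); low-noise steps have tiny loss.
    mean_mse = {(50, "512px"): 0.02, (950, "512px"): 0.4, (50, "1024px"): 0.01, (950, "1024px"): 0.3}
    weighting = UncertaintyLossWeighting()
    for _ in range(3000):
        for (t, res), m in mean_mse.items():
            weighting(m, t, res)
    for (t, res), m in mean_mse.items():
        print(t, res, round(m * weighting.weight(t, res), 3))
```

Even though the raw losses differ by a factor of 40 across keys, each weighted term should settle near 1, which is the "equalizing the contribution" effect the README describes.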
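The EMA line presumably refers to EDM2's power-function EMA, whose averaging profile is set by an exponent γ and whose length is reported as σ_rel, the profile's standard deviation relative to training length: σ_rel = (γ+1)^(1/2) (γ+2)^(-1) (γ+3)^(-1/2). The sketch below converts a σ_rel such as the README's 0.225 into γ and applies the per-step update with β(t) = (1 - 1/t)^(γ+1). It only illustrates what an EMA length means; it does not implement the post-hoc reconstruction itself (solving for a target profile from stored snapshots), and the function names are hypothetical.

```python
def sigma_rel_from_gamma(gamma: float) -> float:
    """Relative std of the EDM2 power-function EMA profile with exponent gamma."""
    return (gamma + 1) ** 0.5 / ((gamma + 2) * (gamma + 3) ** 0.5)


def gamma_from_sigma_rel(sigma_rel: float) -> float:
    """Invert sigma_rel_from_gamma by bisection.

    Squaring and rearranging gives gamma^3 + 7 gamma^2 + (16 - s) gamma + (12 - s) = 0
    with s = sigma_rel^-2; it has exactly one positive root when sigma_rel < 1/sqrt(12).
    """
    s = sigma_rel ** -2

    def f(g: float) -> float:
        return g ** 3 + 7 * g ** 2 + (16 - s) * g + (12 - s)

    if f(0.0) >= 0:
        raise ValueError("sigma_rel must be below 1/sqrt(12) (about 0.2887)")
    lo, hi = 0.0, 1.0
    while f(hi) < 0:  # grow the bracket until it contains the root
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def power_ema_update(ema: list[float], params: list[float], step: int, gamma: float) -> list[float]:
    """One power-function EMA step; step counts from 1, so the first call copies params."""
    beta = (1 - 1 / step) ** (gamma + 1)
    return [beta * e + (1 - beta) * p for e, p in zip(ema, params)]


if __name__ == "__main__":
    gamma = gamma_from_sigma_rel(0.225)
    print(f"gamma for sigma_rel=0.225: {gamma:.4f}")
```

For σ_rel = 0.225 this yields a γ of about 1.2, a fairly long average, since σ_rel cannot exceed 1/sqrt(12) ≈ 0.289 (the γ = 0 case, a plain running mean).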
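The cross-attention change can be illustrated with a key mask over the concatenated CLIP chunks. Following the A1111 convention the README alludes to, this sketch assumes each chunk holds 75 prompt tokens plus BOS/EOS (77 positions) and that up to three chunks (225 prompt tokens) are concatenated; chunks carrying no caption tokens are masked so their attention weight is exactly zero. Always keeping the first chunk, even for an empty caption, is an assumption, as are the helper names; the commit does not show the trainer's code.

```python
import math

CHUNK_TOKENS = 75             # prompt tokens per chunk (A1111-style, assumed)
CHUNK_LEN = CHUNK_TOKENS + 2  # plus BOS and EOS, i.e. 77 embedding positions per chunk
MAX_CHUNKS = 3                # 3 * 75 = 225 prompt tokens


def chunk_key_mask(num_prompt_tokens: int, max_chunks: int = MAX_CHUNKS) -> list[bool]:
    """True for key positions cross-attention may attend to, False for masked ones."""
    used = -(-num_prompt_tokens // CHUNK_TOKENS)  # ceiling division
    used = min(max(used, 1), max_chunks)          # always keep the first chunk
    return [chunk < used for chunk in range(max_chunks) for _ in range(CHUNK_LEN)]


def masked_softmax(scores: list[float], mask: list[bool]) -> list[float]:
    """Softmax over one query's attention scores, giving masked keys zero weight."""
    top = max(s for s, keep in zip(scores, mask) if keep)
    exps = [math.exp(s - top) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    mask = chunk_key_mask(10)  # a short 10-token caption
    weights = masked_softmax([0.0] * len(mask), mask)
    print(sum(mask), len(mask), sum(weights[CHUNK_LEN:]))
```

With a 10-token caption, only the first 77 of 231 positions stay visible and the two empty chunks receive zero weight, so the model cannot learn to lean on them the way it did with unmasked `BREAK`-style padding.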
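Turning weight decay off for bias parameters is commonly done with optimizer parameter groups. The sketch below is hypothetical and framework-agnostic: it splits (name, parameter) pairs by a `.bias` name suffix into a decayed group and a non-decayed group, using the `{"params": ..., "weight_decay": ...}` dictionaries that PyTorch optimizers accept. The suffix rule and the function name are assumptions; the commit does not say how bias layers were identified.

```python
from typing import Any, Iterable


def build_param_groups(named_params: Iterable[tuple[str, Any]], weight_decay: float) -> list[dict]:
    """Split parameters into a weight-decayed group and a bias group with no decay."""
    decay, no_decay = [], []
    for name, param in named_params:
        # Assumed naming rule: parameters whose name ends in ".bias" are bias terms.
        (no_decay if name.endswith(".bias") else decay).append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

In PyTorch, `model.named_parameters()` yields exactly such pairs, and the groups could be passed to a schedule-free optimizer such as `schedulefree.AdamWScheduleFree(build_param_groups(model.named_parameters(), 0.01), lr=1e-5)`, where the decay and learning-rate values are placeholders. That package's optimizers also expect `optimizer.train()` before training steps and `optimizer.eval()` before evaluation.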