ZeroCool94 committed
Commit 1495b68
1 Parent(s): 68ef273

Update README.md

Files changed (1)
  1. README.md +12 -4
README.md CHANGED
@@ -89,7 +89,7 @@ image.save("fantasy_forest_illustration.png")
 - [Sygil Diffusion v0.2](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.2.ckpt): Resumed from Sygil Diffusion v0.1 and trained for a total of 1.77 million steps.
 - [Sygil Diffusion v0.3](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.3.ckpt): Resumed from Sygil Diffusion v0.2 and trained for a total of 2.01 million steps so far.
 - #### Beta:
-- [sygil-diffusion-v0.4_2216300_lora.ckpt](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.4_2216300_lora.ckpt): Resumed from Sygil Diffusion v0.3 and trained for a total of 2.21 million steps so far.
+- [sygil-diffusion-v0.4_2318263_lora.ckpt](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.4_2318263_lora.ckpt): Resumed from Sygil Diffusion v0.3 and trained for a total of 2.31 million steps so far.
 
 Note: Checkpoints under the Beta section are updated daily, or at least 3-4 times a week, which is usually the equivalent of 1-2 training sessions;
 this is done until they are stable enough to be moved into a proper release, usually every 1 or 2 weeks.
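The checkpoints listed in the hunk above are single-file `.ckpt` weights. For reference, here is a minimal sketch of loading one of them with 🤗 diffusers; it assumes a recent diffusers release (where `StableDiffusionPipeline.from_single_file` is available), and the prompt and inference settings are placeholders rather than part of this commit.

```python
import torch
from diffusers import StableDiffusionPipeline

# Build a pipeline directly from one of the single-file checkpoints listed above
# (v0.3 here as an example); from_single_file also accepts a local path.
pipe = StableDiffusionPipeline.from_single_file(
    "https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.3.ckpt",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Placeholder prompt, matching the style of the usage example earlier in the README.
image = pipe("fantasy forest illustration", num_inference_steps=30).images[0]
image.save("fantasy_forest_illustration.png")
```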
@@ -105,14 +105,14 @@ The model was trained on the following dataset:
 
 **Hardware and others**
 - **Hardware:** 1 x Nvidia RTX 3050 8GB GPU
-- **Hours Trained:** 804 hours approximately.
+- **Hours Trained:** 840 hours approximately.
 - **Optimizer:** AdamW
 - **Adam Beta 1**: 0.9
 - **Adam Beta 2**: 0.999
 - **Adam Weight Decay**: 0.01
 - **Adam Epsilon**: 1e-8
 - **Gradient Checkpointing**: True
-- **Gradient Accumulations**: 4
+- **Gradient Accumulations**: 400
 - **Batch:** 1
 - **Learning Rate:** 1e-7
 - **Learning Rate Scheduler:** cosine_with_restarts
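The hyperparameters in the hunk above map onto a fairly standard PyTorch / 🤗 diffusers training setup. The sketch below only illustrates how those values would typically be wired together; it is not the actual training script, and `unet`, `dataloader`, `compute_loss`, and `max_train_steps` are placeholders.

```python
import torch
from diffusers.optimization import get_scheduler

# Placeholders: `unet`, `dataloader`, `compute_loss`, and `max_train_steps`
# stand in for the real model, data, loss function, and step budget.
optimizer = torch.optim.AdamW(
    unet.parameters(),
    lr=1e-7,             # Learning Rate
    betas=(0.9, 0.999),  # Adam Beta 1 / Adam Beta 2
    weight_decay=0.01,   # Adam Weight Decay
    eps=1e-8,            # Adam Epsilon
)

lr_scheduler = get_scheduler(
    "cosine_with_restarts",              # Learning Rate Scheduler
    optimizer=optimizer,
    num_warmup_steps=0,                  # assumption: no warmup is listed above
    num_training_steps=max_train_steps,
)

unet.enable_gradient_checkpointing()     # Gradient Checkpointing: True

grad_accum_steps = 400  # Gradient Accumulations; with Batch: 1 this is an effective batch of 400
for step, batch in enumerate(dataloader):
    loss = compute_loss(unet, batch) / grad_accum_steps
    loss.backward()
    if (step + 1) % grad_accum_steps == 0:
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
```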
@@ -120,7 +120,15 @@ The model was trained on the following dataset:
 - **Lora unet Learning Rate**: 1e-7
 - **Lora Text Encoder Learning Rate**: 1e-7
 - **Resolution**: 512 pixels
-- **Total Training Steps:** 2,216,300
+- **Total Training Steps:** 2,318,263
+
+
+Note: For the learning rate I'm testing something new. After switching from the `constant` scheduler to `cosine_with_restarts` once v0.3 was released, I noticed
+it settles close to the optimal learning rate while minimizing the loss value. So, when a training session finishes, I start the next one from the learning rate
+shown during the last few steps of the previous session, which makes the rate decrease at a steady pace over time. When I add a lot of data to the training dataset
+at once, I reset the learning rate to 1e-7 and let the scheduler walk it down again as the model learns from the new data. This keeps the training from
+overfitting and from using a learning rate so low that the model stops learning anything new for a while.
+
 
 Developed by: [ZeroCool94](https://github.com/ZeroCool940711) at [Sygil-Dev](https://github.com/Sygil-Dev/)
 
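A small, self-contained sketch of the session-to-session learning-rate handling described in the note above: each new session resumes from the learning rate observed at the end of the previous one, and the rate is reset to 1e-7 whenever a large amount of new data is added. The function and variable names are hypothetical, not taken from the training code, and the example values are illustrative.

```python
BASE_LR = 1e-7  # value the learning rate is reset to after a large dataset addition

def next_session_lr(last_observed_lr: float, added_lots_of_new_data: bool) -> float:
    """Pick the starting learning rate for the next training session."""
    if added_lots_of_new_data:
        # Reset so cosine_with_restarts can walk the rate down again on the new data.
        return BASE_LR
    # Otherwise resume from where the previous session's scheduler left off,
    # so the rate keeps decreasing steadily across sessions.
    return last_observed_lr

# Example: resuming after a normal session vs. after adding a lot of new data at once.
print(next_session_lr(6.3e-8, added_lots_of_new_data=False))  # 6.3e-8, keep decaying
print(next_session_lr(6.3e-8, added_lots_of_new_data=True))   # 1e-7, start over
```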