WeightsnWizardry committed
Commit d57f73c
1 Parent(s): 73bed7a

Update README.md

README.md CHANGED
@@ -116,16 +116,16 @@ Samples from each of the datasets have been programmatically formatted to chat,
 | **Hyperparameter** | **Value** |
 |--------------------|------------|
 | Num Rollouts | 1024 |
-
+| Policy Epochs | 1 |
 | Value Epochs | 1 |
 | KL Coef | 0.01 |
 | Gamma | 1.0 |
 | GAE Lambda | 0.95 |
-| Clip Range
+| Clip Range Policy | 0.2 |
 | Clip Range Value | 0.2 |
 | Whiten Advantages | `true` |
 | Whiten Rewards | `false` |
-| Score on EOD
+| Score on EOD | `true` |
 | Max Steps | 200 |
 | PPO steps/epoch | 1 |
 | Value steps/epoch | 8 |
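
For readers who want the updated table in machine-readable form, the following is a minimal sketch that mirrors the new side of the diff as a Python dataclass. The class name `PPOHyperparameters` and all field names are hypothetical, chosen only for illustration; they are not taken from the repository or from any particular training library, and only the values come from the table.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PPOHyperparameters:
    """Hypothetical container; values copied from the updated README table."""

    num_rollouts: int = 1024
    policy_epochs: int = 1           # row added in this commit
    value_epochs: int = 1
    kl_coef: float = 0.01
    gamma: float = 1.0
    gae_lambda: float = 0.95
    clip_range_policy: float = 0.2   # row added in this commit
    clip_range_value: float = 0.2
    whiten_advantages: bool = True
    whiten_rewards: bool = False
    score_on_eod: bool = True        # row added in this commit
    max_steps: int = 200
    ppo_steps_per_epoch: int = 1
    value_steps_per_epoch: int = 8


config = PPOHyperparameters()

# Print one "name = value" line per hyperparameter.
for name, value in asdict(config).items():
    print(f"{name} = {value}")
```

Freezing the dataclass keeps the values read-only, which suits a record of published settings; a training script would map these fields onto whatever names its own framework expects.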