Update README.md
README.md CHANGED

@@ -28,4 +28,5 @@ We finetuned the `wte` and `wpe` layers of GPT-2 (while freezing the parameters
 - max_eval_samples: 5000
 ```
 
-Setup: 8 RTX-3090 GPUs, trained for seven days (total training steps: 110500, effective train batch size: 64, tokens per batch: 1024)
+Setup: 8 RTX-3090 GPUs, trained for seven days (total training steps: 110500, effective train batch size: 64, tokens per batch: 1024)
+Final checkpoint: checkpoint-111500
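For context on the setup the README describes, here is a minimal sketch of the freezing step, assuming the Hugging Face transformers `GPT2LMHeadModel` (the repository's actual training script is not shown in this diff): only the `wte` and `wpe` embedding layers are left trainable.

```python
# Minimal sketch, assuming Hugging Face transformers: freeze all GPT-2
# parameters except the token embeddings (`wte`) and position embeddings
# (`wpe`), as described in the README.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

for name, param in model.named_parameters():
    # Keep only the embedding matrices trainable; freeze everything else.
    param.requires_grad = name.startswith(("transformer.wte.", "transformer.wpe."))

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable} / {total}")
```

Note that GPT-2 ties `lm_head` to `wte`, so the output projection is updated together with the token embeddings.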