Update README.md
README.md (changed):

````diff
@@ -28,5 +28,6 @@ We finetuned the `wte` and `wpe` layers of GPT-2 (while freezing the parameters
 max_eval_samples: 5000
 ```
 
-Setup
-
+**Setup**: 8 RTX-3090 GPUs, trained for seven days (total training steps: 110500, effective train batch size: 64, tokens per batch: 1024)
+
+**Final checkpoint**: checkpoint-111500
````