We used the first 100 million tokens of the 10BT Sample of Fineweb-Edu to train …

- Batch Size: 32
- Gradient Accumulation Steps: 4
- Compile model: False
- Device Type: float16 - CUDA on Kaggle T4 16GB GPU (train time: ~71min)

## Training code

As in all of our models, you can find the full training code in this repo in the files `train.py`, `model.py`, `configurator.py` and `prepare.py`.
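The settings above imply an effective batch of 32 × 4 = 128 sequences per optimizer step, since gradients are accumulated over 4 micro-batches before each update. As a minimal sketch of that arithmetic (the variable names below are illustrative assumptions, not necessarily the ones used in `train.py` or `configurator.py`):

```python
# Hypothetical names mirroring the README's settings; the repo's actual
# config variables may be spelled differently.
batch_size = 32                  # sequences per micro-batch on the T4
gradient_accumulation_steps = 4  # micro-batches accumulated per optimizer step
compile_model = False            # torch.compile disabled
dtype = "float16"                # mixed precision to fit a 16GB Kaggle T4

# Sequences contributing to each optimizer update:
effective_batch = batch_size * gradient_accumulation_steps
print(effective_batch)  # 128
```

Accumulating gradients this way trades wall-clock time for memory: each optimizer step sees 128 sequences while only 32 ever reside on the GPU at once.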