Update README.md
This is a tiny 20.75M-parameter model showing how well small models can perform on a small amount of data.

## Training data
We trained this model on the first 100 million tokens of the 10BT Sample of Fineweb-Edu for 5000 steps, reaching a final training loss of 4.2044 and a validation loss of 4.1566.
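As a quick sanity check, the per-step token budget follows directly from the figures above (batch and context sizes are not stated in this card, so only the aggregate is computed):

```python
# Aggregate token budget implied by the training-data figures above.
total_tokens = 100_000_000  # first 100M tokens of the 10BT Sample
steps = 5000                # optimizer steps reported
tokens_per_step = total_tokens // steps
print(tokens_per_step)  # 20,000 tokens consumed per optimizer step
```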
## Training specs
- Architecture: nanoGPT
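The 20.75M figure can be cross-checked with a standard parameter-count estimate for GPT-2-style models like nanoGPT. This sketch assumes the usual GPT-2 layout (tied output head, learned positional embeddings, biased linears and layer norms); this model's actual nanoGPT config is not stated in the card.

```python
def gpt_param_count(n_layer: int, n_embd: int, vocab_size: int, block_size: int) -> int:
    """Approximate parameter count for a GPT-2-style model (tied LM head)."""
    wte = vocab_size * n_embd   # token embeddings (shared with the output head)
    wpe = block_size * n_embd   # learned positional embeddings
    # Per transformer block, with biases:
    #   attention qkv (3n^2 + 3n) + attention proj (n^2 + n)
    #   MLP with 4x expansion (8n^2 + 5n), two layer norms (4n)
    per_block = 12 * n_embd**2 + 13 * n_embd
    ln_f = 2 * n_embd           # final layer norm
    return wte + wpe + n_layer * per_block + ln_f

# Example: the GPT-2 small (124M) config reproduces the known count.
print(gpt_param_count(n_layer=12, n_embd=768, vocab_size=50257, block_size=1024))
```

Plugging in candidate nanoGPT configs until the function returns roughly 20.75M is one way to recover the likely depth and width of this model.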