Correct maximum positional embeddings
The model appears to have been trained with a context window of 512, not 2048 as claimed here. This can be seen by looking at the average loss by sequence position on the GPT-4 TinyStories dataset (packed into inputs of length 2048):
![image.png](https://cdn-uploads.huggingface.co/production/uploads/65b0cb8770773c0ab8fde1e0/qXnk9-RtXGrXlUlkZCxl3.png)
It would be great to get this changed (for all TinyStories models), as the current config is misleading.
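For reference, here is a minimal sketch of the diagnostic described above: average the per-token cross-entropy loss at each position over a batch of packed 2048-token inputs. The repo id is illustrative, not necessarily the exact checkpoint in question; substitute whichever TinyStories model you are probing.

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed repo id -- swap in the TinyStories checkpoint under inspection.
model = AutoModelForCausalLM.from_pretrained("roneneldan/TinyStories-33M").eval()

@torch.no_grad()
def loss_by_position(input_ids: torch.Tensor) -> torch.Tensor:
    """Mean cross-entropy loss at each sequence position.

    input_ids: (batch, 2048) token ids from packed TinyStories text.
    """
    logits = model(input_ids).logits        # (batch, seq, vocab)
    per_token = torch.nn.functional.cross_entropy(
        logits[:, :-1].transpose(1, 2),     # (batch, vocab, seq-1)
        input_ids[:, 1:],                   # position i predicts token i+1
        reduction="none",
    )
    return per_token.mean(dim=0)            # average over batch, keep positions
```

If the model only ever saw 512-token contexts during training, this curve jumps sharply after position 512, which is exactly what the plot above shows.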
config.json (+1 −1):

```diff
@@ -28,7 +28,7 @@
   "initializer_range": 0.02,
   "intermediate_size": null,
   "layer_norm_epsilon": 1e-05,
-  "max_position_embeddings": 2048,
+  "max_position_embeddings": 512,
   "model_type": "gpt_neo",
   "num_heads": 16,
   "num_layers": 4,
```
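Once merged, the corrected value is what downstream code will see when it loads the config (repo id again illustrative):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("roneneldan/TinyStories-33M")
assert config.max_position_embeddings == 512
```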