Set max_position_embeddings: 40000 for engine-builder workaround
Workaround for a Baseten engine-builder bug where `max_seq_len` in the truss `config.yaml` is overridden by `max_position_embeddings` from the model's `config.json`, causing OOM at runtime for Llama-3-70B SeqCls FP8. Setting `max_position_embeddings` to a value ≤ the desired `max_seq_len` (45000) makes the override benign. See Slack thread w/ Dhruv Singal 2026-04-28 (Slingshot debug).
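The clamp described above can be sketched as a small patch script. This is a minimal illustration, not part of the commit (the commit hand-edits `config.json`); the helper name `clamp_max_position_embeddings` and the default limit are assumptions.

```python
import json

def clamp_max_position_embeddings(path, limit=45000):
    """Clamp max_position_embeddings in a HF-style config.json to <= limit.

    If the engine builder overrides max_seq_len with this value, keeping it
    at or below the desired max_seq_len makes the override benign (avoids
    runtime OOM from an oversized sequence length). `limit` here defaults to
    the desired max_seq_len from the commit message; this is an assumption.
    """
    with open(path) as f:
        cfg = json.load(f)
    if cfg.get("max_position_embeddings", 0) > limit:
        cfg["max_position_embeddings"] = limit
        with open(path, "w") as f:
            json.dump(cfg, f, indent=2)
    return cfg["max_position_embeddings"]
```

Running it against the model directory's `config.json` before the engine build would produce the same one-line change as this commit.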
- config.json (+1 −1)

@@ -76,7 +76,7 @@
     "token activation 8": 8,
     "token activation 9": 9
   },
-  "max_position_embeddings":
+  "max_position_embeddings": 40000,
   "mlp_bias": false,
   "model_type": "llama",
   "num_attention_heads": 64,