Set max_position_embeddings: 40000 for engine-builder workaround
Workaround for a Baseten engine-builder bug where `max_seq_len` in the truss `config.yaml` is overridden by `max_position_embeddings` from the model's `config.json`, causing OOM at runtime for Llama-3-70B SeqCls FP8. Setting `max_position_embeddings` to a value ≤ the desired `max_seq_len` (45000) makes the override benign. See Slack thread w/ Dhruv Singal 2026-04-28 (Slingshot debug).
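The clamp described above can be sketched as a small patch script. This is a minimal illustration, not part of the commit (the commit hand-edits `config.json`); the helper name `clamp_max_position_embeddings` and the default limit are assumptions.

```python
import json

def clamp_max_position_embeddings(path, limit=45000):
    """Clamp max_position_embeddings in a HF-style config.json to <= limit.

    If the engine builder overrides max_seq_len with this value, keeping it
    at or below the desired max_seq_len makes the override benign (avoids
    runtime OOM from an oversized sequence length). `limit` here defaults to
    the desired max_seq_len from the commit message; this is an assumption.
    """
    with open(path) as f:
        cfg = json.load(f)
    if cfg.get("max_position_embeddings", 0) > limit:
        cfg["max_position_embeddings"] = limit
        with open(path, "w") as f:
            json.dump(cfg, f, indent=2)
    return cfg["max_position_embeddings"]
```

Running it against the model directory's `config.json` before the engine build would produce the same one-line change as this commit.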
- config.json (+1 −1)

@@ -76,7 +76,7 @@
     "token activation 8": 8,
     "token activation 9": 9
   },
-  "max_position_embeddings":
+  "max_position_embeddings": 40000,
   "mlp_bias": false,
   "model_type": "llama",
   "num_attention_heads": 64,