aaronmeoded-b10 committed
Commit 6925d60 · verified · 1 Parent(s): 8b9da27

Set max_position_embeddings: 40000 for engine-builder workaround


Workaround for a Baseten engine-builder bug in which max_seq_len from the truss config.yaml is overridden by max_position_embeddings from config.json, causing an OOM at runtime for Llama-3-70B SeqCls FP8. Setting max_position_embeddings to a value ≤ the desired max_seq_len (45000) makes the override benign. See Slack thread with Dhruv Singal, 2026-04-28 (Slingshot debug).
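
For context, the overridden knob lives in the truss config. A minimal sketch of the relevant stanza follows; the trt_llm.build nesting is assumed from Baseten's TRT-LLM engine-builder schema, while the 45000 value comes from this commit:

    # truss config.yaml (sketch; key nesting assumed, values from this commit)
    trt_llm:
      build:
        max_seq_len: 45000  # desired runtime limit; the engine-builder bug
                            # replaces this with max_position_embeddings from
                            # config.json, so config.json now carries 40000
                            # (≤ 45000) to keep the effective limit safe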

Files changed (1):
  config.json (+1, -1)
config.json CHANGED
@@ -76,7 +76,7 @@
     "token activation 8": 8,
     "token activation 9": 9
   },
-  "max_position_embeddings": 8192,
+  "max_position_embeddings": 40000,
   "mlp_bias": false,
   "model_type": "llama",
   "num_attention_heads": 64,