Commit 60c4817 · 1 Parent(s): 3001796
feat: Use real token estimates from actual evaluation data
Updated token usage estimates based on analysis of real evaluation results
from kshitijthakkar/smoltrace-results-20251117_104845:
Old estimates (way too low):
- tool: 350 total tokens
- code: 700 total tokens
- both: 900 total tokens
New estimates (from real data):
- tool: 12,629 avg tokens (36x higher!)
- code: 17,202 avg tokens (25x higher!)
- both: 14,833 avg tokens
Input/output split: 60/40 based on typical agent patterns
(large context with system prompts, tool outputs, reasoning chains)
This dramatically improves cost estimation accuracy, especially for API models
where token costs dominate.
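The arithmetic behind these figures can be reproduced with a short sketch. The helper name `split_tokens` and the constant names are hypothetical, introduced only for illustration; the totals and the 60/40 split come from the commit message, and rounding to the nearest whole token is an assumption that happens to reproduce the values in the mcp_tools.py change.

```python
# Average total tokens per test, as quoted in the commit message.
NEW_TOTALS = {"tool": 12629, "code": 17202, "both": 14833}
OLD_TOTALS = {"tool": 350, "code": 700, "both": 900}


def split_tokens(total, input_share=0.6):
    """Split a total token count into (input, output) parts.

    Hypothetical helper: the input part is rounded to the nearest token
    and the output part is whatever remains, so the two always sum to total.
    """
    input_tokens = round(total * input_share)
    return input_tokens, total - input_tokens


for agent_type, total in NEW_TOTALS.items():
    input_tokens, output_tokens = split_tokens(total)
    ratio = total / OLD_TOTALS[agent_type]
    print(f"{agent_type}: input={input_tokens}, output={output_tokens}, "
          f"{ratio:.1f}x the old estimate")
```

Under these assumptions the split yields 7577/5052 for tool, 10321/6881 for code, and 8900/5933 for both.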
- mcp_tools.py +16 -7
mcp_tools.py
CHANGED
@@ -344,14 +344,23 @@ async def estimate_cost(
     else:
         model_cost = {"input_cost_per_token": 0, "output_cost_per_token": 0}  # Local model

-    # Estimate token usage per test
-    #
-    #
-    #
+    # Estimate token usage per test (based on real data from kshitijthakkar/smoltrace-results-20251117_104845)
+    # These are averages from actual agent evaluation runs
+    # Input/output split estimated at 60/40 based on typical agent patterns
+    # (agents have large context with system prompts, tool outputs, etc.)
     token_estimates = {
-        "tool": {
-
-
+        "tool": {
+            "input": 7577,  # 60% of 12,629 avg total tokens
+            "output": 5052  # 40% of 12,629 avg total tokens
+        },
+        "code": {
+            "input": 10321,  # 60% of 17,202 avg total tokens
+            "output": 6881  # 40% of 17,202 avg total tokens
+        },
+        "both": {
+            "input": 8900,  # Average of tool+code inputs
+            "output": 5933  # Average of tool+code outputs
+        }
     }

     tokens_per_test = token_estimates[agent_type]
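To illustrate how these estimates could feed into a cost figure, here is a minimal sketch. It is not the body of `estimate_cost`, which this diff does not show: the function `estimate_run_cost`, the `num_tests` parameter, and the API prices are all hypothetical. Only the token values and the `input_cost_per_token` / `output_cost_per_token` keys are taken from the diff.

```python
# Token estimates copied from the diff.
TOKEN_ESTIMATES = {
    "tool": {"input": 7577, "output": 5052},
    "code": {"input": 10321, "output": 6881},
    "both": {"input": 8900, "output": 5933},
}


def estimate_run_cost(agent_type, model_cost, num_tests=1):
    """Hypothetical helper: estimated cost of num_tests tests in USD."""
    tokens_per_test = TOKEN_ESTIMATES[agent_type]
    cost_per_test = (
        tokens_per_test["input"] * model_cost["input_cost_per_token"]
        + tokens_per_test["output"] * model_cost["output_cost_per_token"]
    )
    return cost_per_test * num_tests


# Assumed prices for illustration only; real per-token prices vary by provider.
api_model_cost = {"input_cost_per_token": 1e-6, "output_cost_per_token": 2e-6}
# Local models are priced at zero, as in the else branch of the diff.
local_model_cost = {"input_cost_per_token": 0, "output_cost_per_token": 0}

print(estimate_run_cost("tool", api_model_cost, num_tests=100))
print(estimate_run_cost("tool", local_model_cost, num_tests=100))
```

One detail worth checking when reusing these numbers: the "both" values (8900 and 5933) match a 60/40 split of the 14,833 average from the commit message. The plain mean of the tool and code entries would be 8949 and 5966.5, so the inline comments on that entry appear to describe the values only loosely.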