Toolless (sorted by score)

| Model | Score | P50 (s) | P99 (s) |
| --- | --- | --- | --- |
| gemini-2.5-pro | 8/20 | 16.32 | 61.77 |
| gpt-5 | 5/20 | 33.78 | 144.80 |
| glm-4.5-air | 3/20 | 30.33 | 266.54 |
| gpt-oss-120b | 2/20 | 8.57 | 63.07 |
| Qwen3-235B (thinking) | 2/20 | 50.75 | 152.70 |
| gemma3-4b | 1/20 | 161.28 | 312.12 |


With tools (sorted by score)

| Model | Score | P50 (s) | P99 (s) |
| --- | --- | --- | --- |
| gpt-5 | 16/20 | 270.40 | 990.36 |
| gemini-2.5-pro | 12/20 | 37.22 | 134.73 |
| gpt-oss-120b | 11/20 | 12.81 | 33.87 |
| glm-4.5-air | 9/20 | 45.61 | 103.07 |
| Qwen3-235B (thinking) | 6/20 | 111.34 | 226.74 |
| gemma3-4b | 0/20 | 870.13 | 1900.00 |
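
The two tables are easier to compare when the per-model change in score is computed directly. The following is a minimal sketch that does only that arithmetic: the scores are copied from the tables above, while the names `TOOLLESS`, `WITH_TOOLS`, and `score_gains` are hypothetical and not part of the original benchmark code.

```python
# Scores out of 20, copied from the two tables above.
# The dictionary and function names are illustrative assumptions.
TOOLLESS = {
    "gemini-2.5-pro": 8,
    "gpt-5": 5,
    "glm-4.5-air": 3,
    "gpt-oss-120b": 2,
    "Qwen3-235B (thinking)": 2,
    "gemma3-4b": 1,
}
WITH_TOOLS = {
    "gpt-5": 16,
    "gemini-2.5-pro": 12,
    "gpt-oss-120b": 11,
    "glm-4.5-air": 9,
    "Qwen3-235B (thinking)": 6,
    "gemma3-4b": 0,
}


def score_gains(before, after):
    """Return (model, after - before) pairs, largest gain first."""
    gains = {model: after[model] - before[model] for model in before}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for model, gain in score_gains(TOOLLESS, WITH_TOOLS):
        print(f"{model:<24} {gain:+d}")
```

On these numbers, gpt-5 gains the most from tool access (+11) and gemma3-4b is the only model whose score drops (-1).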