legolasyiu committed on
Commit e8131ed · verified · 1 Parent(s): 93d02a9

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -64,6 +64,8 @@ Debugged vibecoder dataset
  | Tasks | Version | Filter | n-shot | Metric | Vcoder-120B | gpt-oss-120 | DeepSeek-V3.2-Exp |
  |---------------------|---------|------------------|--------|------------|-------------|------------ |-------------------|
  | gsm8k (cot) | 3 | flexible-extract | 5 | exact_match ↑ | 0.9557 | 0.88 | - |
+ | AIME2025 | 3 | flexible-extract | 5 | exact_match ↑ | 0.98 | 0.98 | - |
+
  - Benchmark used [The Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main)

  **Notes:**