Update README.md
README.md
CHANGED
@@ -64,6 +64,8 @@ Debugged vibecoder dataset
 | Tasks | Version | Filter | n-shot | Metric | Vcoder-120B | gpt-oss-120 | DeepSeek-V3.2-Exp |
 |---------------------|---------|------------------|--------|------------|-------------|------------ |-------------------|
 | gsm8k (cot) | 3 | flexible-extract | 5 | exact_match ↑ | 0.9557 | 0.88 | - |
+| AIME2025 | 3 | flexible-extract | 5 | exact_match ↑ | 0.98 | 0.98 | - |
+
 - Benchmark used [The Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main)
 
 **Notes:**