More Benchmarks
#4 opened by PSM272
Can you add more benchmarks like MATH, MMLU, HumanEval, etc.?
Thank you for your interest.
Marco-o1 is currently focused primarily on open-ended tasks, such as machine translation. We do not intend the model to be used for tasks such as code generation or math.
A portion of our training data does include math and code, but they are not the main focus, so we may not report these metrics for the time being. We believe those scores may still improve to some extent (though perhaps not significantly).
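If you would like to measure these benchmarks yourself, a minimal sketch with EleutherAI's lm-evaluation-harness might look like the following. The checkpoint ID and task selection here are assumptions to adapt to your setup, not an official evaluation script:

```python
# Minimal sketch, not an official evaluation script: run a couple of the
# requested benchmarks with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). The checkpoint ID below is assumed to be the
# public Hugging Face repo; swap in whatever model you want to test.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                                # standard Hugging Face backend
    model_args="pretrained=AIDC-AI/Marco-o1",  # assumed repo id
    tasks=["mmlu", "gsm8k"],                   # MATH/HumanEval also ship as harness tasks
    batch_size=8,
)

# Print the aggregate metric(s) reported for each task.
for task, metrics in results["results"].items():
    print(task, metrics)
```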
In that case, why is the only reported benchmark MGSM, a math benchmark?