Add SkillsBench v1.1 evaluation result

#43
by bingran-you - opened

Summary

Adds MiniMax M2.7's SkillsBench v1.1 with-skills evaluation result to Hugging Face eval results.

This is a metadata-only PR adding .eval_results/skillsbench.yaml. It does not change model files, code, dependencies, or runtime behavior.

Result

  • Benchmark: SkillsBench v1.1
  • Benchmark dataset: https://huggingface.co/datasets/benchflow/skillsbench
  • Task id: skillsbench_v1_1
  • Value: 34.9
  • Mode: with-skills
  • Harness: BenchFlow
  • Agent: OpenHands
  • Coverage: 87 tasks x 3 trials = 261 trials; full coverage, 261/261 selected trials
  • Recomputed date: 2026-06-11
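
As a quick sanity check on the coverage figures above, the arithmetic can be sketched in Python. This is only an illustrative sketch: the `EvalResult` class and its field names are hypothetical and are not the schema of `.eval_results/skillsbench.yaml`, and reading "261/261" as covered trials over selected trials is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical container for the values listed in this PR (not the real YAML schema)."""
    benchmark: str
    task_id: str
    value: float
    mode: str
    tasks: int
    trials_per_task: int
    selected_trials: int
    covered_trials: int  # assumed meaning of the numerator in "261/261"

    def expected_trials(self) -> int:
        # Every task is run the same number of times.
        return self.tasks * self.trials_per_task

    def has_full_coverage(self) -> bool:
        # Full coverage: covered == selected == tasks * trials_per_task.
        return self.covered_trials == self.selected_trials == self.expected_trials()


# Values copied from the Result list in this PR.
result = EvalResult(
    benchmark="SkillsBench v1.1",
    task_id="skillsbench_v1_1",
    value=34.9,
    mode="with-skills",
    tasks=87,
    trials_per_task=3,
    selected_trials=261,
    covered_trials=261,
)

print(result.expected_trials())    # 261
print(result.has_full_coverage())  # True
```

The check only confirms that the reported counts are internally consistent (87 x 3 = 261); it does not recompute the score itself.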

Source

Official export:
https://huggingface.co/datasets/benchflow/skillsbench-leaderboard/raw/main/leaderboard/skillsbench/v1.1/official.json

Trajectory/result archive:
https://huggingface.co/datasets/benchflow/skillsbench-leaderboard

Notes

SkillsBench reports paired with-skills and without-skills scores. This PR submits only the with-skills score because the current Hugging Face benchmark dataset defines skillsbench_v1_1 as the default leaderboard task. A without-skills leaderboard can be added separately if and when the benchmark dataset adds a second task id.

bingran-you changed pull request title from Draft: Add SkillsBench v1.1 evaluation result to Add SkillsBench v1.1 evaluation result
