R1-distilled leaderboard ⚡ Generate a leaderboard for open-r1 models based on evaluation scores