Llama-3.1 70B MATH Hard score doesn't match its details dataset
Hi @alozowski,
A quick question: in the UI, Llama-3.1 70B Instruct has a raw MATH Hard score of 0.31, but looking into the details dataset, the score is much lower.
References: https://huggingface.co/datasets/open-llm-leaderboard/meta-llama__Meta-Llama-3.1-70B-Instruct-details
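In case it helps anyone reproduce the comparison, here's a minimal sketch of averaging the per-sample scores from the details dataset. The config name, split, and metric field below are assumptions on my part; check the dataset viewer for the actual names in this repo:

```python
from datasets import load_dataset

# NOTE: the config name and metric field are assumptions --
# check the dataset viewer for the actual names in this details repo.
ds = load_dataset(
    "open-llm-leaderboard/meta-llama__Meta-Llama-3.1-70B-Instruct-details",
    name="meta-llama__Meta-Llama-3.1-70B-Instruct__leaderboard_math_hard",  # assumed config
    split="latest",  # assuming the most recent run is stored under "latest"
)

# Average the per-sample metric to compare against the leaderboard's raw score.
scores = [row["metrics"]["exact_match"] for row in ds]  # assumed field layout
print(f"samples={len(scores)}  avg MATH Hard={sum(scores) / len(scores):.4f}")
```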
cc @SaylorTwift, when you updated the results, did you update the details too?
@MaziyarPanahi, we had identified an issue in the MATH parsing (thanks to Meta), so all scores were updated a while back (1 or 2 months ago now, I'd say). The details were probably not updated at the same time, since the generations were the same and we only needed to recompute the answer extraction and the average.
Thanks @clefourrier, it makes sense now. Appreciate the response.
Hi! Yes, that's right, the details were overlooked when updating the results; only the results files that you can find in the results repo were updated, so that they could be displayed in the leaderboard. Sorry for the confusion!