acc or acc_norm?

#106
by paopao0226 - opened

Hello, when testing on the ARC dataset there are two scores (acc and acc_norm), so which one does the leaderboard use?

In the auto_leaderboard files, in load_results.py, everything except TruthfulQA (which uses mc2) uses acc_norm.

# clone / pull the lmeh eval data
METRICS = ["acc_norm", "acc_norm", "acc_norm", "mc2"]
BENCHMARKS = ["arc_challenge", "hellaswag", "hendrycks", "truthfulqa_mc"]
BENCH_TO_NAME = {
    "arc_challenge": AutoEvalColumn.arc.name,
    "hellaswag": AutoEvalColumn.hellaswag.name,
    "hendrycks": AutoEvalColumn.mmlu.name,
    "truthfulqa_mc": AutoEvalColumn.truthfulqa.name,
}
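For context, acc picks the answer choice with the highest raw log-likelihood, while acc_norm normalizes each choice's log-likelihood by its byte length before picking. A minimal sketch of the difference (an illustration, not the harness's actual code):

# Illustration only: how acc vs. acc_norm pick an answer for a
# multiple-choice task like ARC. Log-likelihoods are negative, so dividing
# by the byte length of each choice stops longer answers from being
# penalized just for being longer.
def pick_answer(loglikelihoods, choices, normalize=False):
    scores = []
    for ll, text in zip(loglikelihoods, choices):
        # acc_norm: divide the log-likelihood by the choice's byte length
        scores.append(ll / len(text.encode("utf-8")) if normalize else ll)
    return max(range(len(scores)), key=lambda i: scores[i])

The reported acc or acc_norm is then just the fraction of questions where the picked index matches the gold answer.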

@lilloukas OK, thanks! I think that answers it.

Open LLM Leaderboard org

Hello @paopao0226, note that we just changed the metric we use for MMLU. The file now reads:

# clone / pull the lmeh eval data
METRICS = ["acc_norm", "acc_norm", "acc", "mc2"]
BENCHMARKS = ["arc:challenge", "hellaswag", "hendrycksTest", "truthfulqa:mc"]
BENCH_TO_NAME = {
    "arc:challenge": AutoEvalColumn.arc.name,
    "hellaswag": AutoEvalColumn.hellaswag.name,
    "hendrycksTest": AutoEvalColumn.mmlu.name,
    "truthfulqa:mc": AutoEvalColumn.truthfulqa.name,
}
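Given a results file in the lm-eval-harness format, the per-benchmark scores can be pulled out roughly like this (a hypothetical sketch, assuming a single aggregated hendrycksTest entry; in the raw harness output MMLU is split into many hendrycksTest-* subtasks that are averaged first):

# Hypothetical sketch: extract the leaderboard metric for each benchmark
# from an lm-eval-harness style results dict and average them.
METRICS = ["acc_norm", "acc_norm", "acc", "mc2"]
BENCHMARKS = ["arc:challenge", "hellaswag", "hendrycksTest", "truthfulqa:mc"]

def leaderboard_scores(results):
    # results is assumed to look like
    # {"results": {"arc:challenge": {"acc": 0.49, "acc_norm": 0.52}, ...}}
    scores = {b: results["results"][b][m] for b, m in zip(BENCHMARKS, METRICS)}
    scores["average"] = sum(scores.values()) / len(BENCHMARKS)
    return scores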

@SaylorTwift OK, so is this the latest version of what the leaderboard is using?

Open LLM Leaderboard org

@paopao0226 Yes! :) And you can find the details in the About tab if you need them.

clefourrier changed discussion status to closed

@clefourrier Thanks! The About tab has become more informative.
