what kind of task does TruthfulQA eval? mc1? mc2? or both?

#95
by paopao0226 - opened

Hello and thanks for your leaderboard, here is the question that which task does TruthfulQA score based on? just mc1? just mc2? or both mc1 and mc2? if both, how to mix the scores of two different task. Thanks!

Hi, I would like to expand on the question by @paopao0226 . I would like to run some of the experiments on my hardware and am unsure about this and other details. Could you point us to the code that calls lm-evaluation-harness and produces the numbers in the leaderboard?
Thanks 😃

I think I found the answer. If you compare truthfulqa numbers from the leaderboard with the numbers reported in the lm-evaluation-harness repo for llama-7b and llama-13b, you'll see they correspond to zero-shot mc2 numbers.

However, I would still like to see the code used by Huggingface 😃

Open LLM Leaderboard org

Hi @paopao0226 @freejen !
We added the information on the About section of the Leaderboard :)

@clefourrier Thanks!

clefourrier changed discussion status to closed

@freejen hello! the info has added on the About section!

This comment has been hidden

Sign up or log in to comment