Spaces: open-llm-leaderboard/open_llm_leaderboard


Which TruthfulQA task does the leaderboard evaluate: mc1, mc2, or both?

#95

by paopao0226 - opened Jul 4, 2023

Discussion

paopao0226

Jul 4, 2023

Hello, and thanks for the leaderboard. Which task is the TruthfulQA score based on: just mc1, just mc2, or both? If both, how are the scores of the two tasks combined? Thanks!

freejen

Jul 6, 2023

Hi, I would like to expand on the question by @paopao0226. I would like to run some of the experiments on my own hardware, and I am unsure about this and other details. Could you point us to the code that calls lm-evaluation-harness and produces the numbers on the leaderboard?
Thanks 😃

freejen

Jul 7, 2023

edited Jul 7, 2023

I think I found the answer. If you compare the TruthfulQA numbers on the leaderboard with the numbers reported in the lm-evaluation-harness repo for llama-7b and llama-13b, you'll see that they correspond to the zero-shot mc2 numbers.

However, I would still like to see the code used by Hugging Face 😃
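
For anyone else wondering how mc1 and mc2 differ, here is a rough sketch of the two metrics as I understand them from lm-evaluation-harness. It is not the leaderboard's actual code. The function names `mc1_score` and `mc2_score` are made up for illustration, and the inputs are assumed to be per-choice log-likelihoods that the model has already produced for one question.

```python
import math


def mc1_score(true_ll, false_lls):
    """mc1: one correct choice per question.

    Scores 1.0 if the correct choice has the highest log-likelihood,
    otherwise 0.0. (Hypothetical helper, not the harness API.)
    """
    return 1.0 if true_ll >= max(false_lls) else 0.0


def mc2_score(true_lls, false_lls):
    """mc2: several correct and incorrect choices per question.

    Scores the share of total probability mass assigned to the correct
    choices. (Hypothetical helper, not the harness API.)
    """
    p_true = sum(math.exp(ll) for ll in true_lls)
    p_false = sum(math.exp(ll) for ll in false_lls)
    return p_true / (p_true + p_false)


# Example with made-up log-likelihoods for a single question.
print(mc1_score(-1.0, [-2.0, -3.0]))
print(mc2_score([math.log(0.3), math.log(0.2)], [math.log(0.5)]))
```

The per-question scores are then averaged over the dataset. So mc1 is an accuracy, while mc2 is a softer score that gives partial credit when some probability mass lands on correct answers.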