The system message and prompt format for TruthfulQA results for Vicuna 13B

#28
by hamidpalangi - opened

Hi,

Thanks for the great work putting this leaderboard together. Can you please share what are the user prompt formats and system message for reported results for TruthfulQA for Vicuna 13B?
We are observing different results for TruthfulQA than the ones reported in this leaderboard for Vicuna 13B. Can you please also share the generated outputs by Vicuna 13B for this dataset?

Thanks!
-Hamid

It would be great if we could verify the results of the leaderboard with an all-in one script. Best would be if they could share their pipeline

+1. I'm strugglgint to get the same results for EleutherAI/gpt-j-6B on hellaswag. The difference is small but results aren't the same. Would be great if the pipeline and repo hosting the results (HuggingFaceH4/lmeh_evaluation) could be made public.

What metric is even being used in the leaderboard? MC1? MC2?
image.png

I'm trying to cross-reference with results from https://paperswithcode.com/sota/question-answering-on-truthfulqa

Hugging Face H4 org

Hi! Relevant information has been added in the About of the Leaderboard!

clefourrier changed discussion status to closed

Sign up or log in to comment