The system message and prompt format for TruthfulQA results for Vicuna 13B
Hi,
Thanks for the great work putting this leaderboard together. Could you please share the user prompt format and system message used for the reported TruthfulQA results for Vicuna 13B?
We are observing different TruthfulQA results than the ones reported in this leaderboard for Vicuna 13B. Could you please also share the outputs generated by Vicuna 13B for this dataset?
Thanks!
-Hamid
It would be great if we could verify the leaderboard results with an all-in-one script. Best would be if they could share their pipeline.
+1. I'm struggling to get the same results for EleutherAI/gpt-j-6B on hellaswag. The difference is small, but the results aren't the same. It would be great if the pipeline and the repo hosting the results (HuggingFaceH4/lmeh_evaluation) could be made public.
What metric is even being used in the leaderboard? MC1? MC2?
I'm trying to cross-reference with results from https://paperswithcode.com/sota/question-answering-on-truthfulqa
Actually, it's MC2, assuming this script is correct: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/blob/main/app.py
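For anyone else cross-referencing: as I understand it, MC2 is the total probability mass the model assigns to the true reference answers, normalized over all (true and false) answer choices. A minimal sketch, assuming you already have per-choice log-likelihoods from the model (the function name and toy numbers here are just for illustration, not from the leaderboard code):

```python
import math

def mc2_score(true_lls, false_lls):
    """MC2 for one question: normalized probability assigned to true answers.

    true_lls / false_lls are log-likelihoods the model gives to each
    true / false reference answer for a single TruthfulQA question.
    """
    p_true = [math.exp(ll) for ll in true_lls]
    p_false = [math.exp(ll) for ll in false_lls]
    return sum(p_true) / (sum(p_true) + sum(p_false))

# Toy example: the model strongly prefers the true answers,
# so the score should be close to 1.
print(mc2_score([-0.5, -1.0], [-3.0, -4.0]))
```

The dataset-level number reported would then be the mean of this score over all questions, which is why small differences in prompt format or tokenization can shift the final result slightly.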
Hi! Relevant information has been added in the About section of the Leaderboard!