The system message and prompt format for TruthfulQA results for Vicuna 13B
Hi,
Thanks for the great work putting this leaderboard together. Could you please share the user prompt format and system message used for the reported TruthfulQA results for Vicuna 13B?
We are observing different TruthfulQA results than the ones reported in this leaderboard for Vicuna 13B. Could you please also share the outputs generated by Vicuna 13B for this dataset?
Thanks!
-Hamid
It would be great if we could verify the leaderboard results with an all-in-one script. Best would be if they could share their pipeline.
+1. I'm struggling to get the same results for EleutherAI/gpt-j-6B on hellaswag. The difference is small, but the results aren't the same. It would be great if the pipeline and the repo hosting the results (HuggingFaceH4/lmeh_evaluation) could be made public.
What metric is even being used in the leaderboard? MC1? MC2?
I'm trying to cross-reference with results from https://paperswithcode.com/sota/question-answering-on-truthfulqa
Actually, it's MC2, assuming this script is correct: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/blob/main/app.py
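For anyone else cross-referencing: as I understand it, MC2 is the total probability mass the model assigns to the true reference answers, normalized over all (true and false) answer choices. A minimal sketch, assuming you already have per-choice log-likelihoods from the model (the function name and toy numbers here are just for illustration, not from the leaderboard code):

```python
import math

def mc2_score(true_lls, false_lls):
    """MC2 for one question: normalized probability assigned to true answers.

    true_lls / false_lls are log-likelihoods the model gives to each
    true / false reference answer for a single TruthfulQA question.
    """
    p_true = [math.exp(ll) for ll in true_lls]
    p_false = [math.exp(ll) for ll in false_lls]
    return sum(p_true) / (sum(p_true) + sum(p_false))

# Toy example: the model strongly prefers the true answers,
# so the score should be close to 1.
print(mc2_score([-0.5, -1.0], [-3.0, -4.0]))
```

The dataset-level number reported would then be the mean of this score over all questions, which is why small differences in prompt format or tokenization can shift the final result slightly.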
Hi! Relevant information has been added in the About section of the Leaderboard!