[FLAG] Fredithefish/ReasonixPajama-3B-HF: Model trained on eval data

#236

by pankajmathur - opened Aug 29, 2023

Discussion

pankajmathur

Aug 29, 2023

ReasonixPajama-3B-HF has a truthfulqa_mc score of 55.42, which is an outlier not only among comparable 3B models but also relative to 7B, 13B, and 34B models.
After we reached out to the author, we received the following confirmation that parts of the ARC and TruthfulQA datasets were used for this model. Please see the comments on the thread linked below and the attached screenshot.

https://huggingface.co/Fredithefish/ReasonixPajama-3B-HF/discussions/1

Here is the screenshot from the leaderboard:

@clefourrier: please let us know what the next steps should be for this model on the leaderboard.

pankajmathur changed discussion title from Suspiciously High truthfulqa_mc of Fredithefish/ReasonixPajama-3B-HF to Suspiciously High TruthfulQA of Fredithefish/ReasonixPajama-3B-HF Aug 29, 2023

clefourrier

Open LLM Leaderboard org Aug 29, 2023

Hi, perfect issue!
Flagging this model!

clefourrier changed discussion status to closed Aug 29, 2023

clefourrier changed discussion title from Suspiciously High TruthfulQA of Fredithefish/ReasonixPajama-3B-HF to [FLAG] Fredithefish/ReasonixPajama-3B-HF: Model trained on eval data Aug 29, 2023
