[FLAG] Fredithefish/ReasonixPajama-3B-HF: Model trained on eval data

#236
by pankajmathur - opened

ReasonixPajama-3B-HF has a truthfulqa_mc score of 55.42, which looks like an outlier, not only among comparable 3B models but also among the 7B, 13B, and 34B models.
After we reached out to the author, we got the following confirmation that parts of the ARC and TruthfulQA datasets were used. Please see the comments on the thread and the attached screenshot.

https://huggingface.co/Fredithefish/ReasonixPajama-3B-HF/discussions/1

IMG_0817.png

Here is the screenshot from the leaderboard (LB):

Screenshot 2023-08-26 at 12.58.07 AM.png

@clefourrier: please let us know what the next steps should be for this model on the LB.

pankajmathur changed discussion title from Suspiciously High truthfulqa_mc of Fredithefish/ReasonixPajama-3B-HF to Suspiciously High TruthfulQA of Fredithefish/ReasonixPajama-3B-HF
Open LLM Leaderboard org

Hi, perfect issue!
Flagging this model!

clefourrier changed discussion status to closed
clefourrier changed discussion title from Suspiciously High TruthfulQA of Fredithefish/ReasonixPajama-3B-HF to [FLAG] Fredithefish/ReasonixPajama-3B-HF: Model trained on eval data
