[FLAG] Fredithefish/ReasonixPajama-3B-HF: Model trained on eval data
#236 by pankajmathur - opened
ReasonixPajama-3B-HF seems to have a TruthfulQA (truthfulqa_mc) score of 55.42. This is an outlier not only among comparable 3B models but also among 7B, 13B, and 34B models in this score range.
After reaching out to the author, we received confirmation that parts of the ARC and TruthfulQA datasets were used in training. Please see the comments on the thread and the attached screenshot.
https://huggingface.co/Fredithefish/ReasonixPajama-3B-HF/discussions/1
Here is the screenshot from the leaderboard (LB):
@clefourrier: please let us know what the next steps should be for this model on the LB.
pankajmathur changed discussion title from "Suspiciously High truthfulqa_mc of Fredithefish/ReasonixPajama-3B-HF" to "Suspiciously High TruthfulQA of Fredithefish/ReasonixPajama-3B-HF"
Hi, perfect issue!
Flagging this model!
clefourrier changed discussion status to closed
clefourrier changed discussion title from "Suspiciously High TruthfulQA of Fredithefish/ReasonixPajama-3B-HF" to "[FLAG] Fredithefish/ReasonixPajama-3B-HF: Model trained on eval data"