open-llm-leaderboard/open_llm_leaderboard · [FLAG] Suspiciously High TruthfulQA for TigerResearch/tigerbot-7b-sft-v1

Aug 29, 2023

edited Aug 29, 2023

TigerResearch/tigerbot-7b-sft-v1 appears to have a TruthfulQA score of 58.18, which is in outlier range: it is higher not only than all comparable 7B models, but suspiciously also higher than any of the 13B, 34B, and 70B models in this range. Please see the screenshot from the leaderboard (LB):

I have reached out to the authors and opened a discussion asking for details, but I haven't received any response from them so far:
=> https://huggingface.co/TigerResearch/tigerbot-7b-sft-v1/discussions/1

@clefourrier: please let us know what the next steps should be for this model on the LB.

pankajmathur changed discussion status to closed Aug 29, 2023

pankajmathur changed discussion status to open Aug 29, 2023

clefourrier

Open LLM Leaderboard org Aug 29, 2023

Hi! Thank you for this issue, it's very complete!
Let's give them a week to investigate their secondary data, and if they have not done so by then, I'll flag their model.

clefourrier

Open LLM Leaderboard org Sep 5, 2023

It's been a week. Since they don't seem to have actually examined their secondary data for contamination, I'll flag it and let users decide whether to use it or not.

clefourrier changed discussion title from Suspiciously High TruthfulQA for TigerResearch/tigerbot-7b-sft-v1 to [FLAG] Suspiciously High TruthfulQA for TigerResearch/tigerbot-7b-sft-v1 Sep 5, 2023

clefourrier changed discussion status to closed Sep 5, 2023

pankajmathur

Sep 5, 2023

Agreed, thanks for keeping tabs on this one.