
[FLAG] Aspik101/trurl-2-13b-pl-instruct_unload

#213

by PaulMartrenchar - opened Aug 22, 2023


PaulMartrenchar

Aug 22, 2023

As with Voicelab/trurl-2-13b, which was flagged (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/202), the MMLU score is way too high for a 13B model.

Can it be flagged?

clefourrier

Open LLM Leaderboard org Aug 22, 2023 (edited Aug 23, 2023)

Hi!
Good catch! Since its name contains trurl-2-13b, it most likely uses the above model as a base, so I'm flagging it for the moment.
However, in the interest of fairness, could you open an issue on their model repo to ask what they trained on or used as a base?

gardner

Aug 25, 2023

The model file sizes seem consistent with other 13B models. Can users rewrite history by force-pushing to model repos?

timje

Aug 25, 2023

For anyone (like me) who missed the Voicelab discussion: the trurl-2-13b model's training data included much of the MMLU test set, so of course it scores exceptionally well on that benchmark for a 13B model. The Voicelab team is retraining without the MMLU dataset but doesn't expect results to differ much from base llama-2-13b; their focus is on Polish knowledge.

clefourrier changed discussion status to closed Aug 29, 2023
