Is it possible to share the commit hash or a link to the lm-evaluation-harness repository used to evaluate the current leaderboard? I am trying to reproduce the Open LLM Leaderboard results in an offline environment.
Thanks.
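For the offline part, the Hugging Face libraries can be told not to reach the Hub. A minimal sketch, assuming the model weights and the MMLU data are already in the local cache:

# Prevent transformers/datasets/huggingface_hub from contacting the Hub
# (check which of these your library versions honor)
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1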
@win7785 Hi! We are using this version of the harness: b281b0921b636bc36ad05c0b0b0763bd6dd43463.
It's also in the About section of the leaderboard :)
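If it helps, that commit can also be pinned directly at install time; a sketch, assuming a standard pip + git setup:

# Install the harness pinned to the exact commit used by the leaderboard
pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git@b281b0921b636bc36ad05c0b0b0763bd6dd43463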
@SaylorTwift @clefourrier Thanks for your reply :)
I tested the LLaMA-7B model using the commit hash above and scored 0.3563 on MMLU (5-shot). The leaderboard showed a score of 0.383, which is no longer visible.
Are there any updates to the lm-eval-harness version the HF team is using, or to the Open LLM Leaderboard?
Here are the commands I used:
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
git checkout b281b0921b636bc36ad05c0b0b0763bd6dd43463
python main.py --model=hf-causal --model_args="pretrained={path_of_llama-7b}" --tasks="hendrycks*" --num_fewshot=5 --batch_size=2 --no_cache
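When comparing against the leaderboard number, note that the MMLU score is the average over the hendrycksTest subtasks. A sketch of recomputing that average from the harness output, assuming the run wrote a results.json with the usual {"results": {task: {"acc": ...}}} layout (the filename and exact keys are assumptions):

# Macro-average 'acc' across all evaluated subtasks in the results file
jq '[.results[] | .acc] | add / length' results.json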
Hi! We are investigating a small discrepancy for LLaMA models; see the full discussion in this thread.
Thanks for sharing :)