Re-Evaluate models with old Llama 3 generation config
Hello,
some models like Neural-Daredevil still ship the old generation config, which specifies 128001 (<|end_of_text|>) as the EOS token when it should be 128009 (<|eot_id|>). For Llama 3 Instruct, this is set correctly (see here: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/generation_config.json#L3)
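For reference, a minimal excerpt of the corrected generation_config.json (based on the linked Llama 3 Instruct file, assuming it is unchanged upstream — note that eos_token_id is a list containing both stop tokens):

```json
{
  "bos_token_id": 128000,
  "eos_token_id": [128001, 128009]
}
```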
The old generation_config.json that models like Neural-Daredevil use leaves the model unable to stop generating during evaluation, which results in unexpectedly low scores:
Here's an example for IFEval.
For models like Neural-Daredevil-abliterated, the generation_config.json has to be replaced with the one linked above for proper evaluation. NeuralDaredevil got special attention from me because I really like it, so I have opened a PR that fixes this (https://huggingface.co/mlabonne/NeuralDaredevil-8B-abliterated/discussions/8/files), but there might be more Llama 3 models out there with the old generation file.
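Since other Llama 3 fine-tunes may carry the same stale config, one quick way to screen a model's generation_config.json is to check whether <|eot_id|> (128009) appears among its EOS token ids. A minimal sketch, assuming the file has already been downloaded locally (the `has_stale_eos` helper is hypothetical, not part of any leaderboard tooling):

```python
import json

# Old-style Llama 3 configs list only <|end_of_text|> (128001) as EOS.
# Instruct-tuned models emit <|eot_id|> (128009) at the end of each turn,
# so without it generation never stops during evaluation.
EOT_ID = 128009  # token id of <|eot_id|> in the Llama 3 tokenizer

def has_stale_eos(config: dict) -> bool:
    """Return True if a parsed generation_config.json is missing <|eot_id|>.

    In real configs, "eos_token_id" may be a single int or a list of ints.
    """
    eos = config.get("eos_token_id")
    if eos is None:
        return True
    ids = eos if isinstance(eos, list) else [eos]
    return EOT_ID not in ids

# Example: an old-style config vs. the fixed one.
stale = {"bos_token_id": 128000, "eos_token_id": 128001}
fixed = {"bos_token_id": 128000, "eos_token_id": [128001, 128009]}
print(has_stale_eos(stale))  # True  -> needs the PR-style fix
print(has_stale_eos(fixed))  # False -> evaluates correctly
```

To check a local file, `has_stale_eos(json.load(open("generation_config.json")))` would do.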
Hi @Dampfinchen,
Once this model is fixed with the new token management, feel free to resubmit it (and select the new commit), and it will get re-evaluated.
However, it would be good if people could be careful with their submissions as it's costly to re-run badly submitted models.
Hello @clefourrier
mlabonne/NeuralDaredevil-8B-abliterated
The model has been fixed. Would you be so kind as to flush the old test result so I can resubmit it? Since I'm not the model creator, I cannot create a new commit.
Thank you!
If it's been merged, you can simply take the hash of the merge commit and submit with it.
(We don't delete previous run results.)
Good to know, thanks @clefourrier and @Dampfinchen