le-leadboard/OpenLLMFrenchLeaderboard · Are instruction models evaluated with chat template?

2 days ago

In the Hugging Face fork of the harness, instruction models can be evaluated with the --apply_chat_template and --fewshot_as_multiturn options (https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#reproducibility). According to the reproducibility instructions, that does not seem to be the case for this leaderboard. The flag does exist in the code (https://github.com/mohamedalhajjar/lm-evaluation-harness-multilingual/blob/64286c9b9a270f9b72a9c4ba05e014b8284108da/lm_eval/__main__.py#L172), but when I pass it I get the following error:

[rank6]: Traceback (most recent call last):
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank6]:     return _run_code(code, main_globals, None,
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/runpy.py", line 86, in _run_code
[rank6]:     exec(code, run_globals)
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/__main__.py", line 461, in <module>
[rank6]:     cli_evaluate()
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/__main__.py", line 382, in cli_evaluate
[rank6]:     results = evaluator.simple_evaluate(
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/utils.py", line 397, in _wrapper
[rank6]:     return fn(*args, **kwargs)
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/evaluator.py", line 288, in simple_evaluate
[rank6]:     evaluation_tracker.general_config_tracker.log_experiment_args(
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/loggers/evaluation_tracker.py", line 97, in log_experiment_args
[rank6]:     self.chat_template_sha = hash_string(chat_template) if chat_template else None
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/utils.py", line 36, in hash_string
[rank6]:     return hashlib.sha256(string.encode("utf-8")).hexdigest()
[rank6]: AttributeError: 'dict' object has no attribute 'encode'
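
Reading the traceback, hash_string calls .encode() on chat_template, which arrived as a dict rather than a str. Below is a minimal sketch of how the hashing step could accept both types. hash_string is copied from the traceback; hash_chat_template is a hypothetical helper of mine, not part of lm_eval, and serializing the dict with json.dumps is an assumption about what a reasonable fix might look like, not the maintainers' actual fix.

```python
import hashlib
import json


def hash_string(string: str) -> str:
    # Same body as lm_eval/utils.py in the traceback: only works for str input.
    return hashlib.sha256(string.encode("utf-8")).hexdigest()


def hash_chat_template(chat_template):
    """Hypothetical helper: hash a chat template given as a str or a dict.

    Assumption: a dict template can be serialized to JSON; keys are sorted
    so that the hash does not depend on insertion order.
    """
    if not chat_template:
        return None
    if isinstance(chat_template, dict):
        chat_template = json.dumps(chat_template, sort_keys=True)
    return hash_string(chat_template)


# Reproduce the failure from the traceback with a dict input.
try:
    hash_string({"default": "{{ messages }}"})
except AttributeError as err:
    print(f"AttributeError: {err}")

# The tolerant helper handles str, dict, and missing templates.
print(hash_chat_template("{{ messages }}"))
print(hash_chat_template({"default": "{{ messages }}"}))
print(hash_chat_template(None))
```

If the dict really holds several named templates, hashing only the template that was actually applied might be preferable; the sketch hashes the whole dict for simplicity.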

malhajar

le-leadboard org about 6 hours ago

Thank you for raising this. Could you please report it on the GitHub repo so it can be fixed? Thanks!

malhajar changed discussion status to closed about 6 hours ago