Are instruction models evaluated with chat template?

#1
by alexrs - opened

In the Hugging Face fork of the harness, you can pass the --apply_chat_template and --fewshot_as_multiturn options when evaluating instruction models (https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#reproducibility). This leaderboard's reproducibility instructions do not mention these options, so it seems instruction models are not evaluated with a chat template here. The flag does exist in the code (https://github.com/mohamedalhajjar/lm-evaluation-harness-multilingual/blob/64286c9b9a270f9b72a9c4ba05e014b8284108da/lm_eval/__main__.py#L172), but when I try to use it I get the following error:

[rank6]: Traceback (most recent call last):
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank6]:     return _run_code(code, main_globals, None,
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/runpy.py", line 86, in _run_code
[rank6]:     exec(code, run_globals)
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/__main__.py", line 461, in <module>
[rank6]:     cli_evaluate()
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/__main__.py", line 382, in cli_evaluate
[rank6]:     results = evaluator.simple_evaluate(
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/utils.py", line 397, in _wrapper
[rank6]:     return fn(*args, **kwargs)
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/evaluator.py", line 288, in simple_evaluate
[rank6]:     evaluation_tracker.general_config_tracker.log_experiment_args(
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/loggers/evaluation_tracker.py", line 97, in log_experiment_args
[rank6]:     self.chat_template_sha = hash_string(chat_template) if chat_template else None
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/utils.py", line 36, in hash_string
[rank6]:     return hashlib.sha256(string.encode("utf-8")).hexdigest()
[rank6]: AttributeError: 'dict' object has no attribute 'encode'
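The last two frames show that hash_string receives a dict where it expects a str. One plausible cause, which is an assumption and not confirmed in this thread, is a tokenizer that stores several named chat templates as a dict instead of a single Jinja string. The sketch below reproduces the failure with the hash_string body shown in the traceback. It then shows one possible guard in a hypothetical helper, hash_chat_template, that is not part of lm_eval or this fork.

```python
import hashlib
import json
from typing import Dict, Optional, Union


def hash_string(string: str) -> str:
    # Same body as the hash_string frame in the traceback (lm_eval/utils.py)
    return hashlib.sha256(string.encode("utf-8")).hexdigest()


def hash_chat_template(
    chat_template: Optional[Union[str, Dict[str, str]]]
) -> Optional[str]:
    """Hypothetical helper: hash a chat template that may be either a single
    Jinja string or a dict of named templates."""
    if not chat_template:
        return None
    if isinstance(chat_template, dict):
        # Serialize with sorted keys so the same set of templates always
        # produces the same hash, regardless of insertion order
        chat_template = json.dumps(chat_template, sort_keys=True)
    return hash_string(chat_template)


if __name__ == "__main__":
    # Illustrative dict of named templates (made-up contents)
    templates = {"default": "{{ messages }}", "tool_use": "{{ tools }}"}

    try:
        hash_string(templates)  # reproduces the error from the traceback
    except AttributeError as err:
        print(err)  # 'dict' object has no attribute 'encode'

    print(hash_chat_template(templates))  # 64-character hex digest
```

Hashing the whole dict only keeps the logging step from crashing. A fix in the harness itself might instead hash only the template that is actually applied to the prompts, which would make the logged hash match what the model saw.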
le-leadboard org

Thank you for raising this. Could you please open an issue in the GitHub repository so it can be fixed? Thanks!

malhajar changed discussion status to closed
