This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

NVIDIA org

Thanks for adding the evaluation results, but it looks like you might not have run the model correctly. Note that this model expects the code between the special <llm-code>...</llm-code> tokens to be executed by a Python interpreter. If that's not done, it will happily hallucinate the outputs, which leads to very poor results (e.g. the gsm8k score in this PR).

Please refer to https://github.com/Kipok/NeMo-Skills/blob/main/docs/inference.md for how to run inference with our models.
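To illustrate the point about code execution, here is a minimal sketch of pulling the code out of a generation and running it with a real interpreter. It is not the NeMo-Skills pipeline: the helper name `run_last_code_block`, the timeout, and the sample text are hypothetical, and how the output is fed back to the model is left out. Please follow the linked docs for the actual setup.

```python
import re
import subprocess
import sys

# Matches everything between the special code tokens mentioned above.
CODE_BLOCK = re.compile(r"<llm-code>(.*?)</llm-code>", re.DOTALL)


def run_last_code_block(generation, timeout=10.0):
    """Hypothetical helper: execute the last <llm-code> block in a generation.

    Returns the interpreter's stdout (or stderr on failure), or None when the
    generation contains no code block. This is NOT a sandbox; model-written
    code should only be run in an isolated environment.
    """
    blocks = CODE_BLOCK.findall(generation)
    if not blocks:
        return None
    result = subprocess.run(
        [sys.executable, "-c", blocks[-1]],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout if result.returncode == 0 else result.stderr


# Made-up generation, for illustration only.
sample = "Let's compute it.\n<llm-code>\nprint(3 * 14)\n</llm-code>\n"
print(run_last_code_block(sample))
```

Without a step like this, whatever the model writes after the code block is its guess at the output rather than the real result, which is what hurts scores on benchmarks such as gsm8k.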

