Reproduce the reported benchmark score using LM Harness

#7
by SimonX - opened

Is there anyone who can reproduce the reported benchmark score using LM Harness?
I am pulling the model from HuggingFace and running LM Harness with its default settings (keeping the number of shots consistent with the reported score). However, the accuracies I get differ significantly from the reported ones.
