Reproduce the reported benchmark score using LM Harness
#7, opened by SimonX
Has anyone been able to reproduce the reported benchmark scores using LM Harness?
I am pulling the model from HuggingFace and running LM Harness with its default settings, keeping the number of shots consistent with the reported scores. However, the accuracies I get differ significantly from the reported ones.
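For anyone trying to compare runs, here is a minimal sketch of the kind of invocation meant above, assuming the `lm_eval` command-line entry point of lm-evaluation-harness with its Hugging Face backend. The model ID, task name, shot count, and batch size are placeholders rather than the values from the actual run, and `build_lm_eval_command` is a hypothetical helper that only assembles the command.

```python
import shlex


def build_lm_eval_command(model_id, tasks, num_fewshot, batch_size=8):
    """Assemble an lm-evaluation-harness CLI call as an argument list.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return [
        "lm_eval",
        "--model", "hf",                          # Hugging Face transformers backend
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),
        "--num_fewshot", str(num_fewshot),        # must match the reported setup
        "--batch_size", str(batch_size),
    ]


# Placeholder values for illustration only.
cmd = build_lm_eval_command("org/model-name", ["arc_challenge"], num_fewshot=25)
print(shlex.join(cmd))
```

Posting the exact command, the harness version, and the per-task shot counts alongside the scores makes it much easier for others to spot where the setups diverge.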