Bad test results using lm-evaluation-harness

#68
by smart-liu - opened

I have run some tests on Gemma-2B and Gemma-7B using the lm-evaluation-harness package, but got terrible results with Gemma-7B. The same code runs fine with other models. Is there anything I should be aware of?
Results for Gemma-2B:
[screenshot of Gemma-2B evaluation results]

Results for Gemma-7B:
[screenshot of Gemma-7B evaluation results]

My bad, I hadn't updated the lm-evaluation-harness framework, so there was a mistake (missing [BOS] token). After updating the framework, it works well and matches the results in the paper.
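For reference, here is a minimal sketch of how one might force the BOS token explicitly through the harness's Python API (the task, batch size, and the `add_bos_token` model arg of the `hf` backend are illustrative assumptions; recent harness versions already enable BOS for Gemma automatically):

```python
# Sketch only: evaluate Gemma-7B with lm-evaluation-harness, prepending [BOS] explicitly.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    # add_bos_token=True asks the HF backend to prepend the BOS token.
    model_args="pretrained=google/gemma-7b,add_bos_token=True",
    tasks=["hellaswag"],  # example task; replace with the tasks you actually run
    batch_size=8,
)
print(results["results"])
```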

smart-liu changed discussion status to closed

@smart-liu Hey can you tell me what you mean by lm-harness frame update? You mean you updated the package?

Yes, updating the package to the latest version solved the problem.

You should have seen a log entry in your lm-evaluation-harness run telling you that the Gemma-2 model is sensitive to the BOS token.
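If you want to sanity-check BOS handling outside the harness, a quick sketch with transformers (the prompt is arbitrary) is to inspect what the tokenizer produces:

```python
# Sketch: confirm the Gemma tokenizer prepends the BOS token by default.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-7b")
ids = tok("The capital of France is")["input_ids"]
print(ids[0] == tok.bos_token_id)  # expected True when BOS is being added
```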
