What do they mean by maj@1?

#44
by joserass - opened

I get that they might mean majority vote, but what does the 1 mean? The candidate response?

Google org
edited May 30

Hi @joserass, the maj@1 score is computed by greedily sampling once per question when evaluating the Gemma 2b and 7b models, so the 1 is the number of samples per question. You can refer to this doc for more details on the metrics used for the model evaluation. Thank you.
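To make the definition concrete, here is a minimal sketch of how a maj@k score is commonly computed: take k sampled answers per question, keep the most frequent one, and compare it with the reference. With k = 1 the vote is trivial, so maj@1 reduces to plain accuracy on a single (here greedy) answer. The function names and the toy data are hypothetical, and this is not the internal evaluation code used for the reported numbers.

```python
from collections import Counter


def maj_at_k(samples, reference):
    """Return True if the most frequent answer among the k samples matches the reference.

    `samples` is a list of already-extracted final answers (strings) for one question.
    Ties are broken in favour of the answer that appeared first.
    """
    if not samples:
        return False
    top_answer, _ = Counter(samples).most_common(1)[0]
    return top_answer == reference


def maj_at_k_accuracy(all_samples, references):
    """Fraction of questions whose majority answer is correct."""
    correct = sum(maj_at_k(s, r) for s, r in zip(all_samples, references))
    return correct / len(references)


# Hypothetical toy data: one greedy answer per question, i.e. k = 1.
greedy_answers = [["18"], ["4"]]
gold_answers = ["18", "3"]
print(maj_at_k_accuracy(greedy_answers, gold_answers))
```

For k > 1 the same functions apply unchanged; only the number of sampled answers per question grows.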

Would you be able to direct me to the source code of the evaluation pipeline used for the reported results?

I was unable to replicate the reported GSM8K results of 17.7% (2b-it) and 46.4% (7b-it) using 8-shot CoT with greedy decoding. I got around 10% for 2b-it and around 25% for 7b-it.

I used an implementation based on the methodology in the official repo, https://github.com/google-deepmind/gemma/blob/main/colabs/gsm8k_eval.ipynb , and also the lm_eval framework from https://github.com/EleutherAI/lm-evaluation-harness .

I tried not only maj@1 but also sampling with different top_p and top_k values.

Thanks @Renu11

Google org

We haven't yet open sourced our own internal evaluation harness; it's interesting that the numbers you find are lower -- we'll look into it!
