Unable to reproduce the pass@1 score of gemma_2b on HumanEval.
I want to reproduce the HumanEval pass@1 score of gemma_2b. When I tested with the generation parameters below, I got a score of 0.11.
```python
completion = model.generate(input_ids=inputs["input_ids"],
                            max_length=512,
                            num_return_sequences=1,
                            eos_token_id=tokenizer.eos_token_id)
```
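(For completeness: before executing the tests I truncate each completion at the first top-level stop sequence, as in the sketch below. The exact stop list is my assumption and follows the human-eval harness convention.)

```python
# Truncate a raw completion at the first top-level stop sequence before
# running the unit tests. The stop list follows the human-eval harness
# convention and is an assumption about the intended setup.
STOP_SEQUENCES = ["\nclass", "\ndef", "\n#", "\nif", "\nprint"]

def truncate_completion(text: str) -> str:
    cut = len(text)
    for stop in STOP_SEQUENCES:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```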
After modifying the parameters as shown below, I got 0.14.
```python
completion = model.generate(input_ids=inputs["input_ids"],
                            max_length=512,
                            num_return_sequences=1,
                            eos_token_id=tokenizer.eos_token_id,
                            do_sample=True,
                            temperature=0.7,
                            top_k=50,
                            top_p=0.95)  # increase the value of max_length
```
I would like to know how to modify the parameters to achieve 0.22.
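For reference, the greedy-decoding variant I would try next looks like this (a sketch; switching from max_length to max_new_tokens is my own guess, since max_length also counts the prompt tokens and can silently truncate long completions):

```python
# Greedy decoding with a completion-only token budget (sketch; the switch
# to max_new_tokens is an assumption, since max_length includes the prompt).
completion = model.generate(input_ids=inputs["input_ids"],
                            max_new_tokens=512,
                            do_sample=False,
                            num_return_sequences=1,
                            eos_token_id=tokenizer.eos_token_id)
```

With greedy decoding, pass@1 reduces to the fraction of problems whose truncated completion passes all tests.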
Hi, Surya from the Gemma team here. Sorry for the late response -- we haven't fully open-sourced our internal evaluation harness, and it's interesting that the numbers you're finding are lower... we'll look into it!
Hey Surya, I've also been unable to get 0.22; I get 0.11 with greedy decoding. Could you let us know at least the sampling parameters and/or prompts used? Thanks!
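(In case the gap is in scoring rather than decoding: I compute pass@1 with the unbiased estimator from the Codex paper, which to my understanding mirrors the human-eval reference implementation; sketch below.)

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper:
    n sampled completions per problem, c of them correct."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```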
hey @suryabhupa, any chance you have an update on this? We are also unable to replicate the scores for HumanEval and other benchmarks.