Request for the prompts used in the evaluation results

#5
by sh0416 - opened

I tried to reproduce the MBPP+ result and got a score of 0.4385, which differs from the reported score of 0.474.

I suspect the difference comes from the prompt I used in my experiment.

My prompt is as follows:

Problem: Write a function to find the shared elements from the given two lists.
Test:
assert set(similar_elements((3, 4, 5, 6),(5, 7, 4, 10))) == set((4, 5))
assert set(similar_elements((1, 2, 3, 4),(5, 4, 3, 7))) == set((3, 4))
assert set(similar_elements((11, 12, 14, 13),(17, 15, 14, 13))) == set((13, 14))
Implementation:
```python

What prompt was used for the reported evaluation?
Thank you.
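
For anyone comparing setups, here is a minimal sketch of how a prompt in the layout quoted above could be assembled. The helper name `build_prompt` and its arguments are hypothetical and are not taken from any official evaluation code; the single trailing newline after the opening code fence is also an assumption, and small whitespace differences like that may be worth checking when scores do not match.

```python
# Hypothetical helper: builds a prompt in the Problem / Test / Implementation
# layout shown in this post. Not taken from any official evaluation harness.
FENCE = "`" * 3  # three backticks, built this way to keep this snippet readable


def build_prompt(problem: str, tests: list[str]) -> str:
    """Return the prompt text, ending with an opening Python code fence."""
    lines = [f"Problem: {problem}", "Test:"]
    lines.extend(tests)                # one assert statement per line
    lines.append("Implementation:")
    lines.append(f"{FENCE}python")     # the model is expected to continue from here
    return "\n".join(lines) + "\n"     # assumption: one trailing newline


prompt = build_prompt(
    "Write a function to find the shared elements from the given two lists.",
    [
        "assert set(similar_elements((3, 4, 5, 6),(5, 7, 4, 10))) == set((4, 5))",
        "assert set(similar_elements((1, 2, 3, 4),(5, 4, 3, 7))) == set((3, 4))",
        "assert set(similar_elements((11, 12, 14, 13),(17, 15, 14, 13))) == set((13, 14))",
    ],
)
print(prompt)
```

Printing the prompt and comparing it character by character against the one used for the reported score would show whether the formatting is the source of the gap.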
