loubnabnl HF staff committed on
Commit 8bcc93a
1 Parent(s): 2ede1f8
Files changed (1)
  1. evaluation/intro.txt +1 -1
evaluation/intro.txt CHANGED
@@ -56,7 +56,7 @@ Results: {'pass@1': 0.0750, 'pass@10': 0.4473, 'pass@20': 0.5}
 ````
 
 If we take a closer look at the unit test results for each candidate solution in the three tasks, we find that only 3 passed the test for the second problem, and none did for the first problem. This means that we have 3 correct solutions among 40, which corresponds to our pass@1 value `3/40 = 0.075`. The scores pass@10 and pass@20 are higher, because the more samples we select from the candidate completions, the more likely we are to include the correct implementation. As
-for pass@20, it is `1/2=0.5`, since if we select all 20 candidates for each problem, the second problem get solved which gives 50% success rate. If you are curious about the candidate solutions that passed the tests, they all implemented this function:
+for pass@20, it is `1/2 = 0.5`, since if we select all 20 candidates for each problem, the second problem get solved which gives 50% success rate. If you are curious about the candidate solutions that passed the tests, they all implemented this function:
 
 ```python
 
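The arithmetic in the changed paragraph can be reproduced with a short sketch. It assumes the scores come from the unbiased pass@k estimator of Chen et al. (2021), pass@k = 1 - C(n - c, k) / C(n, k) per problem, averaged over problems; this is the estimator used by the `code_eval` metric in Hugging Face `evaluate`, but the file itself does not say so. The helper names `pass_at_k` and `mean_pass_at_k` are hypothetical, and the counts (20 candidates per problem, 0 correct for the first problem, 3 for the second) are taken from the text.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (hypothetical helper).

    n: number of sampled completions
    c: number of completions that passed the unit tests
    k: number of completions we are allowed to pick

    pass@k = 1 - C(n - c, k) / C(n, k), i.e. one minus the probability
    that a random subset of k completions contains no correct one.
    """
    if n - c < k:
        # Fewer than k failing samples: every subset of size k has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average the per-problem estimates; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Counts from the text: 20 candidates per problem,
# 0 correct for the first problem, 3 correct for the second.
results = [(20, 0), (20, 3)]

for k in (1, 10, 20):
    print(f"pass@{k}: {mean_pass_at_k(results, k):.4f}")
```

Running it prints `pass@1: 0.0750`, `pass@10: 0.4474` and `pass@20: 0.5000`. For k = 1 the estimator reduces to c / n, so the average is (0/20 + 3/20) / 2 = 3/40, matching the text; the exact pass@10 value is 0.44736..., which the results line in the hunk header shows truncated to 0.4473.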