README.md
The Code Eval metric calculates how good predictions are given a set of references.
`references`: a list with a **function call** for each prediction. Each **function call** should print a string to stdout.

`output`: a list of the expected output for each prediction.

`k`: the number of code candidates to consider in the evaluation. The default value is `[1, 10, 100]`.
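To make the `references` convention concrete, here is a minimal sketch of how an output-based check can work: each candidate is concatenated with its reference **function call**, executed in a subprocess, and its stdout is compared against the expected output. The function name `check_candidate` is hypothetical, for illustration only; the metric's actual internals may differ.

```python
import subprocess
import sys


def check_candidate(candidate: str, reference: str, expected_output: str) -> bool:
    """Run candidate code followed by the reference call and compare stdout.

    Hypothetical helper illustrating the references/output convention;
    not the metric's actual implementation.
    """
    program = candidate + "\n" + reference + "\n"
    # Execute in a fresh interpreter; `-c` runs the code as __main__.
    result = subprocess.run(
        [sys.executable, "-c", program],
        capture_output=True,
        text=True,
        timeout=10,  # guard against candidates that hang
    )
    return result.stdout.strip() == expected_output


candidate = "def add(a, b):\n return a+b"
reference = "if __name__ == \"__main__\":\n print(add(2, 3))"
print(check_candidate(candidate, reference, "5"))  # True
```

A candidate that prints the wrong value (e.g. `a*b` instead of `a+b`) fails the same check, which is exactly the distinction the example below exercises.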
```python
from evaluate import load

code_eval_outputs = load("giulio98/code_eval_outputs")
references = ["if __name__ == \"__main__\":\n print(add(2, 3))"]
expected_outputs = ["5"]
candidates = [["def add(a,b):\n return a*b", "def add(a, b):\n return a+b"]]
pass_at_k, results = code_eval_outputs.compute(references=references, predictions=candidates, output=expected_outputs, k=[1, 2])
print(pass_at_k)
print(results)
```
Output:

```python
{'pass@1': 0.5, 'pass@2': 1.0}
defaultdict(list,
            {0: [(0,
                  {'task_id': 0,
                   'passed': False,
                   'result': 'not passed',
                   'completion_id': 0}),
                 (1,
                  {'task_id': 0,
                   'passed': True,
                   'result': 'passed',
                   'completion_id': 1})]})
```
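The pass@k values above can be reproduced with the unbiased estimator used by the original `code_eval` metric, `1 - C(n-c, k) / C(n, k)`, where `n` is the number of candidates and `c` the number that pass (assuming this fork keeps the same estimator; its internals are not shown here):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer failing candidates than draws: at least one must pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# In the example above: n=2 candidates, c=1 passing.
print(pass_at_k(2, 1, 1))  # 0.5
print(pass_at_k(2, 1, 2))  # 1.0
```

With two candidates of which one passes, a single draw succeeds half the time (pass@1 = 0.5), while drawing both guarantees the passing one is included (pass@2 = 1.0), matching the output shown.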
N.B.