giulio98 committed on
Commit
9433903
1 Parent(s): 36cc098

Update README.md

Files changed (1)
  1. README.md +21 -3
README.md CHANGED
@@ -28,7 +28,7 @@ The Code Eval metric calculates how good are predictions given a set of referenc
28
 
29
  `references`: a list with a **function call** for each prediction. Each **function call** should output a string in stdout.
30
 
31
- `outputs`: a list of the expected output for each prediction.
32
 
33
  `k`: number of code candidates to consider in the evaluation. The default value is `[1, 10, 100]`.
34
 
@@ -39,10 +39,28 @@ The Code Eval metric calculates how good are predictions given a set of referenc
39
  ```python
40
  from evaluate import load
41
  code_eval_outputs = load("giulio98/code_eval_outputs")
42
- references = ["if __name__ == "__main__":\n print(add(2, 3))"]
43
  expected_outputs = ["5"]
44
  candidates = [["def add(a,b):\n return a*b", "def add(a, b):\n return a+b"]]
45
- pass_at_k, results = code_eval_outputs.compute(references=references, predictions=candidates, outputs=expected_outputs, k=[1, 2])
46
  ```
47
 
48
  N.B.
 
28
 
29
  `references`: a list with a **function call** for each prediction. Each **function call** should output a string in stdout.
30
 
31
+ `output`: a list of the expected output for each prediction.
32
 
33
  `k`: number of code candidates to consider in the evaluation. The default value is `[1, 10, 100]`.
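A rough sketch of how a single candidate might be checked against its reference and expected output, assuming the metric appends the function call to the candidate, runs the result, and compares captured stdout with the expected string. The helper name `passes`, the use of `exec`, and the whitespace stripping are illustrative assumptions, not the metric's actual implementation, and `exec` offers no sandboxing for untrusted code.

```python
import contextlib
import io


def passes(candidate: str, reference: str, expected: str) -> bool:
    """Hypothetical check: run candidate + function call, compare stdout."""
    program = candidate + "\n" + reference
    buffer = io.StringIO()
    try:
        # Capture everything the program prints to stdout.
        with contextlib.redirect_stdout(buffer):
            # Setting __name__ lets the `if __name__ == "__main__":` guard fire.
            exec(program, {"__name__": "__main__"})
    except Exception:
        # Any runtime error counts as a failed candidate.
        return False
    # Assumption: surrounding whitespace is ignored in the comparison.
    return buffer.getvalue().strip() == expected.strip()


reference = 'if __name__ == "__main__":\n    print(add(2, 3))'
candidates = [
    "def add(a, b):\n    return a*b",
    "def add(a, b):\n    return a+b",
]
flags = [passes(c, reference, "5") for c in candidates]
print(flags)
```

Under these assumptions the first candidate prints `6` and fails, while the second prints `5` and passes.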
34
 
 
39
  ```python
40
  from evaluate import load
41
  code_eval_outputs = load("giulio98/code_eval_outputs")
42
+ references = ["if __name__ == \"__main__\":\n print(add(2, 3))"]
43
  expected_outputs = ["5"]
44
  candidates = [["def add(a,b):\n return a*b", "def add(a, b):\n return a+b"]]
45
+ pass_at_k, results = code_eval_outputs.compute(references=references, predictions=candidates, output=expected_outputs, k=[1, 2])
46
+ print(pass_at_k)
47
+ print(results)
48
+ ```
49
+
50
+ Output:
51
+ ```python
52
+ {'pass@1': 0.5, 'pass@2': 1.0}
53
+ defaultdict(list,
54
+ {0: [(0,
55
+ {'task_id': 0,
56
+ 'passed': False,
57
+ 'result': 'not passed',
58
+ 'completion_id': 0}),
59
+ (1,
60
+ {'task_id': 0,
61
+ 'passed': True,
62
+ 'result': 'passed',
63
+ 'completion_id': 1})]})
64
  ```
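The `pass@k` values in the output above are consistent with the standard unbiased estimator, 1 - C(n - c, k) / C(n, k), where n is the number of candidates for a task and c is the number that passed. The sketch below assumes this metric uses that estimator; the function name `pass_at_k` is chosen here for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for n candidates of which c passed."""
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a passing one.
        return 1.0
    # Probability that a random k-subset contains at least one passing candidate.
    return 1.0 - comb(n - c, k) / comb(n, k)


# The example above has n = 2 candidates and c = 1 passing.
print(pass_at_k(2, 1, 1))
print(pass_at_k(2, 1, 2))
```

With n = 2 and c = 1 this gives 0.5 for k = 1 and 1.0 for k = 2, matching the reported `{'pass@1': 0.5, 'pass@2': 1.0}`.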
65
 
66
  N.B.