gabeorlanski committed
Commit 9610edf · unverified · 1 Parent(s): 506dd90
Files changed (1):
  1. README.md +3 -3
README.md CHANGED
@@ -60,9 +60,9 @@ metrics, results = metric.compute(
 
 The `bc_eval` metric outputs two things:
 
-* `metrics`: a dictionary with the pass rates for each k value defined in the arguments and the mean percent of tests passed per question. The keys are formatted as `{LANGUAGE NAME}/{METRIC NAME}`
+`metrics`: a dictionary with the pass rates for each k value defined in the arguments and the mean percent of tests passed per question. The keys are formatted as `{LANGUAGE NAME}/{METRIC NAME}`
 
-* `results`: a list of dictionaries with the results from each individual prediction.
+`results`: a list of dictionaries with the results from each individual prediction.
 
 #### Values from Popular Papers
 [PaLM-2](https://arxiv.org/pdf/2305.10403.pdf) Performance on BC-HumanEval (`pass@1` with greedy decoding):
@@ -87,7 +87,7 @@ The `bc_eval` metric outputs two things:
 Full example with inputs that fail tests, time out, have an error, and pass.
 
 #### Passing Example
-```python
+```Python
 import evaluate
 from datasets import load_dataset
 import os
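
For context, the `metrics` dictionary described in the README above uses flat `{LANGUAGE NAME}/{METRIC NAME}` keys. A minimal sketch of how such a dict might be consumed — the concrete key and metric names here are illustrative assumptions based on that description, not taken from the actual `bc_eval` implementation:

```python
# Hypothetical shape of the `metrics` dict returned by bc_eval, per the README:
# keys follow "{LANGUAGE NAME}/{METRIC NAME}", e.g. "python/pass@1".
metrics = {
    "python/pass@1": 0.37,              # assumed example value
    "python/pass@10": 0.61,             # assumed example value
    "python/mean_percent_tests_passed": 0.54,  # assumed metric name
}

def pass_rates(metrics, language):
    """Collect the pass@k rates for one language from the flat metrics dict."""
    prefix = f"{language}/"
    return {
        key[len(prefix):]: value
        for key, value in metrics.items()
        if key.startswith(prefix) and key[len(prefix):].startswith("pass@")
    }

print(pass_rates(metrics, "python"))  # {'pass@1': 0.37, 'pass@10': 0.61}
```

Keeping the keys flat means one dictionary can hold every language/metric pair, at the cost of string-splitting when grouping results per language.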