gabeorlanski committed: Fix
README.md
CHANGED
@@ -60,9 +60,9 @@ metrics, results = metric.compute(
 
 The `bc_eval` metric outputs two things:
 
-
+`metrics`: a dictionary with the pass rates for each k value defined in the arguments and the mean percent of tests passed per question. The keys are formatted as `{LANGUAGE NAME}/{METRIC NAME}`
 
-
+`results`: a list of dictionaries with the results from each individual prediction.
 
 #### Values from Popular Papers
 [PaLM-2](https://arxiv.org/pdf/2305.10403.pdf) Performance on BC-HumanEval (`pass@1` with greedy decoding):
@@ -87,7 +87,7 @@ The `bc_eval` metric outputs two things:
 Full example with inputs that fail tests, time out, have an error, and pass.
 
 #### Passing Example
-```
+```Python
 import evaluate
 from datasets import load_dataset
 import os
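
The README text in this change says the `metrics` keys are formatted as `{LANGUAGE NAME}/{METRIC NAME}`. As a rough illustration of what that layout implies for downstream code, here is a minimal sketch that groups such a dictionary by language. The dictionary contents (`Python/pass@1`, `Python/mean_pct_pass`, `Java/pass@1`) and the helper `group_by_language` are hypothetical and are not taken from the `bc_eval` implementation; only the `LANGUAGE/METRIC` key shape comes from the README.

```python
from collections import defaultdict

# Hypothetical `metrics` dictionary. Only the `LANGUAGE/METRIC` key shape
# comes from the README; the metric names and values are made up.
example_metrics = {
    "Python/pass@1": 0.5,
    "Python/mean_pct_pass": 0.75,
    "Java/pass@1": 0.25,
}


def group_by_language(metrics):
    """Turn {"LANG/METRIC": value} into {"LANG": {"METRIC": value}}."""
    grouped = defaultdict(dict)
    for key, value in metrics.items():
        # Split on the first "/" only, in case a metric name contains one.
        language, metric_name = key.split("/", 1)
        grouped[language][metric_name] = value
    return dict(grouped)


per_language = group_by_language(example_metrics)
print(per_language["Python"]["pass@1"])
```

Splitting on the first `/` only is a defensive choice under the assumption that language names never contain a slash; if they can, the key format would need a different delimiter rule.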