Correct HumanEval scores

#79
by Muennighoff - opened
Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -1754,15 +1754,15 @@ model-index:
  metrics:
  - name: pass@1
  type: pass@1
- value: 0.15524390243902436
+ value: 0.15542682926829265
  verified: false
  - name: pass@10
  type: pass@10
- value: 0.3220367632383857
+ value: 0.3278356276947017
  verified: false
  - name: pass@100
  type: pass@100
- value: 0.5545431515723145
+ value: 0.5719815685597749
  verified: false
  ---
 
@@ -2338,8 +2338,8 @@ See this repository for JSON files: https://github.com/bigscience-workshop/evalu
  | wnli (Median of 6 prompts) | eng | acc ↑ | 0.57 | 0.563 |
  | wsc (Median of 11 prompts) | eng | acc ↑ | 0.519 | 0.413 |
  | humaneval | python | pass@1 ↑ | 0.155 | 0.0 |
- | humaneval | python | pass@10 ↑ | 0.322 | 0.0 |
- | humaneval | python | pass@100 ↑ | 0.555 | 0.003 |
+ | humaneval | python | pass@10 ↑ | 0.328 | 0.0 |
+ | humaneval | python | pass@100 ↑ | 0.572 | 0.003 |
 
 
  **Train-time Evaluation:**
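
For readers checking the corrected numbers: pass@k on HumanEval is commonly computed with the unbiased estimator 1 - C(n-c, k) / C(n, k) per problem (n samples generated, c of them passing), averaged over all problems. The sketch below illustrates that estimator only. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical, and this PR does not state which script or sample count produced the values above.

```python
import math
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average the per-problem estimate over all problems.

    Assumes every problem was sampled the same number of times (n).
    """
    if not correct_counts:
        raise ValueError("correct_counts must not be empty")
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


if __name__ == "__main__":
    # Hypothetical counts for three problems, 10 samples each.
    counts = [0, 3, 10]
    for k in (1, 5, 10):
        print(f"pass@{k}: {mean_pass_at_k(counts, n=10, k=k):.3f}")
```

For k = 1 the estimator reduces to c / n, so pass@1 is simply the fraction of all generated samples that pass.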