alexmarques committed
Commit 81758d1 · verified · 1 Parent(s): 3188a5c

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -147,7 +147,7 @@ The model generated a single answer for each prompt form Arena-Hard, and each an
  We report below the scores obtained in each judgement and the average.
 
  OpenLLM v1 and v2 evaluations were conducted using Neural Magic's fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch llama_3.1_instruct).
- This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challenge and GSM-8K that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-70B-Instruct-evals) and a few fixes to OpenLLM v2 tasks.
+ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challenge and GSM-8K that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-405B-Instruct-evals) and a few fixes to OpenLLM v2 tasks.
 
  HumanEval and HumanEval+ evaluations were conducted using Neural Magic's fork of the [EvalPlus](https://github.com/neuralmagic/evalplus) repository.
 
@@ -155,7 +155,6 @@ Detailed model outputs are available as HuggingFace datasets for [Arena-Hard](ht
 
  ### Accuracy
 
- #### Open LLM Leaderboard evaluation scores
  <table>
  <tr>
  <td><strong>Benchmark</strong>
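
The README text touched by this diff states that OpenLLM v1 and v2 scores were produced with Neural Magic's lm-evaluation-harness fork (branch llama_3.1_instruct). As a rough illustration only, the sketch below shows how such an evaluation might be launched through the harness's Python API; the model ID and task list are placeholders chosen for this example and are not taken from the commit.

```python
# Minimal sketch of an OpenLLM-style evaluation with lm-evaluation-harness.
# Assumes the fork is installed (e.g. `pip install -e .` from the
# llama_3.1_instruct branch); MODEL_ID and the task names are illustrative
# placeholders, not values specified by this commit.
import lm_eval

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # placeholder checkpoint

results = lm_eval.simple_evaluate(
    model="hf",                                  # Hugging Face transformers backend
    model_args=f"pretrained={MODEL_ID},dtype=auto",
    tasks=["arc_challenge", "gsm8k"],            # example OpenLLM v1 tasks
    batch_size="auto",
)

# Per-task metrics are returned under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```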