victormiller committed on
Commit 400af6c (1 parent: 4857ea8)

Update README.md

Files changed (1)
  1. README.md +10 -1
README.md CHANGED
@@ -34,9 +34,18 @@ Evaluations include standard best practice benchmarks, medical, math, and coding
 
  <center><img src="k2_table_of_tables.png" alt="k2 big eval table"/></center>
 
-
  Detailed analysis can be found on the K2 Weights and Biases project [here](https://wandb.ai/llm360/K2?nw=29mu6l0zzqq)
 
+ ## Open LLM Leaderboard
+ | Evaluation | Score | Raw Score |
+ | ----------- | ----------- | ----------- |
+ | IFEval | 22.52 | 23 |
+ | BBH | 28.22 | 50 |
+ | Math Lvl 5 | 2.04 | 2 |
+ | GPQA | 3.58 | 28 |
+ | MUSR | 8.55 | 40 |
+ | MMLU-PRO | 22.27 | 30 |
+ | Average | 14.53 | 35.17 |
 
  ## K2 Gallery
  The K2 gallery allows one to browse the output of various prompts on intermediate K2 checkpoints, which provides an intuitive understanding on how the model develops and improves over time. This is inspired by The Bloom Book.
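
The `Average` row of the added Open LLM Leaderboard table can be checked against the six benchmark rows. The sketch below is a sanity check only: the `rows` dictionary and variable names are hypothetical, and the numbers are copied as displayed in the table. It assumes `Average` is a plain arithmetic mean, which the commit does not state.

```python
from statistics import mean

# (Score, Raw Score) pairs copied from the added table; the Average row is excluded.
# The dictionary name and layout are illustrative, not part of the commit.
rows = {
    "IFEval": (22.52, 23),
    "BBH": (28.22, 50),
    "Math Lvl 5": (2.04, 2),
    "GPQA": (3.58, 28),
    "MUSR": (8.55, 40),
    "MMLU-PRO": (22.27, 30),
}

# Assumption: the Average row is an unweighted mean over the six benchmarks.
score_avg = round(mean(score for score, _ in rows.values()), 2)
raw_avg = round(mean(raw for _, raw in rows.values()), 2)

print(f"Score average: {score_avg}")      # prints "Score average: 14.53"
print(f"Raw Score average: {raw_avg}")    # prints "Raw Score average: 28.83"
```

Under this assumption the `Score` column reproduces the listed 14.53. The displayed `Raw Score` values average to 28.83 rather than the listed 35.17; the commit does not say how the raw average was computed, so the difference may come from rounding of the displayed raw values or from a different aggregation.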