pansophic committed on
Commit 951ff2d
1 Parent(s): 5a5385d

Update README.md

  1. README.md +2 -3
README.md CHANGED
@@ -64,7 +64,6 @@ In AlpacaEval, Rocket 🦝 achieves a near 80% win rate, coupled with an average
 
 
 ## Other benchmarks
-Despite its impressive performance on MT-Bench and AlpacaEval benchmarks, the model experiences some challenges when evaluated on other benchmark tests.
 
 | Metric | Value |
 |-----------------------|---------------------------|
@@ -72,9 +71,9 @@ Despite its impressive performance on MT-Bench and AlpacaEval benchmarks, the mo
 | ARC (25-shot) | 50.51 |
 | HellaSwag (10-shot) | 73.91 |
 | MMLU (5-shot) | 61.07 |
-| TruthfulQA (0-shot) | 57.45 |
+| TruthfulQA (mc2) (0-shot) | 54.38 |
 | Winogrande (5-shot) | 63.22 |
-| GSM8K (5-shot) | 12.74 |
+| GSM8K (5-shot) | 37.91 |
 | DROP (3-shot) | 9.66 |
 
 