pansophic committed on
Commit
a5cfcd4
1 Parent(s): ddf1caa

Update README.md

  1. README.md +9 -10
README.md CHANGED
@@ -22,7 +22,7 @@ base_model: stabilityai/stablelm-3b-4e1t
 
 
 ## Performance
-Despite its compact dimensions, the model achieves outstanding scores in both MT-Bench [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks, surpassing the performance of considerably larger models.
+Despite its compact dimensions, the model achieves outstanding scores in both [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks, surpassing the performance of considerably larger models.
 
 | Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
 |-------------|-----|----|---------------|--------------|
@@ -63,18 +63,17 @@ In AlpacaEval, Rocket 🦝 achieves a near 80% win rate, coupled with an average
 | **Rocket** 🦝 | **79.75** | **1.42** | **1242** |
 
 
-## Other benchmarks
+## Open LLM leaderboard
 
 | Metric | Value |
 |-----------------------|---------------------------|
-| Average | 51.00 |
-| ARC (25-shot) | 50.51 |
-| HellaSwag (10-shot) | 76.45 |
-| MMLU (5-shot) | 45.51 |
-| TruthfulQA (0-shot) | 54.38 |
-| Winogrande (5-shot) | 67.8 |
-| GSM8K (5-shot) | 37.91 |
-| DROP (3-shot) | 24.49 |
+| Average | 55.77 |
+| ARC | 50.6 |
+| HellaSwag | 76.69 |
+| MMLU | 47.1 |
+| TruthfulQA | 55.82 |
+| Winogrande | 67.96 |
+| GSM8K | 36.47 |
 
 
 ## Intended uses & limitations
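
Review note: the new "Average" row can be sanity-checked against the six task scores added in this commit. The sketch below assumes the average is the plain unweighted mean of those six values. The diff does not state this, and the dictionary name `scores` is only illustrative.

```python
# Scores copied from the added (+) rows of the "Open LLM leaderboard" table.
scores = {
    "ARC": 50.6,
    "HellaSwag": 76.69,
    "MMLU": 47.1,
    "TruthfulQA": 55.82,
    "Winogrande": 67.96,
    "GSM8K": 36.47,
}

# Assumption: "Average" is the unweighted mean of the six task scores.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 55.77
```

Under that assumption the computed mean matches the 55.77 in the updated table. The removed table also included DROP, so its average is not directly comparable.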