Update README.md
README.md (changed)
@@ -76,10 +76,10 @@ Our evaluation is based on the framework lm-evaluation-harness and opencompass.
- Huggingface LLM Leaderboard tasks.
- Other Popular Benchmarks: We report the average accuracies on Big Bench Hard (BBH, 3-shot) and HumanEval (see the reproduction sketch after the table below).

|              | Average  | MMLU  | Winogrande | TruthfulQA | Hellaswag | GSM8K | Arc-C | HumanEval | BBH   |
| ------------ | -------- | ----- | ---------- | ---------- | --------- | ----- | ----- | --------- | ----- |
| Bamboo       | **57.1** | 63.89 | 76.16      | 44.06      | 82.17     | 52.84 | 62.20 | 25.6      | 50.35 |
| Mistral-v0.1 | **56.5** | 62.65 | 79.24      | 42.62      | 83.32     | 40.18 | 61.43 | 26.21     | 56.35 |
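As a rough guide to how these accuracies can be reproduced, the sketch below runs the Huggingface LLM Leaderboard tasks through the lm-evaluation-harness Python API. It is an illustration only, not this repo's official evaluation script: the model id is a placeholder, the task names assume harness v0.4+, and BBH (3-shot) and HumanEval are evaluated separately (e.g. via opencompass) rather than in this call.

```python
# Hedged sketch: leaderboard-style accuracies via EleutherAI's
# lm-evaluation-harness (v0.4+). "your-org/your-model" is a placeholder
# Hugging Face model id; exact task names can vary between harness versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/your-model,dtype=bfloat16",
    tasks=["mmlu", "winogrande", "truthfulqa_mc2", "hellaswag", "gsm8k", "arc_challenge"],
    batch_size=8,
)

# Per-task metrics; BBH and HumanEval are run separately and not included here.
for task, metrics in results["results"].items():
    print(task, metrics)
```

Up to rounding, the Average column in the table above is the plain mean of the eight per-task scores in each row.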
## Inference Speed Evaluation Results