yixinsong committed on
Commit
3a063e9
1 Parent(s): 99bede2

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -76,10 +76,10 @@ Our evaluation is based on the framework lm-evaluation-harness and opencompass.
  - Huggingface LLM Leaderboard tasks.
  - Other Popular Benchmarks: We report the average accuracies on Big Bench Hard (BBH) (3-shot), HumanEval.
 
- | | MMLU | Winogrande | TruthfulQA | Hellaswag | GSM8K | Arc-C | HumanEval | BBH | Average |
+ | | Average | MMLU | Winogrande | TruthfulQA | Hellaswag | GSM8K | Arc-C | HumanEval | BBH |
  | ------- | ------ | ---------- | ---------- | --------- | ------ | ------ | --------- | ---- | ------- |
- | Ours | 0.6389 | 0.7593 | 0.4406 | 0.8217 | 0.5315 | 0.6195 | 0.256 | | |
- | Mistral | 0.6265 | 0.7924 | 0.4262 | 0.8332 | 0.4018 | 0.6143 | 0.2621 | | |
+ | Bamboo | **57.1** | 63.89 | 76.16 | 44.06 | 82.17 | 52.84 | 62.20 | 25.6 | 50.35 |
+ | Mistral-v0.1 | **56.5** | 62.65 | 79.24 | 42.62 | 83.32 | 40.18 | 61.43 | 26.21 | 56.35 |
 
  ## Inference Speed Evaluation Results
 
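
The new Average column can be sanity-checked against the eight per-benchmark scores in the added rows. The snippet below is a minimal sketch that assumes the average is an unweighted arithmetic mean over those eight columns; the README does not state how the column was computed, and the names `scores` and `average` are illustrative only.

```python
from statistics import mean

# Scores copied from the added (+) rows of the table, in percent.
# Column order: MMLU, Winogrande, TruthfulQA, Hellaswag, GSM8K, Arc-C, HumanEval, BBH.
scores = {
    "Bamboo": [63.89, 76.16, 44.06, 82.17, 52.84, 62.20, 25.6, 50.35],
    "Mistral-v0.1": [62.65, 79.24, 42.62, 83.32, 40.18, 61.43, 26.21, 56.35],
}


def average(values):
    """Unweighted mean of the benchmark scores, rounded to two decimals.

    Assumption: every benchmark carries equal weight.
    """
    return round(mean(values), 2)


averages = {name: average(vals) for name, vals in scores.items()}

for name, avg in averages.items():
    print(f"{name}: {avg}")
```

Under that assumption the means come out at 57.16 for Bamboo and 56.5 for Mistral-v0.1, which is consistent with the reported 57.1 and 56.5 if the Bamboo figure was truncated to one decimal rather than rounded.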