llama-anon committed
Commit 7b9bc16
Parent: 2ecdd9a

Update README.md


Add HuggingFaceH4 benchmark metrics

Files changed (1): README.md (+6 −5)
README.md CHANGED
@@ -3,6 +3,7 @@ language:
 - en
 datasets:
 - garage-bAInd/OpenPlatypus
+license: agpl-3.0
 ---
 
 # Platypus2-70B-instruct
@@ -15,11 +16,11 @@ Platypus-70B-instruct is a merge of [`garage-bAInd/Platypus2-70B`](https://huggi
 
 | Metric | Value |
 |-----------------------|-------|
-| MMLU (5-shot)       | --    |
-| ARC (25-shot)       | --    |
-| HellaSwag (10-shot) | --    |
-| TruthfulQA (0-shot) | --    |
-| Avg.                | --    |
+| MMLU (5-shot)       | 70.48 |
+| ARC (25-shot)       | 71.84 |
+| HellaSwag (10-shot) | 87.94 |
+| TruthfulQA (0-shot) | 62.26 |
+| Avg.                | 73.13 |
 
 We use state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard. Please see below for detailed instructions on reproducing benchmark results.
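The detailed reproduction instructions the README refers to are not part of this hunk. As a minimal sketch of what such a run looks like, assuming the leaderboard-era lm-evaluation-harness (whose Python API exposes `evaluator.simple_evaluate` and task names like `arc_challenge`) and the `garage-bAInd/Platypus2-70B-instruct` checkpoint, scoring the ARC row of the table could look like this:

```python
# Sketch only: reproduce one benchmark row with the leaderboard-era
# EleutherAI lm-evaluation-harness. The model id and task selection
# are assumptions, not taken from this commit.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=garage-bAInd/Platypus2-70B-instruct",
    tasks=["arc_challenge"],  # ARC is scored 25-shot on the leaderboard
    num_fewshot=25,
    batch_size=1,
)
print(evaluator.make_table(results))  # formatted metrics table
```

The other rows follow the same pattern with their own few-shot counts; note that in that harness version MMLU is not a single task but the family of `hendrycksTest-*` subtasks, whose scores are averaged.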