lgaalves committed
Commit ec0ae84
1 Parent(s): 07cba63

Update README.md

Files changed (1)
  1. README.md +9 -7
README.md CHANGED
@@ -14,13 +14,15 @@ pipeline_tag: text-generation
 
 ### Benchmark Metrics
 
- | Metric | Value |
- |-----------------------|-------|
- | MMLU (5-shot) | - |
- | ARC (25-shot) | - |
- | HellaSwag (10-shot) | - |
- | TruthfulQA (0-shot) | - |
- | Avg. | - |
+
+ | Metric | llama-2-7b-hf_open-platypus | meta-llama/Llama-2-7b-hf (base) |
+ |-----------------------|-------|-------|
+ | Avg. | - | 54.32 |
+ | ARC (25-shot) | - | 53.07 |
+ | HellaSwag (10-shot) | - | 78.59 |
+ | MMLU (5-shot) | - | 46.87 |
+ | TruthfulQA (0-shot) | - | 38.76 |
+
 
 We use state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard. Please see below for detailed instructions on reproducing benchmark results.
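For readers who want to reproduce numbers like those in the table, the sketch below shows what a single leaderboard-style run with the harness's Python API might look like. It is a minimal illustration under assumptions, not this repo's documented procedure: it presumes a recent `lm-evaluation-harness` release (v0.4+) that exposes `lm_eval.simple_evaluate`, and the task name `arc_challenge` follows current harness conventions, which may differ in the exact version the leaderboard pins.

```python
# Minimal sketch: one leaderboard-style benchmark run with
# EleutherAI's lm-evaluation-harness (assumes the v0.4+ Python API).
# Each leaderboard metric uses its own few-shot count, so a full
# reproduction repeats this call per task (25 for ARC, 10 for
# HellaSwag, 5 for MMLU, 0 for TruthfulQA).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                       # Hugging Face transformers backend
    model_args="pretrained=meta-llama/Llama-2-7b-hf", # base model from the table above
    tasks=["arc_challenge"],                          # assumed task name for ARC
    num_fewshot=25,                                   # ARC is 25-shot on the leaderboard
    batch_size=8,
)

# Per-task metrics (e.g. normalized accuracy) live under results["results"].
print(results["results"]["arc_challenge"])
```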