lgaalves committed
Commit
6170e7f
1 Parent(s): 828aa10

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -19,13 +19,13 @@ language:
 
 | Metric | llama-2-13b-chat-platypus | garage-bAInd/Platypus2-13B | llama-2-13b-chat-hf (base) |
 |-----------------------|-------|-------|-------|
-| Avg. | - | 61.35 | 59.93 |
-| ARC (25-shot) | - | 61.26 | 59.04 |
-| HellaSwag (10-shot) | - | 82.56 | 81.94 |
-| MMLU (5-shot) | - | 56.7 | 54.64 |
-| TruthfulQA (0-shot) | - | 44.86 | 44.12 |
-
+| Avg. | 58.8 | **61.35** | 59.93 |
+| ARC (25-shot) | 53.84 | **61.26** | 59.04 |
+| HellaSwag (10-shot) | 80.67 | **82.56** | 81.94 |
+| MMLU (5-shot) | 54.44 | **56.7** | 54.64 |
+| TruthfulQA (0-shot) | **46.23** | 44.86 | 44.12 |
+
 
 We use the state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard. Please see below for detailed instructions on reproducing benchmark results.
 
 ### Model Details
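
For readers who want to reproduce a single leaderboard number before consulting the detailed instructions, here is a minimal sketch using the harness's Python API. It is an assumption-laden illustration, not the author's exact script: the repo id `lgaalves/llama-2-13b-chat-platypus` is inferred from the table column and the committer's username, and argument and task names vary across harness versions (this matches the older, pre-0.4 API used by the leaderboard at the time).

```python
# Hedged sketch (not the author's exact script): score one Open LLM
# Leaderboard task with an older lm-evaluation-harness via its Python API.
# Assumptions: the repo id is inferred from the table column; argument
# and task names differ in newer harness releases.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",  # Hugging Face causal-LM backend
    model_args="pretrained=lgaalves/llama-2-13b-chat-platypus",
    tasks=["arc_challenge"],  # ARC is scored 25-shot on the leaderboard
    num_fewshot=25,
    batch_size=1,
)
print(results["results"]["arc_challenge"])  # per-task accuracy metrics
```

Repeating the run for HellaSwag (10-shot), the MMLU tasks (5-shot), and TruthfulQA (0-shot), then averaging the four scores, yields the Avg. column in the table above.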