leaderboard-pr-bot committed
Commit 11a8d80
Parent: 7a7b897

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
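Once merged, the `model-index` block this PR adds can be read back programmatically rather than by parsing the raw YAML. A minimal sketch, assuming the target repo id `Enoch/llama-65b-hf` and the `ModelCard`/`EvalResult` helpers available in recent releases of `huggingface_hub`:

```python
from huggingface_hub import ModelCard

# Load the model card README from the Hub; the `model-index` front matter added
# by this PR is exposed as structured EvalResult objects on card.data.eval_results.
# (Repo id and attribute names as assumed above.)
card = ModelCard.load("Enoch/llama-65b-hf")

for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_value} ({result.metric_type})")
```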

Files changed (1)
  1. README.md +117 -0
README.md CHANGED
@@ -1,5 +1,108 @@
---
license: other
+ model-index:
+ - name: llama-65b-hf
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 63.31
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Enoch/llama-65b-hf
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 86.09
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Enoch/llama-65b-hf
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 63.84
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Enoch/llama-65b-hf
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 43.43
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Enoch/llama-65b-hf
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 82.48
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Enoch/llama-65b-hf
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 44.81
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Enoch/llama-65b-hf
+       name: Open LLM Leaderboard
---

LLaMA-65B converted to work with Transformers/HuggingFace. This is under a special license, please see the LICENSE file for details.
@@ -164,3 +267,17 @@ Risks and harms of large language models include the generation of harmful, offe
**Use cases**
LLaMA is a foundational model, and as such, it should not be used for downstream applications without further investigation and mitigations of risks. These risks and potential fraught use cases include, but are not limited to: generation of misinformation and generation of harmful, biased or offensive content.

+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Enoch__llama-65b-hf)
+
+ | Metric                            | Value |
+ |-----------------------------------|------:|
+ | Avg.                              | 63.99 |
+ | AI2 Reasoning Challenge (25-Shot) | 63.31 |
+ | HellaSwag (10-Shot)               | 86.09 |
+ | MMLU (5-Shot)                     | 63.84 |
+ | TruthfulQA (0-shot)               | 43.43 |
+ | Winogrande (5-shot)               | 82.48 |
+ | GSM8k (5-shot)                    | 44.81 |
+
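The `Avg.` row in the table above is the unweighted mean of the six benchmark scores, which matches the reported 63.99. A quick sketch to reproduce it from the values in this PR:

```python
# Scores copied from the table added in this PR; Avg. is their unweighted mean.
scores = [63.31, 86.09, 63.84, 43.43, 82.48, 44.81]
avg = sum(scores) / len(scores)
print(round(avg, 2))  # 63.99
```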