leaderboard-pr-bot committed on
Commit
ef6d4f2
1 Parent(s): 7f64046

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
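The `model-index` metadata this PR adds is a nested structure: one model entry holding a list of results, each with a `task`, `dataset`, `metrics` list, and `source`. The sketch below mirrors a single entry as plain Python data to make that nesting explicit. The field names and values come from this PR. The nesting follows the Hugging Face model-card metadata layout as I understand it, and `summarize` is a hypothetical helper written only for this illustration.

```python
# One result entry from this PR, written as plain Python data.
# Nesting is assumed from the Hugging Face model-index layout.
entry = {
    "task": {"type": "text-generation", "name": "Text Generation"},
    "dataset": {
        "name": "AI2 Reasoning Challenge (25-Shot)",
        "type": "ai2_arc",
        "config": "ARC-Challenge",
        "split": "test",
        "args": {"num_few_shot": 25},
    },
    "metrics": [
        {"type": "acc_norm", "value": 47.87, "name": "normalized accuracy"},
    ],
    "source": {
        "url": "https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/WizardLM-7B-Uncensored",
        "name": "Open LLM Leaderboard",
    },
}

model_index = [{"name": "WizardLM-7B-Uncensored", "results": [entry]}]


def summarize(model_index):
    """Hypothetical helper: flatten entries to (dataset, metric type, value)."""
    rows = []
    for model in model_index:
        for result in model["results"]:
            for metric in result["metrics"]:
                rows.append(
                    (result["dataset"]["name"], metric["type"], metric["value"])
                )
    return rows


for row in summarize(model_index):
    print(row)
```

The other five entries in the PR follow the same shape. Only the dataset fields, few-shot count, and metric differ.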

Files changed (1)
  1. README.md +119 -3
README.md CHANGED
@@ -1,9 +1,112 @@
  ---
  license: other
- datasets:
- - ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered
  tags:
  - uncensored
  ---

  Join our Discord! https://discord.gg/cognitivecomputations
@@ -20,4 +123,17 @@ You are responsible for anything you do with the model, just as you are responsi

  Publishing anything this model generates is the same as publishing it yourself.

- You are responsible for the content you publish, and you cannot blame the model any more than you can blame the knife, gun, lighter, or car for what you do with it.
  ---
  license: other
  tags:
  - uncensored
+ datasets:
+ - ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered
+ model-index:
+ - name: WizardLM-7B-Uncensored
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 47.87
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/WizardLM-7B-Uncensored
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 73.08
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/WizardLM-7B-Uncensored
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 35.42
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/WizardLM-7B-Uncensored
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 41.49
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/WizardLM-7B-Uncensored
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 68.43
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/WizardLM-7B-Uncensored
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 3.26
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/WizardLM-7B-Uncensored
+       name: Open LLM Leaderboard
  ---

  Join our Discord! https://discord.gg/cognitivecomputations
 

  Publishing anything this model generates is the same as publishing it yourself.

+ You are responsible for the content you publish, and you cannot blame the model any more than you can blame the knife, gun, lighter, or car for what you do with it.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ehartford__WizardLM-7B-Uncensored)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |44.92|
+ |AI2 Reasoning Challenge (25-Shot)|47.87|
+ |HellaSwag (10-Shot)              |73.08|
+ |MMLU (5-Shot)                    |35.42|
+ |TruthfulQA (0-shot)              |41.49|
+ |Winogrande (5-shot)              |68.43|
+ |GSM8k (5-shot)                   | 3.26|
+
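As a sanity check on the table, the `Avg.` row can be reproduced from the six benchmark scores. The sketch below assumes `Avg.` is the unweighted mean of those six values, which the PR does not state. The scores sum to 269.55, so the mean is about 44.925, which matches the reported 44.92 to within rounding. `mean_score` is a hypothetical helper name used only here.

```python
# Scores copied from the results table in this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 47.87,
    "HellaSwag (10-Shot)": 73.08,
    "MMLU (5-Shot)": 35.42,
    "TruthfulQA (0-shot)": 41.49,
    "Winogrande (5-shot)": 68.43,
    "GSM8k (5-shot)": 3.26,
}
reported_avg = 44.92  # the "Avg." row of the table


def mean_score(values):
    """Unweighted arithmetic mean (assumed to be how Avg. is computed)."""
    values = list(values)
    return sum(values) / len(values)


average = mean_score(scores.values())
print(f"computed mean: {average:.3f}, reported: {reported_avg}")

# The table values are already rounded to two decimals, so allow a
# small tolerance rather than requiring an exact match.
assert abs(average - reported_avg) < 0.01
```

The tolerance is deliberate: the leaderboard presumably averages unrounded scores, so recomputing from the rounded table values can differ in the last digit.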