Zardos and leaderboard-pr-bot committed on
Commit: bb0b5a6
Parent: 4e6b851

Adding Evaluation Results (#1)

- Adding Evaluation Results (4fdb35f0a9a1088dcb6387bae0bf988e7f615670)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +118 -1
README.md CHANGED
@@ -1,8 +1,111 @@
 ---
 language:
 - en
-pipeline_tag: text-generation
 license: apache-2.0
+pipeline_tag: text-generation
+model-index:
+- name: Kant-Test-0.1-Mistral-7B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 62.37
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Zardos/Kant-Test-0.1-Mistral-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 82.84
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Zardos/Kant-Test-0.1-Mistral-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 63.38
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Zardos/Kant-Test-0.1-Mistral-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 49.62
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Zardos/Kant-Test-0.1-Mistral-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 78.3
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Zardos/Kant-Test-0.1-Mistral-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 37.98
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Zardos/Kant-Test-0.1-Mistral-7B
+      name: Open LLM Leaderboard
 ---
 
 # Model Yaml
@@ -39,3 +142,17 @@ Mistral 7B is a pretrained base model and therefore does not have any moderation
 ## The Mistral AI Team
 
 Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Zardos__Kant-Test-0.1-Mistral-7B)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |62.42|
+|AI2 Reasoning Challenge (25-Shot)|62.37|
+|HellaSwag (10-Shot)              |82.84|
+|MMLU (5-Shot)                    |63.38|
+|TruthfulQA (0-shot)              |49.62|
+|Winogrande (5-shot)              |78.30|
+|GSM8k (5-shot)                   |37.98|
+
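The Avg. row in the added leaderboard table is consistent with an unweighted arithmetic mean of the six benchmark scores: (62.37 + 82.84 + 63.38 + 49.62 + 78.30 + 37.98) / 6 = 374.49 / 6 = 62.415, displayed as 62.42. The sketch below recomputes it from the values in this commit. The helper name `leaderboard_average` is hypothetical, and rounding half-up to two decimals is an assumption, since the commit does not state how the leaderboard rounds. `Decimal` is used so the 62.415 midpoint is not perturbed by binary floating-point error.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-benchmark scores (percent), copied from the table added in this commit.
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": Decimal("62.37"),
    "HellaSwag (10-Shot)": Decimal("82.84"),
    "MMLU (5-Shot)": Decimal("63.38"),
    "TruthfulQA (0-shot)": Decimal("49.62"),
    "Winogrande (5-shot)": Decimal("78.30"),
    "GSM8k (5-shot)": Decimal("37.98"),
}


def leaderboard_average(scores):
    """Hypothetical helper: unweighted mean of the benchmark scores.

    Rounds half-up to two decimals (an assumption about the
    leaderboard's display rounding, not confirmed by the commit).
    """
    if not scores:
        raise ValueError("need at least one score")
    total = sum(scores.values(), Decimal("0"))
    mean = total / len(scores)  # exact here: 374.49 / 6 = 62.415
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(f"Avg. = {leaderboard_average(SCORES)}")
```

Running the script prints `Avg. = 62.42`, matching the table. Each benchmark counts equally, so a weak score such as GSM8k (37.98) pulls the average down as much as a strong one like HellaSwag (82.84) pushes it up.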