Commit
a49cb4a
1 Parent(s): 39bf6ae

Adding Evaluation Results (#4)

Browse files

- Adding Evaluation Results (e400f6a516430b23c269dc69287ba751314cb320)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1) hide show
  1. README.md +117 -1
README.md CHANGED
@@ -1,5 +1,108 @@
1
  ---
2
  license: apache-2.0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3
  ---
4
 
5
  # Model Card for Mistral-7B-Instruct-v0.3
@@ -138,4 +241,17 @@ make the model finely respect guardrails, allowing for deployment in environment
138
 
139
  ## The Mistral AI Team
140
 
141
- Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  license: apache-2.0
3
+ model-index:
4
+ - name: Mistral-7B-Instruct-v0.3
5
+ results:
6
+ - task:
7
+ type: text-generation
8
+ name: Text Generation
9
+ dataset:
10
+ name: AI2 Reasoning Challenge (25-Shot)
11
+ type: ai2_arc
12
+ config: ARC-Challenge
13
+ split: test
14
+ args:
15
+ num_few_shot: 25
16
+ metrics:
17
+ - type: acc_norm
18
+ value: 63.91
19
+ name: normalized accuracy
20
+ source:
21
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Mistral-7B-Instruct-v0.3
22
+ name: Open LLM Leaderboard
23
+ - task:
24
+ type: text-generation
25
+ name: Text Generation
26
+ dataset:
27
+ name: HellaSwag (10-Shot)
28
+ type: hellaswag
29
+ split: validation
30
+ args:
31
+ num_few_shot: 10
32
+ metrics:
33
+ - type: acc_norm
34
+ value: 84.82
35
+ name: normalized accuracy
36
+ source:
37
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Mistral-7B-Instruct-v0.3
38
+ name: Open LLM Leaderboard
39
+ - task:
40
+ type: text-generation
41
+ name: Text Generation
42
+ dataset:
43
+ name: MMLU (5-Shot)
44
+ type: cais/mmlu
45
+ config: all
46
+ split: test
47
+ args:
48
+ num_few_shot: 5
49
+ metrics:
50
+ - type: acc
51
+ value: 62.58
52
+ name: accuracy
53
+ source:
54
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Mistral-7B-Instruct-v0.3
55
+ name: Open LLM Leaderboard
56
+ - task:
57
+ type: text-generation
58
+ name: Text Generation
59
+ dataset:
60
+ name: TruthfulQA (0-shot)
61
+ type: truthful_qa
62
+ config: multiple_choice
63
+ split: validation
64
+ args:
65
+ num_few_shot: 0
66
+ metrics:
67
+ - type: mc2
68
+ value: 59.45
69
+ source:
70
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Mistral-7B-Instruct-v0.3
71
+ name: Open LLM Leaderboard
72
+ - task:
73
+ type: text-generation
74
+ name: Text Generation
75
+ dataset:
76
+ name: Winogrande (5-shot)
77
+ type: winogrande
78
+ config: winogrande_xl
79
+ split: validation
80
+ args:
81
+ num_few_shot: 5
82
+ metrics:
83
+ - type: acc
84
+ value: 78.37
85
+ name: accuracy
86
+ source:
87
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Mistral-7B-Instruct-v0.3
88
+ name: Open LLM Leaderboard
89
+ - task:
90
+ type: text-generation
91
+ name: Text Generation
92
+ dataset:
93
+ name: GSM8k (5-shot)
94
+ type: gsm8k
95
+ config: main
96
+ split: test
97
+ args:
98
+ num_few_shot: 5
99
+ metrics:
100
+ - type: acc
101
+ value: 42.15
102
+ name: accuracy
103
+ source:
104
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Mistral-7B-Instruct-v0.3
105
+ name: Open LLM Leaderboard
106
  ---
107
 
108
  # Model Card for Mistral-7B-Instruct-v0.3
 
241
 
242
  ## The Mistral AI Team
243
 
244
+ Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall
245
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
246
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__Mistral-7B-Instruct-v0.3)
247
+
248
+ | Metric |Value|
249
+ |---------------------------------|----:|
250
+ |Avg. |65.21|
251
+ |AI2 Reasoning Challenge (25-Shot)|63.91|
252
+ |HellaSwag (10-Shot) |84.82|
253
+ |MMLU (5-Shot) |62.58|
254
+ |TruthfulQA (0-shot) |59.45|
255
+ |Winogrande (5-shot) |78.37|
256
+ |GSM8k (5-shot) |42.15|
257
+