leaderboard-pr-bot commited on
Commit
d521bdb
1 Parent(s): 588fca7

Adding Evaluation Results

Browse files

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1) hide show
  1. README.md +130 -13
README.md CHANGED
@@ -1,17 +1,4 @@
1
  ---
2
- base_model:
3
- - BioMistral/BioMistral-7B
4
- - mistralai/Mistral-7B-Instruct-v0.1
5
- library_name: transformers
6
- tags:
7
- - mergekit
8
- - merge
9
- - dare
10
- - medical
11
- - biology
12
- license: apache-2.0
13
- datasets:
14
- - pubmed
15
  language:
16
  - en
17
  - fr
@@ -21,7 +8,123 @@ language:
21
  - pl
22
  - ro
23
  - de
 
 
 
 
 
 
 
 
 
 
 
 
 
24
  pipeline_tag: text-generation
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
25
  ---
26
  # BioMistral-7B-mistral7instruct-dare
27
 
@@ -147,3 +250,17 @@ Arxiv : [https://arxiv.org/abs/2402.10373](https://arxiv.org/abs/2402.10373)
147
 
148
  **CAUTION!** Both direct and downstream users need to be informed about the risks, biases, and constraints inherent in the model. While the model can produce natural language text, our exploration of its capabilities and limitations is just beginning. In fields such as medicine, comprehending these limitations is crucial. Hence, we strongly advise against deploying this model for natural language generation in production or for professional tasks in the realm of health and medicine.
149
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
 
 
 
 
 
 
 
 
 
 
 
 
 
2
  language:
3
  - en
4
  - fr
 
8
  - pl
9
  - ro
10
  - de
11
+ license: apache-2.0
12
+ library_name: transformers
13
+ tags:
14
+ - mergekit
15
+ - merge
16
+ - dare
17
+ - medical
18
+ - biology
19
+ datasets:
20
+ - pubmed
21
+ base_model:
22
+ - BioMistral/BioMistral-7B
23
+ - mistralai/Mistral-7B-Instruct-v0.1
24
  pipeline_tag: text-generation
25
+ model-index:
26
+ - name: BioMistral-7B-DARE
27
+ results:
28
+ - task:
29
+ type: text-generation
30
+ name: Text Generation
31
+ dataset:
32
+ name: AI2 Reasoning Challenge (25-Shot)
33
+ type: ai2_arc
34
+ config: ARC-Challenge
35
+ split: test
36
+ args:
37
+ num_few_shot: 25
38
+ metrics:
39
+ - type: acc_norm
40
+ value: 58.28
41
+ name: normalized accuracy
42
+ source:
43
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BioMistral/BioMistral-7B-DARE
44
+ name: Open LLM Leaderboard
45
+ - task:
46
+ type: text-generation
47
+ name: Text Generation
48
+ dataset:
49
+ name: HellaSwag (10-Shot)
50
+ type: hellaswag
51
+ split: validation
52
+ args:
53
+ num_few_shot: 10
54
+ metrics:
55
+ - type: acc_norm
56
+ value: 79.87
57
+ name: normalized accuracy
58
+ source:
59
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BioMistral/BioMistral-7B-DARE
60
+ name: Open LLM Leaderboard
61
+ - task:
62
+ type: text-generation
63
+ name: Text Generation
64
+ dataset:
65
+ name: MMLU (5-Shot)
66
+ type: cais/mmlu
67
+ config: all
68
+ split: test
69
+ args:
70
+ num_few_shot: 5
71
+ metrics:
72
+ - type: acc
73
+ value: 57.34
74
+ name: accuracy
75
+ source:
76
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BioMistral/BioMistral-7B-DARE
77
+ name: Open LLM Leaderboard
78
+ - task:
79
+ type: text-generation
80
+ name: Text Generation
81
+ dataset:
82
+ name: TruthfulQA (0-shot)
83
+ type: truthful_qa
84
+ config: multiple_choice
85
+ split: validation
86
+ args:
87
+ num_few_shot: 0
88
+ metrics:
89
+ - type: mc2
90
+ value: 55.61
91
+ source:
92
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BioMistral/BioMistral-7B-DARE
93
+ name: Open LLM Leaderboard
94
+ - task:
95
+ type: text-generation
96
+ name: Text Generation
97
+ dataset:
98
+ name: Winogrande (5-shot)
99
+ type: winogrande
100
+ config: winogrande_xl
101
+ split: validation
102
+ args:
103
+ num_few_shot: 5
104
+ metrics:
105
+ - type: acc
106
+ value: 76.09
107
+ name: accuracy
108
+ source:
109
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BioMistral/BioMistral-7B-DARE
110
+ name: Open LLM Leaderboard
111
+ - task:
112
+ type: text-generation
113
+ name: Text Generation
114
+ dataset:
115
+ name: GSM8k (5-shot)
116
+ type: gsm8k
117
+ config: main
118
+ split: test
119
+ args:
120
+ num_few_shot: 5
121
+ metrics:
122
+ - type: acc
123
+ value: 15.01
124
+ name: accuracy
125
+ source:
126
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BioMistral/BioMistral-7B-DARE
127
+ name: Open LLM Leaderboard
128
  ---
129
  # BioMistral-7B-mistral7instruct-dare
130
 
 
250
 
251
  **CAUTION!** Both direct and downstream users need to be informed about the risks, biases, and constraints inherent in the model. While the model can produce natural language text, our exploration of its capabilities and limitations is just beginning. In fields such as medicine, comprehending these limitations is crucial. Hence, we strongly advise against deploying this model for natural language generation in production or for professional tasks in the realm of health and medicine.
252
 
253
+
254
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
255
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BioMistral__BioMistral-7B-DARE)
256
+
257
+ | Metric |Value|
258
+ |---------------------------------|----:|
259
+ |Avg. |57.03|
260
+ |AI2 Reasoning Challenge (25-Shot)|58.28|
261
+ |HellaSwag (10-Shot) |79.87|
262
+ |MMLU (5-Shot) |57.34|
263
+ |TruthfulQA (0-shot) |55.61|
264
+ |Winogrande (5-shot) |76.09|
265
+ |GSM8k (5-shot) |15.01|
266
+