Adding Evaluation Results #1
Files changed (1)
  1. README.md +110 -2
README.md CHANGED
@@ -1,13 +1,121 @@
  ---
- license: apache-2.0
  language:
  - en
+ license: apache-2.0
  base_model:
  - Dans-DiscountModels/mistral-7b-v0.3-ChatML
+ model-index:
+ - name: mistral-7b-test-merged
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 66.24
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/mistral-7b-test-merged
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 28.41
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/mistral-7b-test-merged
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 4.15
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/mistral-7b-test-merged
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 7.38
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/mistral-7b-test-merged
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 2.99
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/mistral-7b-test-merged
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 21.46
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/mistral-7b-test-merged
+       name: Open LLM Leaderboard
  ---
  This model is an early release of an upcoming model for testing purposes. The format is ChatML. If you use this model let me know how it goes.

  ### Training details:
  - 1x RTX 4080 + x RTX 3090
  - Rank 64 RSLoRA
- - 68 Hours runtime
+ - 68 Hours runtime
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Dans-DiscountModels__mistral-7b-test-merged)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |21.77|
+ |IFEval (0-Shot)    |66.24|
+ |BBH (3-Shot)       |28.41|
+ |MATH Lvl 5 (4-Shot)| 4.15|
+ |GPQA (0-shot)      | 7.38|
+ |MuSR (0-shot)      | 2.99|
+ |MMLU-PRO (5-shot)  |21.46|
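As a sanity check on the table this PR adds: the Avg. row is the unweighted arithmetic mean of the six benchmark scores. A minimal sketch confirming the reported 21.77, with the values copied from the table above:

```python
# Scores copied from the table added by this PR; Avg. should be their plain mean.
scores = {
    "IFEval (0-Shot)": 66.24,
    "BBH (3-Shot)": 28.41,
    "MATH Lvl 5 (4-Shot)": 4.15,
    "GPQA (0-shot)": 7.38,
    "MuSR (0-shot)": 2.99,
    "MMLU-PRO (5-shot)": 21.46,
}
avg = sum(scores.values()) / len(scores)
print(f"Avg. = {avg:.2f}")  # -> Avg. = 21.77
```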
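Once merged, the `model-index` block above becomes machine-readable metadata rather than just card text. A minimal sketch of reading it back with the `huggingface_hub` ModelCard API (this assumes the PR has been merged, so the card on the Hub actually contains the block):

```python
from huggingface_hub import ModelCard

# Load the card for the repo this PR targets and walk the parsed model-index.
card = ModelCard.load("Dans-DiscountModels/mistral-7b-test-merged")
for result in card.data.eval_results:
    print(f"{result.dataset_name}: {result.metric_value} ({result.metric_type})")
```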
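The card says the prompt format is ChatML but does not show it. Here is a sketch of what that means in practice; whether this repo's tokenizer actually bundles a ChatML `chat_template` is an assumption on my part, which is why the expected rendering is spelled out in the comment:

```python
from transformers import AutoTokenizer

# Repo id taken from this PR; the ChatML chat_template being present
# (rather than the base Mistral template) is an assumption.
tokenizer = AutoTokenizer.from_pretrained("Dans-DiscountModels/mistral-7b-test-merged")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How did the test run go?"},
]

# With a ChatML template this renders roughly:
#   <|im_start|>system
#   You are a helpful assistant.<|im_end|>
#   <|im_start|>user
#   How did the test run go?<|im_end|>
#   <|im_start|>assistant
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```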