uukuguy committed on
Commit 5eaedcf
1 Parent(s): b19e60f

Update README.md

Files changed (1)
  1. README.md +19 -0
README.md CHANGED
@@ -16,3 +16,22 @@ weight_mask_rate: 0.85 / use_weight_rescale: True / mask_stratery: random / scal
  | teknium/CollectiveCognition-v1.1-Mistral-7B | 53.87 | 62.12 | 84.17 | 62.35 | 57.62 | 75.37 | 15.62 | 19.85 |
  | uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b | 53.34 | 64.33 | 84.4 | 63.72 | 52.52 | 78.37 | 21.38 | 8.66 |

+
+ 2023.12.04
+
+ There seem to be some issues with how the GSM8K and DROP metrics are calculated on the Open LLM Leaderboard. The DROP metric has currently been removed from the official website, and the GSM8K calculation remains inconsistent, with large differences across models. Therefore, I am temporarily using the ARC, HellaSwag, MMLU, TruthfulQA, and Winogrande metrics to evaluate the performance of DARE.
+
+ | Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande |
+ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
+ | CollectiveCognition-v1.1-Mistral-7B | 68.326 | 62.12 | 84.17 | 62.35 | 57.62 | 75.37 |
+ | CollectiveCognition-v1.1-Mistral-7B-dare-0.85 | 66.676 | 61.01 | 84.31 | 64.34 | 44.87 | 78.85 |
+ | airoboros-m-7b-3.1.2 | 67.722 | 61.86 | 83.51 | 61.91 | 53.75 | 77.58 |
+ | airoboros-m-7b-3.1.2-dare-0.85 | 66.144 | 61.09 | 83.57 | 64.05 | 43.64 | 78.37 |
+ | SynthIA-7B-v1.3 | 67.688 | 62.12 | 83.45 | 62.65 | 51.37 | 78.85 |
+ | SynthIA-7B-v1.3-dare-0.85 | 66.340 | 61.01 | 83.50 | 64.49 | 43.77 | 78.93 |
+ | | | | | | | |
+ | [speechless-mistral-7b-dare-0.85](https://huggingface.co/uukuguy/speechless-mistral-7b-dare-0.85) (merge of 6 DARE models) | 68.516 | 63.57 | 84.82 | 64.29 | 50.66 | 79.24 |
+
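+ The Average column above is simply the mean of the five retained metrics. Below is a minimal sketch of that computation, using two rows copied from the table above; the dictionary and variable names are illustrative, not part of the actual evaluation code.
+
+ ```python
+ # Recompute the "Average" column as the mean of the five retained metrics
+ # (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande); scores are taken verbatim
+ # from the table above.
+ scores = {
+     "CollectiveCognition-v1.1-Mistral-7B": [62.12, 84.17, 62.35, 57.62, 75.37],
+     "CollectiveCognition-v1.1-Mistral-7B-dare-0.85": [61.01, 84.31, 64.34, 44.87, 78.85],
+ }
+
+ for name, metrics in scores.items():
+     print(f"{name}: {sum(metrics) / len(metrics):.3f}")
+ # CollectiveCognition-v1.1-Mistral-7B: 68.326
+ # CollectiveCognition-v1.1-Mistral-7B-dare-0.85: 66.676
+
+ # Fraction of the original average retained after DARE with a 0.85 drop rate.
+ retained = 66.676 / 68.326
+ print(f"retained: {retained:.1%}")  # 97.6%
+ ```
+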
+ From the official leaderboard evaluation results, after dropping 85% of the incremental parameters the overall average remains above 97.5% of the original model's score. ARC decreases slightly, TruthfulQA decreases significantly, MMLU increases noticeably, and HellaSwag and Winogrande increase slightly. The largest impact is the drop in TruthfulQA; the other metrics are well preserved.
+
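+ The dare-0.85 rows above come from dropping the incremental (delta) parameters at the `weight_mask_rate: 0.85 / use_weight_rescale: True / mask_stratery: random` setting shown in the diff context above. The following is a minimal sketch of that drop-and-rescale step for a single weight tensor, assuming a random Bernoulli mask; the function and argument names are illustrative, not the actual merge script.
+
+ ```python
+ import torch
+
+ def dare_delta(base: torch.Tensor, finetuned: torch.Tensor,
+                weight_mask_rate: float = 0.85,
+                use_weight_rescale: bool = True) -> torch.Tensor:
+     """Drop a random fraction of the incremental parameters and rescale the rest."""
+     delta = finetuned - base                    # incremental (delta) parameters
+     keep_prob = 1.0 - weight_mask_rate          # keep ~15% of the delta entries
+     mask = torch.bernoulli(torch.full_like(delta, keep_prob))
+     delta = delta * mask                        # random drop
+     if use_weight_rescale:
+         delta = delta / keep_prob               # rescale so the expected delta is unchanged
+     return base + delta                         # merged weights for this tensor
+ ```
+
+ Applied tensor by tensor to a fine-tuned model, this yields the *-dare-0.85 variants in the table; the speechless-mistral-7b-dare-0.85 entry then presumably combines the sparse deltas of the six donor models onto a single base.
+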
+