Raincleared committed on
Commit
2effe75
1 Parent(s): 2d98d70

Upload README.md with huggingface_hub

Files changed (1): README.md +4 -14
README.md CHANGED
@@ -66,7 +66,9 @@ The 13B model is trained on 32 A100 GPUs. The learning rate (LR) is controlled b
  | 4 | \\(2e-2\\) | 15,000 | 125.83 |
  | 5 | \\(2e-2\\) | 16,000 | 134.22 |

- ### Evaluation Benckmarks
+ ### Evaluation Results
+
+ The evaluation results on the above benchmarks demonstrate the advantage of ProSparse, which is the only method achieving high sparsity and comparable performance to the original Swish-activated LLaMA2. Note that models under all settings are trained with the same number of tokens on the same mixed dataset. Our evaluation is based on the framework [UltraEval](https://github.com/OpenBMB/UltraEval). The evaluation details are listed as follows:

  - **Code Generation**: We compute the average pass@1 scores on HumanEval (0-shot) and MBPP (3-shot).

@@ -74,22 +76,10 @@ The 13B model is trained on 32 A100 GPUs. The learning rate (LR) is controlled b

  - **Reading Comprehension**: We compute the average 0-shot accuracies on BoolQ, 0-shot accuracy on LAMBADA and TyDi QA.

- - **Other Popular Benchmarks**: We report the average accuracies on GSM8K (8-shot), MMLU (5-shot), Big Bench Hard (BBH) (3-shot), and AGI-Eval (0-shot). Refer to Appendix~\ref{sec:eval-details} for more details.
+ - **Other Popular Benchmarks**: We report the average accuracies on GSM8K (8-shot), MMLU (5-shot), Big Bench Hard (BBH) (3-shot), and AGI-Eval (0-shot).

  **Notes**: For PIQA, SIQA, HellaSwag, WinoGrande, COPA, BoolQ, LAMBADA, TyDi QA, and AGI-Eval, we obtain the predicted answers based on maximized perplexity. For GSM8K, MMLU, and BBH, the predicted answers are directly generated.

- ### Evaluation Results
-
- The evaluation results on the above benchmarks demonstrate the advantage of ProSparse, which is the only method achieving high sparsity and comparable performance to the original Swish-activated LLaMA2. Note that models under all settings are trained with the same number of tokens on the same mixed dataset. Our evaluation is based on the framework [UltraEval](https://github.com/OpenBMB/UltraEval). The evaluation details are listed as follows:
-
- - **Code Generation**: We compute the average pass@1 scores on HumanEval (0-shot) and MBPP (3-shot).
-
- - **Commonsense Reasoning**: We report the average 0-shot perplexity (PPL) on PIQA, SIQA, HellaSwag, WinoGrande, and COPA.
-
- - **Reading Comprehension**: We compute the average 0-shot PPL on BoolQ, 0-shot accuracy on LAMBADA and TyDi QA.
-
- - **Other Popular Benchmarks**: We report the average accuracies on GSM8K (8-shot), MMLU (5-shot), Big Bench Hard (BBH) (3-shot), and the average PPL on AGI-Eval (0-shot).
-
  | Setting | Average<br>Sparsity | Code<br>Generation | Commonsense<br>Reasoning | Reading<br>Comprehension | GSM8K | MMLU | BBH | AGI Eval | Average |
  | :-------------------: | :-----------------: | :----------------: | :----------------------: | :----------------------: | :---: | :---: | :---: | :---------: | :-----: |
  | Original-7B | - | 16.37 | 69.59 | 61.87 | 12.96 | 44.45 | 32.96 | 27.53 | 37.96 |
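
The **Notes** line in the diff above describes selecting multiple-choice answers by perplexity: each candidate answer is scored with the language model, and the candidate with the lowest perplexity (highest likelihood) is taken as the prediction. The sketch below is only an illustration of that scoring rule, not code from the README or from UltraEval; it assumes a Hugging Face `transformers` causal LM, and the checkpoint name, prompt, and helper functions are placeholders.

```python
# Illustrative sketch only: perplexity-based answer selection for multiple-choice
# benchmarks, assuming a Hugging Face `transformers` causal LM. The checkpoint
# name and example strings are placeholders, not taken from the README or UltraEval.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()


def option_nll(prompt: str, option: str) -> float:
    """Mean negative log-likelihood of the option tokens given the prompt.

    exp(mean NLL) is the option's perplexity, so the lowest NLL is the
    lowest-perplexity (most likely) candidate.
    """
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, :prompt_len] = -100  # ignore prompt tokens; score only the option
    with torch.no_grad():
        return model(full_ids, labels=labels).loss.item()


def pick_answer(prompt: str, options: list[str]) -> str:
    """Return the candidate answer with the lowest perplexity under the model."""
    return min(options, key=lambda opt: option_nll(prompt, opt))


# Toy PIQA-style example (illustrative strings, not real benchmark data):
print(pick_answer(
    "Question: How do you open a jar?\nAnswer:",
    [" Twist the lid counterclockwise.", " Twist the jar counterclockwise."],
))
```

A real harness additionally handles tokenizer boundary effects, batching, and per-dataset prompt formats; the snippet only shows the selection rule, while GSM8K, MMLU, and BBH answers are generated directly as the note states.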