Raincleared committed
Commit e030238 (1 parent: 617b900)

Upload README.md with huggingface_hub

Files changed (1): README.md (+9 -1)
README.md CHANGED
@@ -78,7 +78,15 @@ The 7B model is trained on 8 A100 GPUs. The learning rate (LR) is controlled by
 
 ### Evaluation Results
 
-The evaluation results on the above benchmarks demonstrate the advantage of ProSparse, which is the only method achieving high sparsity and comparable performance to the original Swish-activated LLaMA2. Note that models under all settings are trained with the same number of tokens on the same mixed dataset. Refer to Section 4.2 of [paper](https://arxiv.org/pdf/2402.13516.pdf) for more details.
+The evaluation results on the above benchmarks demonstrate the advantage of ProSparse, which is the only method achieving high sparsity and comparable performance to the original Swish-activated LLaMA2. Note that models under all settings are trained with the same number of tokens on the same mixed dataset. Our evaluation is based on the framework [UltraEval](https://github.com/OpenBMB/UltraEval). The evaluation details are listed as follows:
+
+- **Code Generation**: We compute the average pass@1 scores on HumanEval (0-shot) and MBPP (3-shot).
+
+- **Commonsense Reasoning**: We report the average 0-shot perplexity (PPL) on PIQA, SIQA, HellaSwag, WinoGrande, and COPA.
+
+- **Reading Comprehension**: We compute the average of the 0-shot PPL on BoolQ and the 0-shot accuracies on LAMBADA and TyDi QA.
+
+- **Other Popular Benchmarks**: We report the average accuracies on GSM8K (8-shot), MMLU (5-shot), and Big-Bench Hard (BBH) (3-shot), and the average PPL on AGI-Eval (0-shot).
 
 | Setting | Average<br>Sparsity | Code<br>Generation | Commonsense<br>Reasoning | Reading<br>Comprehension | GSM8K | MMLU | BBH | AGI Eval | Average |
 | :-------------------: | :-----------------: | :----------------: | :----------------------: | :----------------------: | :---: | :---: | :---: | :---------: | :-----: |
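
The pass@1 metric in the first bullet is not defined in the diff itself; a common convention, which Codex-style harnesses such as UltraEval generally follow, is the unbiased pass@k estimator of Chen et al. (2021). A minimal sketch, assuming `n` generations per problem of which `c` pass the unit tests (the function name is illustrative, not taken from UltraEval):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k), i.e. the
    probability that at least one of k samples drawn without replacement
    from n generations (c of them correct) passes the tests."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: a correct one must be drawn
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

# With one generation per problem (k = 1) this reduces to c / n,
# i.e. the fraction of problems solved on the first try.
print(pass_at_k(n=20, c=5, k=1))  # 0.25
```

The estimator itself is prompt-agnostic; the 0-shot (HumanEval) and 3-shot (MBPP) settings above only affect how the generations are sampled.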
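
For the PPL-based entries, the exact UltraEval scoring code is not shown here, but token-level perplexity for a causal LM is conventionally the exponential of the mean next-token cross-entropy. A minimal sketch with Hugging Face Transformers; the checkpoint name is a placeholder, and `trust_remote_code=True` is an assumption (custom sparsely-activated models often require it):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; substitute the model under evaluation.
NAME = "SparseLLM/prosparse-llama-2-7b"
tokenizer = AutoTokenizer.from_pretrained(NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(NAME, trust_remote_code=True).eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    """PPL = exp(mean negative log-likelihood of next-token predictions)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    # With labels=input_ids, the model shifts the labels internally and
    # returns the mean cross-entropy loss over the predicted tokens.
    loss = model(input_ids=ids, labels=ids).loss
    return torch.exp(loss).item()
```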