Raincleared committed
Commit 9fe2969
Parent: 2effe75

Upload README.md with huggingface_hub

Files changed (1): README.md (+16, -0)
README.md CHANGED
@@ -96,6 +96,22 @@ The evaluation results on the above benchmarks demonstrate the advantage of ProS

  **Notes**: "Original" refers to the original Swish-activated LLaMA2 versions. ReluLLaMA-7B and ReluLLaMA-13B are available at [7B](https://huggingface.co/SparseLLM/ReluLLaMA-7B) and [13B](https://huggingface.co/SparseLLM/ReluLLaMA-13B) respectively. "ProSparse-7B\*" and "ProSparse-13B\*" denote the ProSparse versions without activation threshold shifting.

+ ### Evaluation Issues with LM-Eval
+
+ The above results can be replicated with [UltraEval](https://github.com/OpenBMB/UltraEval). Some abnormal results obtained with other popular frameworks such as [LM-Eval](https://github.com/EleutherAI/lm-evaluation-harness) are probably attributable to the absence of the BOS token `<s>`, which LM-Eval does not add by default. A quick temporary fix is shown in the code below. Other discrepancies in evaluation results may stem from other factors, including few-shot settings, data pre-processing, and extra prompts.
+
+ ```python
+ # https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/models/huggingface.py#L945
+ for _, context_enc, continuation_enc in chunk:
+     # sanity check
+     assert len(context_enc) > 0
+     # a trivial fix: LLaMA-2's BOS token `<s>` has id 1, so prepend it
+     # whenever the tokenized context does not already start with it
+     if context_enc[0] != 1:
+         context_enc = [1] + context_enc
+     assert len(continuation_enc) > 0
+     assert len(continuation_enc) <= self.max_length
+ ```
+
  ### Inference Acceleration Effects

  First, we utilize [PowerInfer](https://arxiv.org/pdf/2312.12456.pdf), a state-of-the-art acceleration framework leveraging activation sparsity. As its inference speed and accuracy heavily rely on the performance of activation predictors, we report the activation recall and predicted sparsity (i.e., two key metrics for evaluating the activation predictor) as well as the number of tokens generated per second by PowerInfer (with one A100 GPU and sufficient CPUs). The GGUF files and activation predictors for ProSparse-13B are available at [ProSparse-LLaMA-2-13B-GGUF](https://huggingface.co/PowerInfer/prosparse-llama-2-13b-gguf) ([duplicate](https://huggingface.co/SparseLLM/prosparse-llama-2-13b-gguf)) and [ProSparse-LLaMA-2-13B-Predictor](https://huggingface.co/PowerInfer/prosparse-llama-2-13b-predictor) ([duplicate](https://huggingface.co/SparseLLM/prosparse-llama-2-13b-predictor)) respectively.
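
For readers unfamiliar with the two predictor metrics named above, the sketch below shows one way to compute activation recall and predicted sparsity for a single ReLU FFN layer. It is a minimal illustration under stated assumptions, not code from ProSparse or PowerInfer; the tensor names, shapes, and the binary-mask predictor interface are all hypothetical.

```python
import torch

def predictor_metrics(ffn_act: torch.Tensor, pred_mask: torch.Tensor):
    """Compute activation recall and predicted sparsity for one FFN layer.

    ffn_act:   post-ReLU intermediate activations, shape (num_tokens, num_neurons)
    pred_mask: boolean mask of neurons the predictor marks as active, same shape
    (Illustrative names and shapes; not the ProSparse/PowerInfer API.)
    """
    true_active = ffn_act > 0  # ground-truth active positions under ReLU
    # recall: fraction of truly active neurons that the predictor also flags
    recall = (true_active & pred_mask).sum().item() / max(true_active.sum().item(), 1)
    # predicted sparsity: fraction of neurons the predictor marks as inactive
    pred_sparsity = 1.0 - pred_mask.float().mean().item()
    return recall, pred_sparsity

# Toy usage with random tensors standing in for real activations and predictions.
acts = torch.relu(torch.randn(8, 32))
mask = torch.rand(8, 32) > 0.6
recall, sparsity = predictor_metrics(acts, mask)
print(f"activation recall: {recall:.3f}, predicted sparsity: {sparsity:.3f}")
```

Intuitively, a higher recall means the predictor misses fewer truly active neurons (protecting accuracy), while a higher predicted sparsity means more neuron computations can be skipped (improving speed).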