Update README.md
README.md CHANGED
@@ -9,11 +9,15 @@ datasets:
 ---
 ## Introduction
 
-Sparse computing is increasingly recognized as an important direction for improving the computational efficiency of large language models (LLMs).
+Sparse computing is increasingly recognized as an important direction for improving the computational efficiency (e.g., inference speed) of large language models (LLMs).
 
 Recent studies ([Zhang et al., 2021](https://arxiv.org/abs/2110.01786); [Liu et al., 2023](https://openreview.net/pdf?id=wIPIhHd00i); [Mirzadeh et al., 2023](https://arxiv.org/abs/2310.04564)) reveal that LLMs inherently exhibit properties conducive to sparse computation when employing the ReLU activation function.
+This insight opens up new avenues for faster inference, akin to the selective activation of Mixture-of-Experts (MoE) models.
+By dynamically choosing which model parameters to compute, we can substantially boost inference speed.
 
 However, the widespread adoption of ReLU-based models in the LLM field remains limited.
+Here we introduce a new 7B ReLU-based LLM, Bamboo (GitHub link: [https://github.com/SJTU-IPADS/Bamboo](https://github.com/SJTU-IPADS/Bamboo)),
+which boasts nearly 85% sparsity and performance on par with [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1).
 
 ## Model Architecture
 
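To make the sparsity idea in this hunk concrete, here is a minimal numpy sketch (toy dimensions and random weights, not Bamboo's actual kernels) of why ReLU enables the dynamic parameter selection described above: any neuron that ReLU zeroes out contributes nothing to the down-projection, so its column can be skipped without changing the FFN output.

```python
# Toy illustration of ReLU-induced activation sparsity in a transformer FFN.
# Dimensions, weights, and the dense/sparse split are illustrative only,
# not Bamboo's actual implementation.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 64, 256

x = rng.standard_normal(d_model)
W_up = rng.standard_normal((d_ff, d_model))
W_down = rng.standard_normal((d_model, d_ff))

# Dense path: compute every neuron, then project back down.
h = np.maximum(W_up @ x, 0.0)  # ReLU zeroes out many neurons
dense_out = W_down @ h

# Sparse path: zeroed neurons contribute nothing, so the down-projection
# only needs the columns of the active ones.
active = h > 0
sparse_out = W_down[:, active] @ h[active]

print(f"active neurons: {active.mean():.0%}")
assert np.allclose(dense_out, sparse_out)
```

Random weights only give roughly 50% sparsity in this toy; the point of the studies cited above is that trained ReLU LLMs concentrate activations far more (the ~85% claimed for Bamboo), which is what makes skipping the inactive neurons worthwhile.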
@@ -75,9 +79,10 @@ Our evaluation is based on the frameworks lm-evaluation-harness and opencompass.
 | Ours | 0.6389 | 0.7593 | 0.4406 | 0.8217 | 0.5315 | 0.6195 | 0.256 | | |
 | Mistral | 0.6265 | 0.7924 | 0.4262 | 0.8332 | 0.4018 | 0.6143 | 0.2621 | | |
 
-## Speed Evaluation Results
+## Inference Speed Evaluation Results
 
 We utilize [PowerInfer](https://arxiv.org/pdf/2312.12456.pdf), a state-of-the-art acceleration framework leveraging activation sparsity.
+Here we show the inference speed compared with llama.cpp and transformers.
 
 ## Limitation & Disclaimer
 
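For reference, below is a rough sketch of how a decode-throughput number can be obtained from the transformers baseline named in this hunk; the model id, prompt, and generation settings are placeholders, not the configuration actually used for the reported comparison.

```python
# Rough decode-throughput measurement with Hugging Face transformers.
# Model id, prompt, and generation settings are placeholders for illustration.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # swap in the model to benchmark
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tok("Sparse computing is", return_tensors="pt").to(model.device)

start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

generated = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{generated / elapsed:.1f} tokens/s")
```

llama.cpp (and PowerInfer, which builds on it) typically print per-token timings at the end of a run, which provides the comparable tokens/s figure.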