Feature Extraction
Transformers
Safetensors
English
bamboo
custom_code
yixinsong commited on
Commit
482ce4f
1 Parent(s): ed75727

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -4,7 +4,7 @@ Sparse computing is increasingly recognized as an important direction to improve
4
 
5
  Recent studies ([Zhang el al., 2021](https://arxiv.org/abs/2110.01786); [Liu et al., 2023](https://openreview.net/pdf?id=wIPIhHd00i); [Mirzadeh et al., 2023](https://arxiv.org/abs/2310.04564)) reveal that LLMs inherently exhibit properties conducive to sparse computation when employing the ReLU activation function. This insight opens up new avenues for model efficiency, akin to MoE's selective activation. By dynamically choosing model parameters for computation, we can substantially boost efficiency.
6
 
7
- However, the widespread adoption of ReLU-based models in the LLM field remains limited. Here we introduce a new 7B ReLU-based LLM, Bamboo, which boasts nearly 85% sparsity and performance levels on par with [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1).
8
 
9
  ## Model Architecture
10
 
 
4
 
5
  Recent studies ([Zhang el al., 2021](https://arxiv.org/abs/2110.01786); [Liu et al., 2023](https://openreview.net/pdf?id=wIPIhHd00i); [Mirzadeh et al., 2023](https://arxiv.org/abs/2310.04564)) reveal that LLMs inherently exhibit properties conducive to sparse computation when employing the ReLU activation function. This insight opens up new avenues for model efficiency, akin to MoE's selective activation. By dynamically choosing model parameters for computation, we can substantially boost efficiency.
6
 
7
+ However, the widespread adoption of ReLU-based models in the LLM field remains limited. Here we introduce a new 7B ReLU-based LLM, Bamboo, which boasts nearly 85% sparsity and performance levels on par with [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1).
8
 
9
  ## Model Architecture
10