yzmizeyu committed
Commit
b50d87b
1 Parent(s): 51bc605

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -9,7 +9,7 @@ datasets:
  ---
  ## Introduction
 
 -Sparse computing is increasingly recognized as an important direction to improve the computational efficiency of large language models (LLMs). For example, mixture of experts (MoE) methods show particular promise.
 +Sparse computing is increasingly recognized as an important direction to improve the computational efficiency of large language models (LLMs).
 
  Recent studies ([Zhang et al., 2021](https://arxiv.org/abs/2110.01786); [Liu et al., 2023](https://openreview.net/pdf?id=wIPIhHd00i); [Mirzadeh et al., 2023](https://arxiv.org/abs/2310.04564)) reveal that LLMs inherently exhibit properties conducive to sparse computation when employing the ReLU activation function. This insight opens up new avenues for model efficiency, akin to MoE's selective activation. By dynamically choosing model parameters for computation, we can substantially boost efficiency.
 
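
The dynamic parameter selection described in the paragraph above can be illustrated with a small numeric sketch. The example below is hypothetical and not taken from the Bamboo codebase; the matrix names `W_up`/`W_down` and the toy dimensions are assumptions for illustration. With random Gaussian weights roughly half of the ReLU outputs are zero here, whereas the cited studies report much higher sparsity in trained ReLU LLMs; either way, the inactive neurons contribute nothing, so the matching columns of the down-projection can be skipped.

```python
import numpy as np

# Hypothetical illustration (not from the Bamboo repository): a ReLU
# feed-forward block where the down-projection uses only the columns
# that correspond to non-zero (active) ReLU outputs.

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32                     # toy sizes for demonstration

W_up = rng.standard_normal((d_ff, d_model))
W_down = rng.standard_normal((d_model, d_ff))
x = rng.standard_normal(d_model)

# Dense path: every hidden neuron participates in both projections.
h = np.maximum(W_up @ x, 0.0)             # ReLU activations
y_dense = W_down @ h

# Sparse path: keep only active neurons; inactive ones contribute nothing,
# so the matching columns of W_down (and, with an activation predictor,
# the matching rows of W_up) can be skipped entirely.
active = h > 0.0
y_sparse = W_down[:, active] @ h[active]

print(f"active neurons: {active.sum()} / {d_ff}")
print(f"max |dense - sparse| = {np.abs(y_dense - y_sparse).max():.2e}")
```

The two outputs match to floating-point precision, which is the point of activation sparsity: skipping the inactive neurons changes the cost, not the result.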