isydmr committed
Commit 5152943
1 Parent(s): 1e7fdcc

Correct blogpost link

Files changed (1)
1. README.md: +5 -5
README.md CHANGED
@@ -54,7 +54,7 @@ for seq in sequences:
 
```
 
- For fast inference with Falcon, check-out [Text Generation Inference](https://github.com/huggingface/text-generation-inference)! Read more in this [blogpost]((https://huggingface.co/blog/falcon).
+ For fast inference with Falcon, check-out [Text Generation Inference](https://github.com/huggingface/text-generation-inference)! Read more in this [blog post](https://huggingface.co/blog/falcon).
 
You will need **at least 85-100GB of memory** to swiftly run inference with Falcon-40B.
 
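The hunk above sits directly under the README's `transformers` pipeline example (its closing `for seq in sequences:` loop appears in the hunk header). For orientation, a minimal sketch of that kind of pipeline call follows; the prompt and generation arguments are illustrative, and at bfloat16 the ~40B parameters alone occupy roughly 80 GB, consistent with the 85-100GB figure quoted above.

```python
# Minimal sketch of a transformers text-generation pipeline for Falcon-40B-Instruct.
# The prompt and generation arguments are illustrative, not prescribed settings.
# At bfloat16 (~2 bytes/parameter), ~40B parameters alone take roughly 80 GB,
# consistent with the 85-100GB memory figure quoted above.
import torch
import transformers
from transformers import AutoTokenizer

model = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
    "Write a short poem about falcons.",  # illustrative prompt
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```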
@@ -153,11 +153,11 @@ Falcon-40B is a causal decoder-only model trained on a causal language modeling
 
The architecture is broadly adapted from the GPT-3 paper ([Brown et al., 2020](https://arxiv.org/abs/2005.14165)), with the following differences:
 
- * **Positionnal embeddings:** rotary ([Su et al., 2021](https://arxiv.org/abs/2104.09864));
+ * **Positional embeddings:** rotary ([Su et al., 2021](https://arxiv.org/abs/2104.09864));
* **Attention:** multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)) and FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135));
* **Decoder-block:** parallel attention/MLP with a single layer norm.
 
- For multiquery, we are using an internal variant which uses independent key and values per tensor parallel degree.
+ For multiquery, we are using an internal variant that uses independent keys and values per tensor parallel degree.
 
| **Hyperparameter** | **Value** | **Comment** |
|--------------------|-----------|----------------------------------------|
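On the multiquery bullet in the hunk above: multi-query attention shares a single key/value head across all query heads, which shrinks the KV cache at generation time. The shape-level sketch below illustrates the general idea only, not Falcon's internal variant (which, per the hunk, keeps independent keys and values per tensor-parallel degree); the causal mask is omitted for brevity.

```python
# Shape-level sketch of multi-query attention (Shazeer, 2019): many query heads,
# one shared key/value head. Illustration only -- not Falcon's internal variant.
# Causal masking is omitted for brevity.
import torch

batch, seq_len, n_heads, head_dim = 2, 16, 8, 64

q = torch.randn(batch, n_heads, seq_len, head_dim)  # one query per head
k = torch.randn(batch, 1, seq_len, head_dim)        # single shared key head
v = torch.randn(batch, 1, seq_len, head_dim)        # single shared value head

# Broadcasting expands the shared K/V across all 8 query heads.
scores = (q @ k.transpose(-2, -1)) / head_dim**0.5  # (batch, n_heads, seq, seq)
attn = torch.softmax(scores, dim=-1)
out = attn @ v                                       # (batch, n_heads, seq, head_dim)

# KV cache per token: 2 * 1 * head_dim values instead of 2 * n_heads * head_dim.
print(out.shape)  # torch.Size([2, 8, 16, 64])
```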
@@ -175,7 +175,7 @@ Falcon-40B-Instruct was trained on AWS SageMaker, on 64 A100 40GB GPUs in P4d in
 
#### Software
 
- Falcon-40B-Instruct was trained a custom distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels (FlashAttention, etc.)
+ Falcon-40B-Instruct was trained in a custom distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels (FlashAttention, etc.)
 
 
## Citation
@@ -184,7 +184,7 @@ Falcon-40B-Instruct was trained a custom distributed training codebase, Gigatron
```
@article{falcon40b,
title={{Falcon-40B}: an open large language model with state-of-the-art performance},
- author={Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme},
+ author={Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz, and Cappelli, Alessandro and Cojocaru, Ruxandra, and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme},
year={2023}
}
```
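
A note on the Software hunk above: Gigatron is TII's internal codebase and is not public, but ZeRO is the optimizer/parameter-sharding scheme implemented in the open-source DeepSpeed library. Purely as an illustration of what a ZeRO stage-3 configuration looks like there (hypothetical values, not Falcon's actual training settings):

```python
# Illustration only: Gigatron is not public. ZeRO itself is implemented in
# DeepSpeed; the stage-3 bf16 config below uses real DeepSpeed config keys,
# but the values are hypothetical and are not Falcon's training settings.
import json

ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                # shard optimizer state, gradients, and parameters
        "overlap_comm": True,      # overlap communication with computation
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": 8,      # hypothetical
    "train_micro_batch_size_per_gpu": 1,   # hypothetical
}

# Typically serialized and passed to DeepSpeed as a JSON config file.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```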
 