Muennighoff committed on
Commit
ef8385a
1 Parent(s): ae3a5ff

Update README.md

  1. README.md +7 -15
README.md CHANGED
@@ -122,7 +122,9 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 
  * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
 
- * 760 million parameters:
+ * 1,065,314,304 parameters:
+
+ * 385,351,680 embedding parameters
 
  * 24 layers, 16 attention heads
 
@@ -166,27 +168,17 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
  #### **Training**
 
 
- _In progress._
-
- Current training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/tr11-176B-ml-logs/)
-
- - Checkpoint size:
-
- - Bf16 weights: 329GB
-
- - Full checkpoint with optimizer states: 2.3TB
-
- - Training throughput: About 150 TFLOP per GPU per second
+ Training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/tr11d-760M-logs)
 
- - Number of epochs: 1 (*current target*)
+ - Number of epochs: 1
 
  - Dates:
 
  - Started 11th March, 2022 11:42am PST
 
- - Estimated end: 5th July, 2022
+ - Ended 5th July, 2022
 
- - Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments)
+ - Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments and other model sizes)
 
  - Server training location: Île-de-France, France
 
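The parameter figures introduced by this commit (1,065,314,304 total, 385,351,680 embedding) can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the two counts come from the README lines above, while the vocabulary size and hidden size are assumptions that do not appear in this diff and are used solely to show one factorization consistent with the embedding count.

```python
# Figures taken from the updated README lines in this commit.
TOTAL_PARAMS = 1_065_314_304
EMBEDDING_PARAMS = 385_351_680

# ASSUMPTIONS (not stated in this diff): a padded vocabulary of 250,880
# tokens and a hidden size of 1,536. They are included only because their
# product happens to match the embedding count above.
ASSUMED_VOCAB_SIZE = 250_880
ASSUMED_HIDDEN_SIZE = 1_536


def non_embedding_params(total: int, embedding: int) -> int:
    """Parameters left over once the embedding matrix is excluded."""
    return total - embedding


def embedding_share(total: int, embedding: int) -> float:
    """Fraction of all parameters that sit in the embedding matrix."""
    return embedding / total


if __name__ == "__main__":
    rest = non_embedding_params(TOTAL_PARAMS, EMBEDDING_PARAMS)
    share = embedding_share(TOTAL_PARAMS, EMBEDDING_PARAMS)
    print(f"non-embedding parameters: {rest:,}")
    print(f"embedding share: {share:.1%}")
    # Check the assumed factorization against the documented count.
    print(ASSUMED_VOCAB_SIZE * ASSUMED_HIDDEN_SIZE == EMBEDDING_PARAMS)
```

Under these numbers the embedding matrix accounts for roughly a third of the total, which may help explain why the headline count changed from "760 million" to about 1.07 billion once embeddings were listed explicitly.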