Muennighoff committed on
Commit 1c5992f
1 Parent(s): 68331cd

Update README.md

  1. README.md +5 -14
README.md CHANGED
@@ -1662,7 +1662,9 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 
  * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
 
- * 2.5 billion parameters:
+ * 3,002,557,440 parameters:
+
+ * 642,252,800 embedding parameters
 
  * 30 layers, 32 attention heads
 
@@ -1705,18 +1707,7 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 
  #### **Training**
 
-
- _In progress._
-
- Current training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/tr11-176B-ml-logs/)
-
- - Checkpoint size:
-
- - Bf16 weights: 329GB
-
- - Full checkpoint with optimizer states: 2.3TB
-
- - Training throughput: About 150 TFLOP per GPU per second
+ Training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/tr11c-2B5-logs)
 
  - Number of epochs: 1 (*current target*)
 
@@ -1724,7 +1715,7 @@ Current training logs: [Tensorboard link](https://huggingface.co/tensorboard/big
 
  - Started 11th March, 2022 11:42am PST
 
- - Estimated end: 5th July, 2022
+ - Ended 5th July, 2022
 
  - Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments)
 
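
The two parameter counts introduced by this commit can be cross-checked with a short calculation. The sketch below is illustrative only: the README lines in the diff give 30 layers and ALiBi positions (which add no learned position parameters), but the hidden size of 2,560, the vocabulary of 250,880 tokens, the 4x feed-forward expansion, the tied input/output embeddings, and the layer norm after the embedding are all assumptions that do not appear in the diff. The function names are hypothetical helpers, not part of any library.

```python
# Rough parameter count for a decoder-only transformer with ALiBi positions.
# NUM_LAYERS comes from the README; every other constant is an ASSUMPTION.
VOCAB_SIZE = 250_880   # assumed tokenizer vocabulary (padded)
HIDDEN_SIZE = 2_560    # assumed model width
NUM_LAYERS = 30        # stated in the README ("30 layers")


def embedding_params(vocab: int, hidden: int) -> int:
    """Word-embedding matrix; assumed to be shared with the output projection."""
    return vocab * hidden


def layer_params(hidden: int) -> int:
    """Weights and biases of one transformer block (assumed 4x MLP expansion)."""
    qkv = 3 * hidden * hidden + 3 * hidden       # fused query/key/value projection
    attn_out = hidden * hidden + hidden          # attention output projection
    mlp_up = hidden * 4 * hidden + 4 * hidden    # hidden -> 4*hidden
    mlp_down = 4 * hidden * hidden + hidden      # 4*hidden -> hidden
    layer_norms = 2 * (2 * hidden)               # two layer norms, scale + bias each
    return qkv + attn_out + mlp_up + mlp_down + layer_norms


def total_params(vocab: int, hidden: int, layers: int) -> int:
    """Embeddings + embedding layer norm + all blocks + final layer norm."""
    embedding_norm = 2 * hidden
    final_norm = 2 * hidden
    return (
        embedding_params(vocab, hidden)
        + embedding_norm
        + layers * layer_params(hidden)
        + final_norm
    )


if __name__ == "__main__":
    print(f"embedding: {embedding_params(VOCAB_SIZE, HIDDEN_SIZE):,}")
    print(f"total:     {total_params(VOCAB_SIZE, HIDDEN_SIZE, NUM_LAYERS):,}")
```

Under these assumptions the calculation reproduces both figures in the updated README, which suggests that the earlier "2.5 billion" label excluded the embedding matrix or was simply a rounded name for the model size.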