jnishi committed on
Commit
198973a
1 Parent(s): d96a69d

Update README.md

Files changed (1): README.md +2 -0
README.md CHANGED
@@ -76,7 +76,9 @@ The Retrieva BERT model was pre-trained on the reunion of five datasets:
 - Chinese Wikipedia dumped on 20240120.
 - Korean Wikipedia dumped on 20240120.
 - [The Stack](https://huggingface.co/datasets/bigcode/the-stack)
+
 The model was trained on 180 billion tokens using the above dataset.
+
 ### Training Procedure
 The model was trained on 4 to 32 H100 GPUs with a batch size of 1,024.
 We adopted the curriculum learning which is similar to the Sequence Length Warmup and training with the following sequence lengths and number of steps.
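The hunk above mentions curriculum learning in the style of Sequence Length Warmup: training starts on short sequences and switches to longer ones at fixed step boundaries. As a rough illustration of what such a schedule looks like, here is a minimal Python sketch; the step counts and sequence lengths in it are hypothetical placeholders, not the values actually used for RetrievaBERT (the README lists the real schedule just after this hunk).

```python
# Minimal sketch of a sequence-length warmup schedule.
# The (num_steps, seq_len) pairs are hypothetical placeholders,
# NOT RetrievaBERT's actual schedule.
SCHEDULE = [
    (10_000, 128),   # first 10k steps at length 128 (assumed)
    (10_000, 256),
    (10_000, 512),
    (10_000, 1024),
]

def seq_len_for_step(step: int) -> int:
    """Return the training sequence length in effect at a global step."""
    boundary = 0
    for num_steps, seq_len in SCHEDULE:
        boundary += num_steps
        if step < boundary:
            return seq_len
    # Past the last boundary, stay at the final (longest) length.
    return SCHEDULE[-1][1]

if __name__ == "__main__":
    for step in (0, 15_000, 35_000, 50_000):
        print(step, seq_len_for_step(step))  # 128, 256, 1024, 1024
```

In a training loop, the data collator would truncate or pack batches to `seq_len_for_step(global_step)`, so early steps are cheap and the model only pays for full-length attention late in training.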