jnishi committed on
Commit
198973a
1 Parent(s): d96a69d

Update README.md

Files changed (1): README.md +2 -0
README.md CHANGED
@@ -76,7 +76,9 @@ The Retrieva BERT model was pre-trained on the reunion of five datasets:
 - Chinese Wikipedia dumped on 20240120.
 - Korean Wikipedia dumped on 20240120.
 - [The Stack](https://huggingface.co/datasets/bigcode/the-stack)
+
 The model was trained on 180 billion tokens using the above dataset.
+
 ### Training Procedure
 The model was trained on 4 to 32 H100 GPUs with a batch size of 1,024.
 We adopted the curriculum learning which is similar to the Sequence Length Warmup and training with the following sequence lengths and number of steps.
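The hunk above mentions curriculum learning in the style of Sequence Length Warmup: training starts on short sequences and switches to longer ones at fixed step boundaries. As a rough illustration of what such a schedule looks like, here is a minimal Python sketch; the step counts and sequence lengths in it are hypothetical placeholders, not the values actually used for RetrievaBERT (the README lists the real schedule just after this hunk).

```python
# Minimal sketch of a sequence-length warmup schedule.
# The (num_steps, seq_len) pairs are hypothetical placeholders,
# NOT RetrievaBERT's actual schedule.
SCHEDULE = [
    (10_000, 128),   # first 10k steps at length 128 (assumed)
    (10_000, 256),
    (10_000, 512),
    (10_000, 1024),
]

def seq_len_for_step(step: int) -> int:
    """Return the training sequence length in effect at a global step."""
    boundary = 0
    for num_steps, seq_len in SCHEDULE:
        boundary += num_steps
        if step < boundary:
            return seq_len
    # Past the last boundary, stay at the final (longest) length.
    return SCHEDULE[-1][1]

if __name__ == "__main__":
    for step in (0, 15_000, 35_000, 50_000):
        print(step, seq_len_for_step(step))  # 128, 256, 1024, 1024
```

In a training loop, the data collator would truncate or pack batches to `seq_len_for_step(global_step)`, so early steps are cheap and the model only pays for full-length attention late in training.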