Update README.md
README.md
@@ -13,8 +13,8 @@ inference: false
 
 `nomic-bert-2048` is a BERT model pretrained on `wikipedia` and `bookcorpus` with a max sequence length of 2048.
 
-We make several modifications to our BERT training procedure
+We make several modifications to our BERT training procedure, similar to [MosaicBERT](https://www.databricks.com/blog/mosaicbert).
 Namely, we:
 - Use [Rotary Position Embeddings](https://arxiv.org/pdf/2104.09864.pdf) to allow for context length extrapolation.
 - Use SwiGLU activations, which have [been shown](https://arxiv.org/abs/2002.05202) to [improve model performance](https://www.databricks.com/blog/mosaicbert).
 - Set dropout to 0.
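For reference, a minimal PyTorch sketch of the two architectural changes listed in the diff: a SwiGLU feed-forward block and one common "rotate-half" formulation of rotary position embeddings. Module and function names here are illustrative, not nomic-bert-2048's actual implementation.

```python
import torch
import torch.nn as nn


class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The SiLU-activated gate modulates the up projection elementwise.
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate query/key channels by position-dependent angles.

    x: (batch, seq_len, n_heads, head_dim), with head_dim even.
    """
    _, seq_len, _, head_dim = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., : head_dim // 2], x[..., head_dim // 2 :]
    # Each channel pair (x1_i, x2_i) is rotated by angle position * inv_freq_i.
    return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)
```

Because the rotation depends only on relative position once both queries and keys are rotated, RoPE tends to generalize to sequence lengths beyond those seen in pretraining, which is the extrapolation motivation cited in the first bullet.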