Update README.md
README.md
@@ -13,8 +13,8 @@ inference: false
 
 `nomic-bert-2048` is a BERT model pretrained on `wikipedia` and `bookcorpus` with a max sequence length of 2048.
 
-We make several modifications to our BERT training procedure
+We make several modifications to our BERT training procedure, similar to [MosaicBERT](https://www.databricks.com/blog/mosaicbert).
 Namely, we:
 - Use [Rotary Position Embeddings](https://arxiv.org/pdf/2104.09864.pdf) to allow for context length extrapolation.
 - Use SwiGLU activations, which have [been shown](https://arxiv.org/abs/2002.05202) to [improve model performance](https://www.databricks.com/blog/mosaicbert).
 - Set dropout to 0.
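For reference, a minimal PyTorch sketch of the two architectural changes listed in the diff: a SwiGLU feed-forward block and one common "rotate-half" formulation of rotary position embeddings. Module and function names here are illustrative, not nomic-bert-2048's actual implementation.

```python
import torch
import torch.nn as nn


class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The SiLU-activated gate modulates the up projection elementwise.
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate query/key channels by position-dependent angles.

    x: (batch, seq_len, n_heads, head_dim), with head_dim even.
    """
    _, seq_len, _, head_dim = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., : head_dim // 2], x[..., head_dim // 2 :]
    # Each channel pair (x1_i, x2_i) is rotated by angle position * inv_freq_i.
    return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)
```

Because the rotation depends only on relative position once both queries and keys are rotated, RoPE tends to generalize to sequence lengths beyond those seen in pretraining, which is the extrapolation motivation cited in the first bullet.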