vocab-transformers/distilbert-word2vec_256k-MLM_250k


nreimers committed on Apr 7, 2022

Commit 136548a • 1 Parent(s): 0184a58

readme

Files changed (1)

README.md +5 -0

README.md ADDED

	@@ -0,0 +1,5 @@

+# DistilBERT with word2vec token embeddings
+This model uses a word2vec token embedding matrix with 256k entries. The word2vec embeddings were trained for 3 epochs on 100 GB of data from C4, MSMARCO, News, Wikipedia, and S2ORC.
+The model was then trained on the same data with masked language modeling (MLM) for 250k steps (batch size 64). The token embeddings were NOT updated during this training.
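
The README says the token embeddings were not updated during MLM training, but it does not show how this was done. One common way to get that behavior in PyTorch is to disable gradients on the embedding matrix and keep it out of the optimizer. The following is a minimal sketch under stated assumptions, not the author's training script: `TinyMLM`, `VOCAB_SIZE`, and `EMBED_DIM` are hypothetical toy stand-ins, the embeddings are random rather than loaded from word2vec, and the targets are random rather than masked tokens.

```python
import torch
from torch import nn

torch.manual_seed(0)

VOCAB_SIZE = 1000  # toy value; the README reports a 256k-entry matrix
EMBED_DIM = 32     # toy value, not taken from the README


class TinyMLM(nn.Module):
    """Hypothetical stand-in for an MLM: token embeddings plus a prediction head."""

    def __init__(self):
        super().__init__()
        self.token_embeddings = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.head = nn.Sequential(
            nn.Linear(EMBED_DIM, EMBED_DIM),
            nn.GELU(),
            nn.Linear(EMBED_DIM, VOCAB_SIZE),
        )

    def forward(self, input_ids):
        return self.head(self.token_embeddings(input_ids))


model = TinyMLM()

# Freeze the embedding matrix. In a real setup it would first be filled
# with pretrained word2vec vectors; here it stays randomly initialized.
model.token_embeddings.weight.requires_grad = False

# Only hand trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

embeddings_before = model.token_embeddings.weight.detach().clone()
head_before = model.head[0].weight.detach().clone()

# One training step with batch size 64 (the batch size the README reports).
# Random targets are used only to produce a loss; this is not real masking.
input_ids = torch.randint(0, VOCAB_SIZE, (64, 16))
labels = torch.randint(0, VOCAB_SIZE, (64, 16))

logits = model(input_ids)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB_SIZE), labels.reshape(-1)
)
loss.backward()
optimizer.step()

# The frozen embeddings are bit-identical; the head weights have moved.
embeddings_unchanged = torch.equal(embeddings_before, model.token_embeddings.weight)
head_changed = not torch.equal(head_before, model.head[0].weight)
print(embeddings_unchanged, head_changed)
```

Filtering the optimizer's parameter list also keeps weight decay from touching the frozen matrix. If a model ties its output projection to the input embeddings, freezing that shared parameter freezes both; whether this model used tied weights is not stated in the README.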