nasa-impact/bert-e-base-mlm


Muthukumaran committed on Aug 20, 2021

Commit b00abdc • 1 Parent(s): dcf0cad

Update README.md

Files changed (1)

README.md +8 -1

@@ -1,3 +1,10 @@
-This model uses sci-bert for initial embedding and is trained using masked language modeling (MLM). The corpus is roughly 100,000 earth science based publications.
 Stay tuned for further downstream task tests and updates to the model.

+This model continues training from scibert-base using a masked language modeling (MLM) loss. The corpus is roughly 100,000 earth-science publications.
+The tokenizer is loaded through AutoTokenizer and was trained on the same corpus.
 Stay tuned for further downstream task tests and updates to the model.
+In the works:
+- MLM + NSP task loss
+- Add more data sources for training
+- Test using downstream tasks
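
For readers unfamiliar with the MLM objective the README mentions, the following is a minimal sketch of the token-masking step in plain Python. The 15% selection rate and the 80/10/10 replacement split are assumptions taken from the original BERT recipe; the commit does not state which settings were used for this model. `mask_tokens`, `MASK_TOKEN`, and `IGNORE` are hypothetical names chosen for illustration.

```python
import random

MASK_TOKEN = "[MASK]"
IGNORE = None  # hypothetical marker: position does not contribute to the loss


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (corrupted_tokens, labels) for one MLM training example.

    labels[i] holds the original token wherever a prediction is required
    and IGNORE everywhere else.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [IGNORE] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected; input stays untouched
        labels[i] = token  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN         # assumed 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # assumed 10%: random vocabulary token
        # assumed remaining 10%: keep the original token as input
    return corrupted, labels


if __name__ == "__main__":
    sentence = "sea surface temperature anomalies increased over the decade".split()
    corrupted, labels = mask_tokens(sentence, vocab=sentence, seed=0)
    print(corrupted)
    print(labels)
```

The MLM loss is then the cross-entropy between the model's predictions and `labels`, computed only at positions where the label is not `IGNORE`.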