julien-c (HF staff) committed
Commit f01d672
1 Parent(s): 2de433b

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/allenai/scibert_scivocab_uncased/README.md

Files changed (1)
  1. README.md +26 -0
README.md ADDED
# SciBERT

This is the pretrained model presented in [SciBERT: A Pretrained Language Model for Scientific Text](https://www.aclweb.org/anthology/D19-1371/), which is a BERT model trained on scientific text.

The training corpus was papers taken from [Semantic Scholar](https://www.semanticscholar.org). Corpus size is 1.14M papers, 3.1B tokens. We use the full text of the papers in training, not just abstracts.

SciBERT has its own wordpiece vocabulary (scivocab) that's built to best match the training corpus. We trained cased and uncased versions.
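As a quick illustration of what scivocab changes, the sketch below compares how the SciBERT tokenizer and the original BERT tokenizer segment a scientific term. It is only a sketch: it assumes both checkpoints are pulled from the Hugging Face Hub, and the term and the behavior noted in the comments are illustrative rather than guaranteed for every word.

```python
from transformers import AutoTokenizer

# SciBERT's scivocab tokenizer vs. the original general-domain BERT vocabulary
scibert_tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

term = "immunohistochemistry"  # an arbitrary scientific term, purely for illustration
print(scibert_tokenizer.tokenize(term))  # scivocab tends to keep domain terms in fewer wordpieces
print(bert_tokenizer.tokenize(term))     # the general-domain vocabulary usually splits them further
```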
Available models include:
* `scibert_scivocab_cased`
* `scibert_scivocab_uncased`

The original repo can be found [here](https://github.com/allenai/scibert).
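For quick use with the `transformers` library, a minimal loading sketch might look like the following; it assumes the checkpoints are hosted on the Hugging Face Hub under the `allenai` organization, and the input sentence is only an example.

```python
from transformers import AutoTokenizer, AutoModel

# Load the uncased SciBERT checkpoint together with its scivocab tokenizer
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

# Encode one scientific sentence and inspect the contextual embeddings
inputs = tokenizer("Recombinant BMP-2 induces osteoblast differentiation.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```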
If using these models, please cite the following paper:
```bibtex
@inproceedings{beltagy-etal-2019-scibert,
  title = "SciBERT: A Pretrained Language Model for Scientific Text",
  author = "Beltagy, Iz and Lo, Kyle and Cohan, Arman",
  booktitle = "EMNLP",
  year = "2019",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/D19-1371"
}
```