# SciBERT

This is the pretrained model presented in [SciBERT: A Pretrained Language Model for Scientific Text](https://www.aclweb.org/anthology/D19-1371/), which is a BERT model trained on scientific text.

The training corpus was papers taken from [Semantic Scholar](https://www.semanticscholar.org). Corpus size is 1.14M papers, 3.1B tokens. We use the full text of the papers in training, not just abstracts.

SciBERT has its own wordpiece vocabulary (scivocab) that's built to best match the training corpus. We trained cased and uncased versions.
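A minimal sketch of what scivocab changes in practice (assuming the `transformers` library and the Hub IDs `allenai/scibert_scivocab_uncased` and `bert-base-uncased`; the example term is illustrative) is to compare how the two vocabularies segment a scientific term:

```python
# Minimal sketch: compare scivocab with the general-domain BERT vocabulary
# on a single scientific term. Hub model IDs and the example term are
# illustrative assumptions.
from transformers import AutoTokenizer

scibert_tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

term = "immunohistochemistry"
print("scivocab :", scibert_tokenizer.tokenize(term))
print("basevocab:", bert_tokenizer.tokenize(term))
# A vocabulary built on scientific text generally splits such terms into fewer wordpieces.
```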
Available models include:
* `scibert_scivocab_cased`
* `scibert_scivocab_uncased`

The original repo can be found [here](https://github.com/allenai/scibert).
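A minimal usage sketch, assuming the checkpoints are hosted on the Hugging Face Hub under the `allenai` organization and that PyTorch is installed, loads the uncased model with the standard `AutoTokenizer`/`AutoModel` API (the cased variant works the same way with its own ID):

```python
# Minimal sketch: load the uncased SciBERT checkpoint and extract
# contextual embeddings. The Hub ID and example sentence are illustrative.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

inputs = tokenizer(
    "The glucocorticoid receptor regulates transcription of target genes.",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```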
If using these models, please cite the following paper:
```
@inproceedings{beltagy-etal-2019-scibert,
    title = "SciBERT: A Pretrained Language Model for Scientific Text",
    author = "Beltagy, Iz and Lo, Kyle and Cohan, Arman",
    booktitle = "EMNLP",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1371"
}
```