API endpoint

curl -X POST \
    -H "Authorization: Bearer YOUR_ORG_OR_USER_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '"json encoded string"' \
    https://api-inference.huggingface.co/models/seiya/oubiobert-base-uncased
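
As a minimal sketch (not part of the original card), the same endpoint can be called from Python with the requests library. The {"inputs": ...} payload shape is the usual Inference API format for fill-mask models, and the example sentence is invented:

import requests

API_URL = "https://api-inference.huggingface.co/models/seiya/oubiobert-base-uncased"
headers = {"Authorization": "Bearer YOUR_ORG_OR_USER_API_TOKEN"}

# Illustrative fill-mask request; replace the sentence with your own text containing [MASK].
payload = {"inputs": "the patient was diagnosed with [MASK] diabetes."}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())  # candidate tokens for the masked position, with scores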

How to use this model directly from the 🤗/transformers library:

			
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("seiya/oubiobert-base-uncased")
model = AutoModelWithLMHead.from_pretrained("seiya/oubiobert-base-uncased")
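
As a hedged illustration (not from the original card), masked-token prediction with the loaded model might look like the sketch below. The input sentence is invented, and in recent transformers releases AutoModelForMaskedLM is the preferred replacement for the deprecated AutoModelWithLMHead:

import torch
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("seiya/oubiobert-base-uncased")
model = AutoModelWithLMHead.from_pretrained("seiya/oubiobert-base-uncased")
model.eval()

# Illustrative biomedical sentence containing the [MASK] token.
text = "the patient was diagnosed with [MASK] diabetes."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs)[0]  # shape: (batch, sequence length, vocabulary size)

# Locate the masked position and list the five most likely replacement tokens.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))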

ouBioBERT-Base, Uncased

Bidirectional Encoder Representations from Transformers for Biomedical Text Mining by Osaka University (ouBioBERT) is a language model based on the BERT-Base (Devlin et al., 2019) architecture. We pre-trained ouBioBERT on PubMed abstracts from the PubMed baseline (ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline) using our method.

The details of the pre-training procedure can be found in Wada, et al. (2020).

Evaluation

We evaluated the performance of ouBioBERT on the Biomedical Language Understanding Evaluation (BLUE) benchmark (Peng et al., 2019). Each number is the mean (standard deviation) over five different random seeds.

Dataset            Task type                      Score
MedSTS             Sentence similarity            84.9 (0.6)
BIOSSES            Sentence similarity            92.3 (0.8)
BC5CDR-disease     Named-entity recognition       87.4 (0.1)
BC5CDR-chemical    Named-entity recognition       93.7 (0.2)
ShARe/CLEFE        Named-entity recognition       80.1 (0.4)
DDI                Relation extraction            81.1 (1.5)
ChemProt           Relation extraction            75.0 (0.3)
i2b2 2010          Relation extraction            74.0 (0.8)
HoC                Document classification       86.4 (0.5)
MedNLI             Inference                      83.6 (0.7)
Total              Macro average of the scores    83.8 (0.3)

Code for Fine-tuning

We made the source code for fine-tuning freely available at our repository.

Citation

If you use our work in your research, please cite the following paper:

@misc{2005.07202,
  Author = {Shoya Wada and Toshihiro Takeda and Shiro Manabe and Shozo Konishi and Jun Kamohara and Yasushi Matsumura},
  Title = {A pre-training technique to localize medical BERT and enhance BioBERT},
  Year = {2020},
  Eprint = {arXiv:2005.07202},
}