VetBERT Pretrained Model for Veterinary Clinical Tasks

This is the pretrained VetBERT model from the GitHub repo: https://github.com/havocy28/VetBERT

This pretrained model is designed for NLP tasks on veterinary clinical notes. The paper Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes (Hur et al., BioNLP 2020) introduced VetBERT, a BERT model initialized from ClinicalBERT (Bio+Clinical BERT) and further pretrained on the VetCompass Australia corpus for tasks specific to veterinary medicine. The paper also discusses VetBERTDx, the finetuned version of VetBERT trained for the disease classification task.

Pretraining Data

The VetBERT model was initialized from the Bio_ClinicalBERT model, which was itself initialized from BERT. VetBERT was then trained on over 15 million veterinary clinical records and 1.3 billion tokens.

Pretraining Hyperparameters

During the pretraining phase for VetBERT, we used a batch size of 32, a maximum sequence length of 512, and a learning rate of 5e-5. The dup factor for duplicating input data with different masks was set to 5. All other default parameters were used (specifically, masked language model probability = 0.15 and max predictions per sequence = 20).
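
To make the dup factor and masking parameters concrete, here is a minimal sketch of how one tokenized note could be turned into several differently masked training copies. It is a simplified illustration, not the actual pretraining script: the function names, the sample note, and the whitespace tokenization are assumptions made for this example, and every selected position is replaced with [MASK] rather than following BERT's full replacement scheme.

```python
import random

# Values quoted in the hyperparameter description above.
MASKED_LM_PROB = 0.15
MAX_PREDICTIONS_PER_SEQ = 20
DUPE_FACTOR = 5


def mask_tokens(tokens, rng):
    """Return (masked_tokens, positions) for one randomly masked copy."""
    # Mask a fixed fraction of the sequence, capped at the per-sequence maximum.
    num_to_mask = min(MAX_PREDICTIONS_PER_SEQ,
                      max(1, int(round(len(tokens) * MASKED_LM_PROB))))
    positions = sorted(rng.sample(range(len(tokens)), num_to_mask))
    masked = list(tokens)
    for pos in positions:
        # Simplification (assumption): always substitute the [MASK] token.
        masked[pos] = "[MASK]"
    return masked, positions


def duplicate_with_masks(tokens, seed=0):
    """Create DUPE_FACTOR copies of the same tokens, each masked independently."""
    rng = random.Random(seed)
    return [mask_tokens(tokens, rng) for _ in range(DUPE_FACTOR)]


# Hypothetical note, split on whitespace instead of WordPiece for brevity.
note = ("dog presented with coughing and lethargy for three days "
        "owner reports reduced appetite").split()
instances = duplicate_with_masks(note)
for masked, positions in instances:
    print(positions, " ".join(masked))
```

With these settings a full 512-token sequence would select about 77 positions at a probability of 0.15, so the cap of 20 predictions per sequence is what limits masking on long notes.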

VetBERT Finetuning

VetBERT was further finetuned on a set of 5002 annotated clinical notes to classify the disease syndrome associated with each note, as outlined in the paper Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes.

How to use the model

Load the model via the transformers library:

from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Download the tokenizer and masked-language-model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("havocy28/VetBERT")
model = AutoModelForMaskedLM.from_pretrained("havocy28/VetBERT")

# Build a fill-mask pipeline and predict the token hidden behind [MASK]
VetBERT_masked = pipeline("fill-mask", model=model, tokenizer=tokenizer)
VetBERT_masked('Suspected pneuomina, will require an [MASK] but in the meantime will prescribed antibiotics')
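
For an input with a single [MASK], the fill-mask pipeline returns a list of dictionaries, each with a "score", "token", "token_str", and "sequence" entry. The sketch below shows one way to reduce that list to the top-ranked candidates. The helper name top_candidates is hypothetical, and the sample list uses made-up placeholder values in the right shape, not real VetBERT predictions.

```python
def top_candidates(predictions, k=3):
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 3)) for p in ranked[:k]]


# Placeholder data (assumption): same keys as real pipeline output, invented values.
example_output = [
    {"score": 0.30, "token": 101, "token_str": "token_a", "sequence": "..."},
    {"score": 0.50, "token": 102, "token_str": "token_b", "sequence": "..."},
    {"score": 0.05, "token": 103, "token_str": "token_c", "sequence": "..."},
]

for token, score in top_candidates(example_output, k=2):
    print(f"{token}\t{score}")
```

In practice, the list returned by the fill-mask call above would be passed in place of example_output.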

Citation

Please cite this article: Brian Hur, Timothy Baldwin, Karin Verspoor, Laura Hardefeldt, and James Gilkerson. 2020. Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes. In Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, pages 156–166, Online. Association for Computational Linguistics.