language: en
tags:
- veterinary
- pets
- vetbert
- BERT
widget:
- text: >-
    Hx: 7 yo canine with history of vomiting intermittently since yesterday.
    No other concerns. Still eating and drinking [MASK]. cPL negative.
  example_title: normally
VetBERT Disease Syndrome Classifier
This is a finetuned version of the VetBERT model, designed to classify the disease syndrome described in a veterinary clinical note.
The underlying VetBERT model is a pretrained language model for NLP tasks on veterinary clinical notes. It was introduced in Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes (Hur et al., BioNLP 2020): a BERT model initialized from ClinicalBERT (Bio+Clinical BERT) and further pretrained on the VetCompass Australia corpus for tasks specific to veterinary medicine.
Pretraining Data
The VetBERT model was initialized from the Bio_ClinicalBERT model, which was itself initialized from BERT. VetBERT was then pretrained on over 15 million veterinary clinical records comprising 1.3 billion tokens.
Pretraining Hyperparameters
During pretraining, we used a batch size of 32, a maximum sequence length of 512, and a learning rate of 5 × 10⁻⁵. The dupe factor, which sets how many times each input is duplicated with a different mask, was 5. All other parameters were left at their defaults (specifically, masked language model probability = 0.15 and max predictions per sequence = 20).
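The defaults above appear to be those of Google's original BERT pretraining-data script. As a rough illustration of how the masked language model probability, the cap on predictions per sequence, and the dupe factor interact, here is a minimal sketch. It assumes already-tokenized input, a toy vocabulary, and the standard BERT 80/10/10 replacement rule (which this card does not state); mask_tokens and make_instances are hypothetical names, not part of any VetBERT code.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"
SPECIAL = {"[CLS]", "[SEP]"}  # assumption: special tokens are never masked


def mask_tokens(tokens: List[str], vocab: List[str], rng: random.Random,
                masked_lm_prob: float = 0.15,
                max_predictions_per_seq: int = 20) -> Tuple[List[str], List[int], List[str]]:
    """Hypothetical BERT-style masking: returns (masked tokens, positions, original labels)."""
    candidates = [i for i, t in enumerate(tokens) if t not in SPECIAL]
    rng.shuffle(candidates)
    # Mask about 15% of the sequence, at least 1 and at most max_predictions_per_seq.
    num_to_predict = min(max_predictions_per_seq,
                         max(1, int(round(len(tokens) * masked_lm_prob))))
    picked = sorted(candidates[:num_to_predict])
    output = list(tokens)
    for i in picked:
        r = rng.random()
        if r < 0.8:
            output[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            pass                           # 10%: keep the original token
        else:
            output[i] = rng.choice(vocab)  # 10%: replace with a random vocabulary token
    labels = [tokens[i] for i in picked]
    return output, picked, labels


def make_instances(tokens: List[str], vocab: List[str], dupe_factor: int = 5,
                   seed: int = 12345) -> List[Tuple[List[str], List[int], List[str]]]:
    """Duplicate one input dupe_factor times, each copy with a fresh random mask."""
    rng = random.Random(seed)  # arbitrary seed, chosen only for reproducibility
    return [mask_tokens(tokens, vocab, rng) for _ in range(dupe_factor)]
```

Because make_instances draws every copy from one shared random generator, each of the five duplicates receives a different mask, which is the purpose of a dupe factor above 1. For a 512-token sequence, 15% would be about 77 tokens, so the cap of 20 predictions per sequence is what actually limits masking at that length.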
VetBERT Finetuning
VetBERT was then finetuned on 5,002 annotated clinical notes to classify the disease syndrome associated with each note, as described in the paper Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes.
How to use the model
Load the model via the transformers library:
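from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load the tokenizer and the finetuned classifier, including its classification head
tokenizer = AutoTokenizer.from_pretrained("havocy28/VetBERTDx")
model = AutoModelForSequenceClassification.from_pretrained("havocy28/VetBERTDx")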
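To turn the model's output into a prediction, one option is sketched below. It assumes the checkpoint loads as a sequence-classification model and that model.config.id2label holds the syndrome names; this card does not confirm either, so check the label mapping before relying on it. classify_note and top_syndrome are hypothetical helpers, and truncating to 512 tokens simply mirrors the pretraining maximum sequence length.

```python
import math
from typing import Dict, List, Tuple


def top_syndrome(logits: List[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return the highest-scoring label and its softmax probability."""
    m = max(logits)  # subtract the max logit before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best], exps[best] / sum(exps)


def classify_note(note: str, model_id: str = "havocy28/VetBERTDx") -> Tuple[str, float]:
    """Hypothetical helper: classify one clinical note with the finetuned model."""
    # Imported lazily so top_syndrome() works without torch/transformers installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    # Truncate long notes to 512 tokens, the maximum sequence length used in pretraining.
    inputs = tokenizer(note, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # Assumption: id2label maps class indices to disease syndrome names.
    return top_syndrome(logits, model.config.id2label)
```

For example, classify_note("Hx: 7 yo canine with history of vomiting intermittently since yesterday. No other concerns. Still eating and drinking normally. cPL negative.") returns a (label, probability) pair. Loading the model on every call keeps the sketch short; for batches of notes, load the tokenizer and model once and reuse them.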
Citation
Please cite this article: Brian Hur, Timothy Baldwin, Karin Verspoor, Laura Hardefeldt, and James Gilkerson. 2020. Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes. In Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, pages 156–166, Online. Association for Computational Linguistics.