---
license: mit
pipeline_tag: token-classification
tags:
- BERT
- bioBERT
- NER
- medical
metrics:
- f1
language:
- en
---
# Model
NER model for disease/treatment/technology entity recognition. The model and the data are intended for educational use.

The original dataset tags have been augmented with "inside" (I-) tags so that sub-tokens produced by the WordPiece tokenizer can also be labelled; see the sketch after the tag list below. The following NER tags are used:
- `B-DISEASE`, `I-DISEASE`: begin and inside tags for disease
- `B-TREATMENT`, `I-TREATMENT`: begin and inside tags for treatment
- `B-TECHNOLOGY`, `I-TECHNOLOGY`: begin and inside tags for technology
- `O`: outside entities (irrelevant)
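
A minimal sketch of this alignment step (assumed, since the training script is not part of this card): the first sub-token of a word keeps its original tag, and every following sub-token receives the corresponding `I-` tag.

```python
from transformers import AutoTokenizer

# Sketch only: expand word-level tags to sub-token level.
tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.2")

words = ["hydrocephalus"]
word_labels = ["B-DISEASE"]

aligned = []
for word, label in zip(words, word_labels):
    pieces = tokenizer.tokenize(word)  # e.g. ['h', '##ydro', '##ce', '##pha', '##lus']
    aligned.append(label)              # the first piece keeps the original tag
    aligned.extend([label.replace("B-", "I-")] * (len(pieces) - 1))  # remaining pieces get I- tags

print(aligned)  # e.g. ['B-DISEASE', 'I-DISEASE', 'I-DISEASE', 'I-DISEASE', 'I-DISEASE']
```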
## Example

Text: Acute obstructive hydrocephalus complicating bacterial meningitis in childhood

Gold labels (word level):

- Acute -> DISEASE
- obstructive -> DISEASE
- hydrocephalus -> DISEASE
- bacterial -> DISEASE
- meningitis -> DISEASE

Predictions (sub-token level):

- o##bs##truct##ive -> B-DISEASE + I-DISEASE + I-DISEASE + I-DISEASE
- h##ydro##ce##pha##lus -> B-DISEASE + I-DISEASE + I-DISEASE + I-DISEASE + I-DISEASE
- bacterial -> B-DISEASE
- men##ing##itis -> B-DISEASE + I-DISEASE + I-DISEASE
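
The model can be used with the Hugging Face Transformers `pipeline` API. A minimal sketch (the repository id is a placeholder, and the aggregation strategy is an assumption, not part of the original setup):

```python
from transformers import pipeline

# Placeholder repository id; replace with the actual model id on the Hub.
model_id = "<user>/biobert-disease-treatment-technology-ner"

ner = pipeline(
    "token-classification",
    model=model_id,
    aggregation_strategy="simple",  # merge sub-tokens such as 'h', '##ydro', ... back into words
)

print(ner("Acute obstructive hydrocephalus complicating bacterial meningitis in childhood"))
```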
# Sources
This pipeline is based on the pretrained `dmis-lab/biobert-base-cased-v1.2` model, fine-tuned on the relatively small BeHealthy Medical Entity dataset (1,550 training samples). An initial version of this model was then used to augment a medical technology dataset, and both datasets were used to train the final model.
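
As an illustration of the starting point (not the original training script), the base checkpoint can be loaded for token classification with the tag set described above; the label order shown here is an assumption:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Assumed label order; only the tag set itself comes from this card.
labels = ["O", "B-DISEASE", "I-DISEASE", "B-TREATMENT", "I-TREATMENT", "B-TECHNOLOGY", "I-TECHNOLOGY"]
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in id2label.items()}

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.2")
model = AutoModelForTokenClassification.from_pretrained(
    "dmis-lab/biobert-base-cased-v1.2",
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)
```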
# Performance
The model has not been extensively tuned. The quality of the dataset is unclear, since the origin of the data and the annotation process are unknown.
| Metric    | Score    |
|-----------|----------|
| Precision | 0.836892 |
| Recall    | 0.766610 |
| F1        | 0.800211 |
| Accuracy  | 0.935253 |
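
Entity-level precision/recall/F1 of this kind are commonly computed with `seqeval`; below is a minimal sketch with toy tag sequences (the actual evaluation data and script are not part of this card):

```python
from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy sequences for illustration only.
y_true = [["B-DISEASE", "I-DISEASE", "O", "B-TREATMENT", "O"]]
y_pred = [["B-DISEASE", "I-DISEASE", "O", "O", "O"]]

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))
```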