---
license: mit
pipeline_tag: token-classification
tags:
- BERT
- bioBERT
- NER
- medical
metrics:
- f1
language:
- en
---

# Model

NER model for disease/treatment entity recognition. The model and data are intended for educational use.

The original dataset tags have been augmented with "inside" tags in order to handle sub-tokens produced by the WordPiece tokenizer.

The following NER tags are used:

* `B-D`, `I-D`: begin and inside tags for disease
* `B-T`, `I-T`: begin and inside tags for treatment
* `O`: outside any entity (irrelevant)

```
# Text:
Acute obstructive hydrocephalus complicating bacterial meningitis in childhood

# Real:
Acute -> D
obstructive -> D
hydrocephalus -> D
bacterial -> D
meningitis -> D

# Predictions:
o##bs##truct##ive -> B-D + I-D + I-D + I-D
h##ydro##ce##pha##lus -> B-D + I-D + I-D + I-D + I-D
bacterial -> B-D
men##ing##itis -> B-D + I-D + I-D
```

# Sources

This pipeline is based on the [dmis-lab/biobert-base-cased-v1.2](https://huggingface.co/dmis-lab/biobert-base-cased-v1.2) pretrained model, fine-tuned on the relatively small [BeHealthy Medical Entity](https://www.kaggle.com/datasets/arunagirirajan/medical-entity-recognition-ner) dataset (1,550 training samples).

# Performance

The model has not been extensively tuned. The quality of the dataset is unclear, because the origin of the data and the annotation process are unknown.

| Metric    | Score    |
|-----------|----------|
| Precision | 0.854523 |
| Recall    | 0.859779 |
| F1        | 0.857143 |
| Accuracy  | 0.919590 |
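
The predictions in the Model section are made per WordPiece sub-token, while the reference labels are given per word. The following is a minimal sketch of how sub-token predictions could be merged back into word-level labels. It is not part of the published pipeline: the helper name `merge_subtokens` is hypothetical, the token and tag lists are hand-written illustrative input rather than real model output, and the rule that a word takes the entity type of its first sub-token is an assumption.

```python
from typing import List, Tuple


def merge_subtokens(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge WordPiece sub-tokens into words with one entity type per word.

    Assumption: a word takes the entity type of its first sub-token
    ("B-D"/"I-D" -> "D", "B-T"/"I-T" -> "T", "O" -> "O").
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    words: List[Tuple[str, str]] = []
    for token, tag in zip(tokens, tags):
        if token.startswith("##") and words:
            # Continuation piece: glue it onto the previous word, keep that word's label.
            word, label = words[-1]
            words[-1] = (word + token[2:], label)
        else:
            # Start of a new word: drop the B-/I- prefix to get the entity type.
            label = tag.split("-", 1)[1] if "-" in tag else tag
            words.append((token, label))
    return words


# Hand-written example input (not actual model output).
tokens = ["bacterial", "men", "##ing", "##itis", "in", "childhood"]
tags = ["B-D", "B-D", "I-D", "I-D", "O", "O"]
print(merge_subtokens(tokens, tags))
# [('bacterial', 'D'), ('meningitis', 'D'), ('in', 'O'), ('childhood', 'O')]
```

Taking the first sub-token's label is only one possible choice; a majority vote over all sub-tokens of a word would be another.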