metadata
tags:
  - spacy
  - token-classification
language:
  - de
model-index:
  - name: de_ggponc_medbertde
    results:
      - task:
          name: NER (fine-grained, nested spans)
          type: token-classification
        metrics:
          - name: F1 score (Test set, fine-grained, nested spans)
            type: f_score
            value: 0.7415
          - name: Precision (Test set, fine-grained, nested spans)
            type: precision
            value: 0.7304
          - name: Recall (Test set, fine-grained, nested spans)
            type: recall
            value: 0.7529
datasets:
  - bigbio/ggponc2
library_name: spacy

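German clinical NER model based on spaCy's SpanCategorizer implementation and medBERT.de. Because it predicts spans rather than per-token tags, it can return nested and overlapping entities.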

Usage (the leading `!` runs each command from a Jupyter notebook; omit it in a regular shell):

!huggingface-cli download phlobo/de_ggponc_medbertde de_ggponc_medbertde-any-py3-none-any.whl --local-dir .
!pip install de_ggponc_medbertde-any-py3-none-any.whl

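import spacy

# Load the pipeline package installed from the wheel
nlp = spacy.load("de_ggponc_medbertde")
doc = nlp("allein nach Versagen einer Behandlung mit Oxaliplatin und Irinotecan")

# Predicted spans are stored under the "entities" key and may overlap
for span in doc.spans["entities"]:
    print(span, span.label_)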

yields:

Oxaliplatin Clinical_Drug
Irinotecan Clinical_Drug
Versagen einer Behandlung Other_Finding
Behandlung mit Oxaliplatin und Irinotecan Therapeutic
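
The output above contains both nested and partially overlapping spans. The following is a minimal sketch of how such relations could be checked. It works on plain `(start, end, label)` character offsets recovered from the example sentence, so it runs without the model; the helper `relation` is a hypothetical name introduced for illustration and is not part of spaCy or this package.

```python
text = "allein nach Versagen einer Behandlung mit Oxaliplatin und Irinotecan"

# (surface text, label) pairs copied from the example output
predicted = [
    ("Oxaliplatin", "Clinical_Drug"),
    ("Irinotecan", "Clinical_Drug"),
    ("Versagen einer Behandlung", "Other_Finding"),
    ("Behandlung mit Oxaliplatin und Irinotecan", "Therapeutic"),
]

# Recover character offsets. str.index finds the first occurrence, which is
# sufficient here because every phrase occurs only once in the sentence.
spans = [(text.index(s), text.index(s) + len(s), label) for s, label in predicted]


def relation(a, b):
    """Describe how span a relates to span b (hypothetical helper)."""
    if b[0] <= a[0] and a[1] <= b[1]:
        return "inside"
    if a[0] <= b[0] and b[1] <= a[1]:
        return "contains"
    if a[0] < b[1] and b[0] < a[1]:
        return "overlaps"
    return "disjoint"


for a in spans:
    for b in spans:
        if a is not b and relation(a, b) != "disjoint":
            print(text[a[0]:a[1]], "|", relation(a, b), "|", text[b[0]:b[1]])
```

With the loaded pipeline, the same tuples could be built from `span.start_char`, `span.end_char`, and `span.label_` for each span in `doc.spans["entities"]`.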

The model was trained on the gold-standard annotations of GGPONC 2.0 (https://aclanthology.org/2022.lrec-1.389/).
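
As a quick sanity check on the test-set scores listed in the metadata, the snippet below recomputes F1 as the harmonic mean of the reported precision and recall. It assumes the three scores are computed over the same pooled set of spans (micro-averaged), which the card does not state explicitly.

```python
# Test-set scores as reported in the model metadata
precision = 0.7304
recall = 0.7529

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.7415, matching the reported F1 score
```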

It detects the following fine-grained entity classes, grouped by category:

  • Findings: Diagnosis / Pathology and Other Findings
  • Substances: Clinical Drug, Nutrients / Body Substances, External Substances
  • Procedures: Therapeutic, Diagnostic
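
For downstream analysis it can be handy to collapse the fine-grained labels into the three categories listed above. The sketch below shows one way to do that. Only `Clinical_Drug`, `Other_Finding`, and `Therapeutic` appear in the example output; the other label strings are assumed spellings and should be checked against the labels of the installed pipeline. `COARSE_CATEGORY` and `group_by_category` are hypothetical names introduced for illustration.

```python
# Coarse category for each fine-grained label.
# Entries marked "assumed" are guessed spellings, not confirmed by this card.
COARSE_CATEGORY = {
    "Diagnosis_or_Pathology": "Findings",        # assumed spelling
    "Other_Finding": "Findings",
    "Clinical_Drug": "Substances",
    "Nutrient_or_Body_Substance": "Substances",  # assumed spelling
    "External_Substance": "Substances",          # assumed spelling
    "Therapeutic": "Procedures",
    "Diagnostic": "Procedures",                  # assumed spelling
}


def group_by_category(labelled_spans):
    """Group (text, label) pairs by coarse category; unknown labels go to 'Unknown'."""
    grouped = {}
    for span_text, label in labelled_spans:
        category = COARSE_CATEGORY.get(label, "Unknown")
        grouped.setdefault(category, []).append(span_text)
    return grouped


# (text, label) pairs copied from the example output
example = [
    ("Oxaliplatin", "Clinical_Drug"),
    ("Irinotecan", "Clinical_Drug"),
    ("Versagen einer Behandlung", "Other_Finding"),
    ("Behandlung mit Oxaliplatin und Irinotecan", "Therapeutic"),
]
print(group_by_category(example))
```

The label strings the pipeline actually uses can be inspected with `nlp.get_pipe("spancat").labels` once the model is loaded.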

The training configuration is available at https://github.com/hpi-dhc/ggponc

When using the model, please cite the following publication:

@inproceedings{borchert-etal-2022-ggponc,
    title = "{GGPONC} 2.0 - The {G}erman Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline {NER} Taggers",
    author = "Borchert, Florian  and
      Lohr, Christina  and
      Modersohn, Luise  and
      Witt, Jonas  and
      Langer, Thomas  and
      Follmann, Markus  and
      Gietzelt, Matthias  and
      Arnrich, Bert  and
      Hahn, Udo  and
      Schapranow, Matthieu-P.",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    pages = "3650--3660"
}

| Feature | Description |
| --- | --- |
| Name | de_ggponc_medbertde |
| Version | 1.0.0 |
| spaCy | >=3.4.4,<3.5.0 |
| Default Pipeline | transformer, morphologizer, parser, transformer_spancat, spancat |
| Components | transformer, morphologizer, parser, transformer_spancat, spancat |
| License | The model may be used for non-commercial research activities only, see also the Terms of Use of GGPONC: https://www.leitlinienprogramm-onkologie.de/projekte/ggponc-english |
| Author | Florian Borchert |
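
Since the model requires spaCy >=3.4.4,<3.5.0, it may help to check the installed version before loading it. The following is a simplified sketch that compares only the first three numeric version fields and does not implement full PEP 440 semantics; `is_compatible` is a hypothetical helper name.

```python
import re
from importlib import metadata


def is_compatible(version: str) -> bool:
    """Return True if version falls in >=3.4.4,<3.5.0 (numeric fields only)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        return False
    parsed = tuple(int(group) for group in match.groups())
    return (3, 4, 4) <= parsed < (3, 5, 0)


try:
    installed = metadata.version("spacy")
    status = "compatible" if is_compatible(installed) else "outside >=3.4.4,<3.5.0"
    print(f"spaCy {installed}: {status}")
except metadata.PackageNotFoundError:
    print("spaCy is not installed")
```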