metadata
language:
  - de
tags:
  - medical
  - ggponc
widget:
  - text: Vitamin C, E und A
    example_title: Forward Ellipsis
  - text: Chemo- und Strahlentherapie
    example_title: Backward Ellipsis
  - text: HPV-16- und/oder -18-Positivität
    example_title: Complex Ellipsis

Model

Fine-tuned mt5-base model for resolving elliptical coordinated compound noun phrases (ECCNPs) in German text. ECCNPs are a special type of coordination ellipsis, in which part of a compound noun is omitted due to coordination (e.g., "and", "or", "/").

For instance, Chemo- und Strahlentherapie (chemo- and radiotherapy) is the elliptical form of Chemotherapie und Strahlentherapie (chemotherapy and radiotherapy).

Dataset

The model has been fine-tuned on a subset of sentences from GGPONC 2.0 that contain manually annotated ECCNPs and their resolutions. The annotated dataset is available on Zenodo: https://zenodo.org/records/12529883

Usage

The model can be loaded as a Text2TextGenerationPipeline:

from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
pipe = pipeline(model="phlobo/german-ellipses-resolver-mt5-base")

pipe("Chemo- und Strahlentherapie")
# [{'generated_text': 'Chemotherapie und Strahlentherapie'}]
pipe("Vitamin C, E und A")
# [{'generated_text': 'Vitamin C, Vitamin E und Vitamin A'}]

We recommend setting max_length to cap the length of the generated output. For most German sentences, a value of 256 should be enough:

pipe = pipeline(model="phlobo/german-ellipses-resolver-mt5-base", max_length=256)
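
To resolve several phrases at once and keep only the resolved strings, the pipeline call can be wrapped in a small helper. The following is a minimal sketch and not part of the released code: the names `load_resolver` and `resolve_ellipses` are hypothetical, and the helper assumes the pipeline returns a list with one `{'generated_text': ...}` dict per call, as in the outputs shown above.

```python
from typing import Callable, Iterable, List

MODEL_ID = "phlobo/german-ellipses-resolver-mt5-base"


def load_resolver(max_length: int = 256):
    """Load the model as a pipeline (downloads the weights on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    return pipeline(model=MODEL_ID, max_length=max_length)


def resolve_ellipses(pipe: Callable, texts: Iterable[str]) -> List[str]:
    """Run each text through the pipeline and return only the resolved strings."""
    resolved = []
    for text in texts:
        # Assumption: the pipeline returns [{'generated_text': '...'}] for a
        # single input string; keep the first (only) generated sequence.
        outputs = pipe(text)
        resolved.append(outputs[0]["generated_text"])
    return resolved


# Example usage (hypothetical; requires network access to fetch the model):
# pipe = load_resolver()
# for phrase in resolve_ellipses(pipe, ["Chemo- und Strahlentherapie", "Vitamin C, E und A"]):
#     print(phrase)
```

Passing the pipeline in as an argument keeps the helper independent of how the model was loaded, so the same function works with a pipeline created with different generation settings.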

Paper

Our approach and its evaluation have been published at the ACL BioNLP'23 workshop.

Please cite the following paper if you find our model useful:

@inproceedings{kammer-etal-2023-resolving,
    title = "Resolving Elliptical Compounds in {G}erman Medical Text",
    author = "Kammer, Niklas  and
      Borchert, Florian  and
      Winkler, Silvia  and
      de Melo, Gerard  and
      Schapranow, Matthieu-P.",
    editor = "Demner-Fushman, Dina  and
      Ananiadou, Sophia  and
      Cohen, Kevin",
    booktitle = "The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.bionlp-1.26",
    doi = "10.18653/v1/2023.bionlp-1.26",
    pages = "292--305"
}