aehrm/redewiedergabe-freeindirect

Token Classification

Flair

PyTorch

German

sequence-tagger-model

aehrm committed on Aug 23, 2023

Commit

cb0865c

1 Parent(s): bd5d1dc

Update README.md

README.md +59 -0

@@ -6,3 +6,62 @@ tags:
 language: de
 ---
 # REDEWIEDERGABE Tagger: free indirect STWR

+This model is part of an ensemble of binary taggers that recognize speech, thought and writing representation (STWR) in German texts. Together, the taggers automatically detect and annotate the following four types of STWR:
+| STWR type | Example | Translation |
+|-----------|---------|-------------|
+| direct | Dann sagte er: **"Ich habe Hunger."** | Then he said: **"I'm hungry."** |
+| free indirect ('erlebte Rede', **this tagger**) | Er war ratlos. **Woher sollte er denn hier bloß ein Mittagessen bekommen?** | He was at a loss. **Where on earth was he supposed to get lunch here?** |
+| indirect | Sie fragte, **wo das Essen sei.** | She asked **where the food was.** |
+| reported | **Sie sprachen über das Mittagessen.** | **They talked about lunch.** |
+Each tagger in the ensemble is fine-tuned from the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large) on the [REDEWIEDERGABE corpus](https://github.com/redewiedergabe/corpus) ([Annotation guidelines](http://redewiedergabe.de/richtlinien/richtlinien.html)). The [training code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_redewiedergabe.py) is available in the LLpro repository.
+**F1-Scores:**
+| STWR type | F1-Score |
+|-----------|----------|
+| direct | 90.76 |
+| indirect | 79.16 |
+| **free indirect (this tagger)** | **58.00** |
+| reported | 70.47 |
+----
+**Demo Usage:**
+```python
+from flair.data import Sentence
+from flair.models import SequenceTagger
+
+# One input text containing an example of each STWR type
+sentence = Sentence('Sie sprachen über das Mittagessen. Sie fragte, wo das Essen sei. Woher sollte er das wissen? Dann sagte er: "Ich habe Hunger."')
+rwtypes = ['direct', 'indirect', 'freeindirect', 'reported']
+for rwtype in rwtypes:
+    # Each STWR type has its own binary tagger, so load and apply them one at a time
+    model = SequenceTagger.load(f'aehrm/redewiedergabe-{rwtype}')
+    model.predict(sentence)
+    # Print the text of every labeled data point on the sentence
+    print(rwtype, [ x.data_point.text for x in sentence.get_labels() ])
+# >>> direct ['"', 'Ich', 'habe', 'Hunger', '.', '"']
+# >>> indirect ['wo', 'das', 'Essen', 'sei', '.']
+# >>> freeindirect ['Woher', 'sollte', 'er', 'das', 'wissen', '?']
+# >>> reported ['Sie', 'sprachen', 'über', 'das', 'Mittagessen', '.', 'Woher', 'sollte', 'er', 'das', 'wissen', '?']
+```
+**Cite**:
+Please cite the following paper when using this model.
+```
+@inproceedings{ehrmanntraut-et-al-llpro-2023,
+	address = {Ingolstadt, Germany},
+	title = {{LLpro}: A Literary Language Processing Pipeline for {German} Narrative Text},
+	booktitle = {Proceedings of the 19th Conference on Natural Language Processing ({KONVENS} 2023)},
+	publisher = {{KONVENS} 2023 Organizers},
+	author = {Ehrmanntraut, Anton and Konle, Leonard and Jannidis, Fotis},
+	year = {2023},
+}
+```
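
The demo usage above prints one list entry per tagged token. For downstream analysis it is often handier to work with contiguous passages instead. The following is a minimal sketch of that post-processing step, not part of the model or of Flair: `group_contiguous` is a hypothetical helper, and the boolean mask is written by hand here, on the assumption that it would be derived from the tagger's token-level predictions.

```python
from typing import List


def group_contiguous(tokens: List[str], tagged: List[bool]) -> List[str]:
    """Join maximal runs of consecutively tagged tokens into span strings.

    Hypothetical helper for illustration; `tagged[i]` says whether
    `tokens[i]` was labeled by a tagger.
    """
    spans: List[str] = []
    current: List[str] = []
    for token, is_tagged in zip(tokens, tagged):
        if is_tagged:
            current.append(token)
        elif current:
            # An untagged token closes the currently open span
            spans.append(" ".join(current))
            current = []
    if current:
        # Close a span that runs to the end of the text
        spans.append(" ".join(current))
    return spans


# Hand-written example: the last six tokens are assumed to be tagged as free indirect
tokens = ['Er', 'war', 'ratlos', '.', 'Woher', 'sollte', 'er', 'das', 'wissen', '?']
tagged = [False] * 4 + [True] * 6
print(group_contiguous(tokens, tagged))
# ['Woher sollte er das wissen ?']
```

Tokens are joined with single spaces, so punctuation stays separated from the preceding word; recovering the original surface string would require the character offsets of the tokens instead.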