aehrm committed cb0865c (parent: bd5d1dc)

Update README.md

Files changed (1): README.md +59 -0
@@ -6,3 +6,62 @@ tags:
language: de
---
# REDEWIEDERGABE Tagger: free indirect STWR

This model is one of an ensemble of binary taggers. Each tagger recognizes one type of speech, thought and writing representation (STWR) in German texts. Together, the taggers detect and annotate the following four STWR types:

| STWR type | Example | Translation |
|-----------|---------|-------------|
| direct | Dann sagte er: **"Ich habe Hunger."** | Then he said: **"I'm hungry."** |
| free indirect ('erlebte Rede', **this tagger**) | Er war ratlos. **Woher sollte er denn hier bloß ein Mittagessen bekommen?** | He was at a loss. **Where on earth was he supposed to get lunch around here?** |
| indirect | Sie fragte, **wo das Essen sei.** | She asked **where the food was.** |
| reported | **Sie sprachen über das Mittagessen.** | **They talked about lunch.** |

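Each tagger in the ensemble is an independent binary classifier, so the four STWR types are not mutually exclusive. Several taggers can flag the same token, or none may flag it. The demo output further down shows this for the question 'Woher sollte er das wissen?', which is tagged as both free indirect and reported. The sketch below merges per-type outputs into one label set per token. It is a minimal illustration, assuming each tagger's output is given as one boolean per token. The `combine_taggers` helper and the mask format are hypothetical and are not part of the released models or of flair.

```python
from typing import Dict, List, Set


def combine_taggers(tokens: List[str], masks: Dict[str, List[bool]]) -> List[Set[str]]:
    """Merge per-type binary masks (hypothetical format) into one set of STWR labels per token."""
    combined: List[Set[str]] = [set() for _ in tokens]
    for rwtype, mask in masks.items():
        if len(mask) != len(tokens):
            raise ValueError(f"mask for {rwtype!r} has {len(mask)} entries, expected {len(tokens)}")
        for i, flagged in enumerate(mask):
            if flagged:
                combined[i].add(rwtype)  # labels accumulate: types may overlap
    return combined


if __name__ == "__main__":
    tokens = ["Woher", "sollte", "er", "das", "wissen", "?"]
    # Illustrative masks mirroring the demo output, not real model predictions
    masks = {"freeindirect": [True] * 6, "reported": [True] * 6, "direct": [False] * 6}
    for token, labels in zip(tokens, combine_taggers(tokens, masks)):
        print(token, sorted(labels))
```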
The ensemble is trained on the [REDEWIEDERGABE corpus](https://github.com/redewiedergabe/corpus) ([annotation guidelines](http://redewiedergabe.de/richtlinien/richtlinien.html)). Each tagger is fine-tuned from the domain-adapted model [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large) ([training code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_redewiedergabe.py)).

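A binary tagger only needs to know, for every token, whether that token belongs to its own STWR type. Suppose the corpus annotations are available as token spans labelled with a type. The per-tagger training labels could then be derived as in the sketch below. The `(start, end, type)` span format and the `binary_labels` helper are assumptions for illustration only. The linked training code may prepare the data differently.

```python
from typing import List, Tuple

# Hypothetical annotation format: (first_token, end_token_exclusive, stwr_type)
Span = Tuple[int, int, str]


def binary_labels(num_tokens: int, spans: List[Span], target_type: str) -> List[int]:
    """Return 1 for each token inside a span of target_type and 0 for every other token."""
    labels = [0] * num_tokens
    for start, end, stwr_type in spans:
        if stwr_type != target_type:
            continue  # spans of other STWR types stay negative for this tagger
        if not 0 <= start <= end <= num_tokens:
            raise ValueError(f"span ({start}, {end}) is outside the {num_tokens} tokens")
        for i in range(start, end):
            labels[i] = 1
    return labels


if __name__ == "__main__":
    tokens = "Er war ratlos . Woher sollte er denn hier bloß ein Mittagessen bekommen ?".split()
    spans = [(4, len(tokens), "freeindirect")]  # hypothetical gold annotation
    print(binary_labels(len(tokens), spans, "freeindirect"))
    print(binary_labels(len(tokens), spans, "direct"))
```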
**F1 scores:**

| STWR type | F1 score |
|-----------|----------|
| direct | 90.76 |
| indirect | 79.16 |
| **free indirect (this tagger)** | **58.00** |
| reported | 70.47 |

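The scores above do not state whether they are computed per token or per span, or on which data split. Assuming token-level evaluation of a single binary tagger, F1 is the harmonic mean of precision and recall over the tokens the tagger flags. That simplifies to `2*TP / (2*TP + FP + FN)`. The sketch below computes it on a 0-100 scale; the `token_f1` helper is hypothetical.

```python
from typing import Sequence


def token_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Token-level F1 (scaled to 0-100) for the positive class of one binary tagger."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g and p)      # tagged and correct
    fp = sum(1 for g, p in zip(gold, pred) if p and not g)  # tagged but wrong
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)  # missed
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [0, 0, 1, 1, 1, 1]
    pred = [0, 1, 1, 1, 0, 0]
    # TP = 2, FP = 1, FN = 2, so F1 = 2*2 / (2*2 + 1 + 2) = 4/7
    print(round(token_f1(gold, pred), 2))  # 57.14
```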
----

**Demo usage:**

```python
from flair.data import Sentence
from flair.models import SequenceTagger

sentence = Sentence('Sie sprachen über das Mittagessen. Sie fragte, wo das Essen sei. Woher sollte er das wissen? Dann sagte er: "Ich habe Hunger."')

# one binary tagger per STWR type, loaded from the Hugging Face Hub
rwtypes = ['direct', 'indirect', 'freeindirect', 'reported']
for rwtype in rwtypes:
    model = SequenceTagger.load(f'aehrm/redewiedergabe-{rwtype}')
    model.predict(sentence)
    # select the labels of this tagger's label type and print the tagged tokens
    print(rwtype, [x.data_point.text for x in sentence.get_labels(model.label_type)])
# >>> direct ['"', 'Ich', 'habe', 'Hunger', '.', '"']
# >>> indirect ['wo', 'das', 'Essen', 'sei', '.']
# >>> freeindirect ['Woher', 'sollte', 'er', 'das', 'wissen', '?']
# >>> reported ['Sie', 'sprachen', 'über', 'das', 'Mittagessen', '.', 'Woher', 'sollte', 'er', 'das', 'wissen', '?']
```
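The demo prints one flat token list per tagger. As a result, separate passages of the same STWR type end up in a single list; the `reported` output above joins two different sentences. The sketch below splits such a list into contiguous passages, assuming the tagged tokens are given as `(position, text)` pairs. The `group_spans` helper and this input format are hypothetical. The positions in the example are illustrative, not actual model output.

```python
from typing import List, Optional, Tuple


def group_spans(tagged: List[Tuple[int, str]]) -> List[str]:
    """Join tagged tokens into passages, starting a new passage at every gap in the positions."""
    passages: List[List[str]] = []
    prev_pos: Optional[int] = None
    for pos, text in sorted(tagged):
        if prev_pos is None or pos != prev_pos + 1:
            passages.append([])  # not adjacent to the previous tagged token
        passages[-1].append(text)
        prev_pos = pos
    return [" ".join(tokens) for tokens in passages]


if __name__ == "__main__":
    # Illustrative positions for the tokens printed for 'reported' in the demo
    reported = [(1, "Sie"), (2, "sprachen"), (3, "über"), (4, "das"), (5, "Mittagessen"), (6, "."),
                (15, "Woher"), (16, "sollte"), (17, "er"), (18, "das"), (19, "wissen"), (20, "?")]
    for passage in group_spans(reported):
        print(passage)
```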

**Cite:**

Please cite the following paper when using this model.

```
@inproceedings{ehrmanntraut-et-al-llpro-2023,
  address = {Ingolstadt, Germany},
  title = {{LLpro}: A Literary Language Processing Pipeline for {German} Narrative Text},
  booktitle = {Proceedings of the 19th Conference on Natural Language Processing ({KONVENS} 2023)},
  publisher = {{KONVENS} 2023 Organizers},
  author = {Ehrmanntraut, Anton and Konle, Leonard and Jannidis, Fotis},
  year = {2023},
}
```