---
language: 
  - de
tags:
  - text-classification
license: mit
metrics:
  - precision
  - recall
  - f1
widget:
- text: "Im Artikel wird leider nicht erwähnt, inwieweit und ob dadurch Natur zerstört werden muss."
---

# WTWM Newsroom Mentions Detector

Please note that this model originates from the ["What's there, what's missing"](https://interaktiv.br.de/ai-detect-newsroom-mentions-in-comments/) collaboration between the [AI & Automation Lab of Bayerischer Rundfunk (BR hereafter)](https://www.br.de/extra/ai-automation-lab/index.html), [Mitteldeutscher Rundfunk (mdr hereafter)](https://www.mdr.de/), and [ida](https://idalab.de/). The collaboration took place during the [JournalismAI fellowship '22](https://www.lse.ac.uk/media-and-communications/polis/JournalismAI/Fellowship-Programme) (see the chapter **The fellowship** below). The model presented here is part of the documentation of the half year of project time. The related technical framework can be found on [GitHub](https://github.com/br-data/wtwm-topic-modelling).

## The task

This is a model for the task of classifying whether or not a comment on an article addresses the moderation team/authors of the media house that published the article. In this prototype stage the media houses are Bayerischer Rundfunk and Mitteldeutscher Rundfunk.

This classification task is implemented as a binary classification into:

- label 0: the comment contains no mention
- label 1: the comment addresses the moderation team/authors of the media house

We decided to use [german-gpt2](https://huggingface.co/dbmdz/german-gpt2) by the MDZ of the Bayerische Staatsbibliothek as the foundation model.

**This model is still work in progress and might be updated in the future.**


## Dataset & preprocessing

This model was fine-tuned on a corpus of 18,860 user comments drawn from BR and mdr websites and social media channels. The ratio of comments without mentions to comments with mentions is 92% to 8%. In the initially annotated data, the share of comments with mentions was only 2%. To run the first round of training during the [JournalismAI fellowship '22](https://www.lse.ac.uk/media-and-communications/polis/JournalismAI/Fellowship-Programme), we decided to augment the corpus with 1,421 generated comments containing mentions. The generated comments were annotated in the same way as the initial data.
Please note that the generated comments are merely meant to kick off the training of the prototype model. Retraining of the model in later iterations of our system will ignore the generated comments and rely solely on authentic comments.

The preprocessing of the data included:
- remove linebreaks
- remove html tags
- remove emojis
- remove formatting fragments (e.g. "---------", "......")
- remove gaps (~ two or more adjacent spaces)
- strip leading and trailing whitespace from each comment

We advise performing the same preprocessing steps when working with the model; a minimal sketch follows below.
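
The following is a minimal sketch of these preprocessing steps in Python. The exact patterns (for example, the emoji ranges and the set of fragment characters) are assumptions for illustration, not the project's exact implementation:

```python
import re


def preprocess(comment: str) -> str:
    """Illustrative sketch of the preprocessing steps listed above."""
    # Remove line breaks
    comment = comment.replace("\r", " ").replace("\n", " ")
    # Remove HTML tags
    comment = re.sub(r"<[^>]+>", "", comment)
    # Remove emojis (rough approximation via common Unicode ranges)
    comment = re.sub(r"[\U0001F000-\U0001FAFF\u2600-\u27BF]", "", comment)
    # Remove formatting fragments such as "---------" or "......"
    comment = re.sub(r"([-.=_*~])\1{2,}", "", comment)
    # Collapse gaps (two or more adjacent spaces)
    comment = re.sub(r" {2,}", " ", comment)
    # Strip leading and trailing whitespace
    return comment.strip()
```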

## Training 

After multiple fine-tuning test runs, the present model was trained using the following parameters (a training sketch follows the list):
- foundation_model: [german-gpt2](https://huggingface.co/dbmdz/german-gpt2)
- num_train_epochs: 4
- learning_rate: 2e-7
- weight_decay: 0.1
- metric_for_best_model: precision
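
Below is a minimal sketch of how a fine-tuning run with these parameters can be set up using the Hugging Face `Trainer`. The `train_dataset`, `eval_dataset`, and `compute_metrics` objects are assumed placeholders (a `compute_metrics` sketch is given in the **Evaluation** chapter); this is not the project's exact training script:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("dbmdz/german-gpt2")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForSequenceClassification.from_pretrained(
    "dbmdz/german-gpt2", num_labels=2
)
model.config.pad_token_id = tokenizer.pad_token_id

training_args = TrainingArguments(
    output_dir="wtwm-mentions-detector",  # placeholder output directory
    num_train_epochs=4,
    learning_rate=2e-7,
    weight_decay=0.1,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="precision",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # assumed: tokenized training split
    eval_dataset=eval_dataset,  # assumed: tokenized held-out split
    compute_metrics=compute_metrics,  # assumed: returns a "precision" key
)
trainer.train()
```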

### Example: Direct model evaluation

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

model_path = "path/to/this/model"  # placeholder: local path or model repo id

comment = "The preprocessed comment to classify"
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(model_path)
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

result = pipe(comment)
label = result[0]["label"]  # "LABEL_1" = mention, "LABEL_0" = no mention
has_mention = label == "LABEL_1"

print(f"Comment includes mention: {has_mention}")
```

## Limitations

Clearly, the amount of training data was too small for a state-of-the-art result. This can be seen in the evaluation chapter. Future rounds of retraining have to be performed. For the sake of completeness, we publish this model here as part of [the project's documentation](https://interaktiv.br.de/ai-detect-newsroom-mentions-in-comments/).

An analysis of possible biases reproduced by the present model, regardless of whether they originate from our fine-tuning or the underlying GPT-2 model, is beyond the scope of this work. We assume that biases exist within the model; analyzing them will be a task for future work.


## Evaluation
The model was evaluated on a held-out test set consisting of 10% of the corpus.


### Quantitative
As a general training approach, we decided to optimize for the precision of the detection of mentions in comments. This strategy best fits the high-speed moderation challenges the moderation teams face in their everyday work. Our goal is to direct their attention only to comments that are very likely to contain a mention and not to distract the moderation team with comments that don't contain mentions.

In addition, we decided not to include the accuracy score in our evaluation because its high values are misleading. This effect is caused by the strong imbalance in the distribution between comments with and without mentions: for example, a classifier that labeled every comment as containing no mention would achieve an accuracy of 0.92.
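
A minimal sketch of a metric function reflecting this choice, assuming the Hugging Face `Trainer` interface and `scikit-learn` (not the project's exact implementation):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support


def compute_metrics(eval_pred):
    """Report precision, recall, and f1; accuracy is deliberately omitted."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average="binary", pos_label=1
    )
    return {"precision": precision, "recall": recall, "f1": f1}
```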

| mentions total | mentions predicted | precision | recall | f1 |
|-|-|-|-|-|
| 148 | 130 | 0.74 | 0.65 | 0.69 |

### Qualitative

A qualitative evaluation, conducted by members of BR and mdr in the daily context of the live comment moderation system, resulted in 88% human agreement on the published comments.

## Conclusion

The qualitative evaluation of [this project](https://interaktiv.br.de/ai-detect-newsroom-mentions-in-comments/) makes us confident that the mediocre quantitative results can be overcome with a sufficiently large corpus and that the overall prototype can be a useful addition to comment moderation tools.


## The fellowship

[JournalismAI](https://www.lse.ac.uk/media-and-communications/polis/JournalismAI) is a project of [Polis](https://www.lse.ac.uk/media-and-communications/polis) – the journalism think-tank at the London School of Economics and Political Science – and it is sponsored by the [Google News Initiative](https://newsinitiative.withgoogle.com/). If you want to know more about the fellowship and the other JournalismAI activities, [sign up for the newsletter](https://mailchi.mp/lse.ac.uk/journalismai) or get in touch with the team via hello@journalismai.info.