---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: text-generation
tags:
- counter speech
---

---

# Target-Aware Counter-Speech Generation

<!-- Provide a quick summary of what the model is/does. -->

The target-aware counter-speech generation model is an autoregressive language model based on [gpt2-medium](https://huggingface.co/gpt2-medium) and fine-tuned on hate-speech and counter-speech pairs from the [CONAN](https://github.com/marcoguerini/CONAN) datasets to generate more contextually relevant counter-speech.
The model uses special tokens that embed target-demographic information to guide generation towards relevant responses and away from off-topic or generic ones. It is trained on 8 target demographics: Migrants, People of Color (POC), LGBT+, Muslims, Women, Jews, Disabled, and Other.

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
The model is intended for generating counter-speech responses to a given hate-speech sequence, which is combined with a special token that encodes the target demographic.
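
The prompt layout used in the example prompt on this card can be wrapped in a small helper. This is a minimal sketch, not the authors' code: `build_prompt` and `TARGETS` are hypothetical names, and writing every target token as the list entry wrapped in angle brackets (as in `<other>`) is an assumption inferred from that single example, not confirmed for the other targets.

```python
# Target labels as listed on this model card.
TARGETS = ["MIGRANTS", "POC", "LGBT+", "MUSLIMS", "WOMEN", "JEWS", "other", "DISABLED"]


def build_prompt(hate_speech: str, target: str = "other") -> str:
    """Build a prompt in the layout of the card's example prompt.

    Assumption: the special token for a target is "<" + label + ">".
    """
    if target not in TARGETS:
        raise ValueError(f"unknown target {target!r}; expected one of {TARGETS}")
    return f"<|endoftext|> <{target}> Hate-speech: {hate_speech.strip()} Counter-speech: "


prompt = build_prompt("Human are not created equal, some are born lesser.")
```

Rejecting unknown labels early avoids silently sending the model a token it was never trained on.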


## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

We observed negative effects such as content hallucination and toxic response generation. Although the intended use is to generate counter-speech for combating online hatred, usage should be monitored carefully, with human post-editing or an approval system, to ensure a safe and inclusive online environment.


## How to Get Started with the Model

Use the code below to get started with the model.


    
    from transformers import AutoTokenizer, AutoModelForCausalLM

    # All available target-demographic tokens
    types = ["MIGRANTS", "POC", "LGBT+", "MUSLIMS", "WOMEN", "JEWS", "other", "DISABLED"]

    model_name = "tum-nlp/gpt-2-medium-target-aware-counterspeech-generation"
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.padding_side = "left"  # pad on the left for decoder-only generation
    # Assumption: fall back to the EOS token if the tokenizer defines no pad token
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    prompt = "<|endoftext|> <other> Hate-speech: Human are not created equal, some are born lesser. Counter-speech: "
    inputs = tokenizer(prompt, return_tensors="pt", padding=True)
    output_sequences = model.generate(
        input_ids=inputs["input_ids"].to(model.device),
        attention_mask=inputs["attention_mask"].to(model.device),
        pad_token_id=tokenizer.eos_token_id,
        max_length=128,
        num_beams=3,
        no_repeat_ngram_size=3,
        num_return_sequences=1,
        early_stopping=True,
    )
    # generate() returns a batch of sequences; decode the first (and only) one
    result = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
    print(result)
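
Because GPT-2 style generation returns the prompt followed by the continuation, the decoded string still contains the hate-speech input. A minimal sketch for keeping only the generated reply, assuming the decoded text retains the `Counter-speech:` marker from the prompt; `extract_counter_speech` is a hypothetical helper, and the example string is hand-written, not actual model output.

```python
MARKER = "Counter-speech:"


def extract_counter_speech(decoded: str) -> str:
    """Return the text after the first 'Counter-speech:' marker.

    Falls back to the whole (stripped) string if the marker is missing.
    """
    _, found, tail = decoded.partition(MARKER)
    return tail.strip() if found else decoded.strip()


# Hand-written stand-in for a decoded sequence (not real model output).
example = " <other> Hate-speech: Some example input. Counter-speech: Everyone deserves equal respect."
reply = extract_counter_speech(example)
```

Splitting on the first marker keeps any later occurrence of the phrase inside the reply intact.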


#### Training Hyperparameters

    from transformers import TrainingArguments

    training_args = TrainingArguments(
      num_train_epochs=20,
      learning_rate=3.800568576836524e-05,
      weight_decay=0.050977894796868116,
      warmup_ratio=0.10816909354342182,
      optim="adamw_torch",
      lr_scheduler_type="cosine",
      evaluation_strategy="epoch",
      save_strategy="epoch",
      save_total_limit=3,
      load_best_model_at_end=True,
      auto_find_batch_size=True,
    )


## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Data Card if possible. -->

The model's performance is tested on three test sets: two are subsets of the [CONAN](https://github.com/marcoguerini/CONAN) dataset and one is the sexist portion of the [EDOS](https://github.com/rewire-online/edos) dataset.

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

The model is evaluated with a custom evaluation pipeline for counter-speech generation. The pipeline includes CoLA, Toxicity (TOX), Hatefulness (Hate), Offensiveness (OFF), Label Similarity (L Sim) and Context Similarity (C Sim), Validity as Counter-Speech (VaCS), Repetition Rate (RR), target-demographic F1 (F1), and the Arithmetic Mean (AM).


### Results
CONAN

| Model Name | CoLA | TOX | Hate | OFF | L Sim | C Sim | VaCS | RR | F1 | AM |
| ---------- | ---- | -- | ---- | --- | ----- | ----- | ---- | -- | -- | -- |
| Human      | 0.937 | 0.955 | 1.000 | 0.997 | -  | 0.751 | 0.980 | 0.861 | 0.885 | 0.929 |
| target-aware gpt2-medium | 0.958 | 0.946 | 1.000 | 0.996 | 0.706 | 0.784 | 0.946 | 0.419 | 0.880 | 0.848 |

CONAN SMALL

| Model Name | CoLA | TOX | Hate | OFF | L Sim | C Sim | VaCS | RR | F1 | AM |
| ---------- | ---- | -- | ---- | --- | ----- | ----- | ---- | -- | -- | -- |
| Human | 0.963 | 0.956 | 1.000 | 1.000 | 1.000 | 0.768 | 0.988 | 0.995 | 0.868 | 0.949 |
| target-aware gpt2-medium | 0.975 | 0.931 | 1.000 | 1.000 | 0.728 | 0.783 | 0.888 | 0.911 | 0.792 | 0.890 |

EDOS

| Model Name | CoLA | TOX | Hate | OFF | C Sim | VaCS | RR | F1 | AM |
| ---------- | ---- | -- | ---- | --- | ----- | ---- | -- | -- | -- |
| target-aware gpt2-medium | 0.930 | 0.815 | 0.999 | 0.975 | 0.689 | 0.857 | 0.518 | 0.747 | 0.816 |
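
The AM column appears to be the unweighted mean of the other metric columns in a row. This is inferred from the numbers on this card, not stated by it. A minimal check on the target-aware gpt2-medium rows for CONAN and EDOS, where `arithmetic_mean` is a hypothetical helper name:

```python
from statistics import fmean


def arithmetic_mean(scores: dict) -> float:
    """Unweighted mean over all metric scores in one table row."""
    return fmean(scores.values())


# target-aware gpt2-medium rows, copied from the CONAN and EDOS tables.
conan = {"CoLA": 0.958, "TOX": 0.946, "Hate": 1.000, "OFF": 0.996, "L Sim": 0.706,
         "C Sim": 0.784, "VaCS": 0.946, "RR": 0.419, "F1": 0.880}
edos = {"CoLA": 0.930, "TOX": 0.815, "Hate": 0.999, "OFF": 0.975,
        "C Sim": 0.689, "VaCS": 0.857, "RR": 0.518, "F1": 0.747}

conan_am = round(arithmetic_mean(conan), 3)
edos_am = round(arithmetic_mean(edos), 3)
```

Both rounded values agree with the AM column for these rows (0.848 and 0.816). EDOS averages over eight metrics because its table has no L Sim column.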