File size: 5,429 Bytes
2661d79
 
1b4eee7
 
 
2661d79
1b4eee7
2661d79
 
1b4eee7
 
 
 
 
2661d79
 
1b4eee7
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
---
title: Honest
emoji: 🤗
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
- evaluate
- measurement
description: >-
  The HONEST score is a multilingual score that aims to compute how likely each language model is to produce hurtful completions based on a predefined set of prompts.
---

# Measurement Card for HONEST

## Measurement description
The HONEST score aims to measure hurtful sentence completions in language models.
The score uses HurtLex, a multilingual lexicon of hurtful language, to evaluate the completions.
It aims to quantify how often sentences are completed with a hurtful word, and if there is a difference between
groups (e.g. genders, sexual orientations, etc.).

## How to use

When loading the model, specify the language of the prompts and completions.
The available languages are: 'it' (Italian), 'fr' (French), 'es' (Spanish), 'pt' (Portuguese), 'ro' (Romanian), 'en' (English).

```python
>>> honest = evaluate.load('honest', 'en')
```

Arguments:
    **predictions** (list of list of `str`): a list of completions to [HONEST prompts](https://huggingface.co/datasets/MilaNLProc/honest)
    **groups** (list of `str`) (*optional*): a list of the identity groups each list of completions belongs to.


## Output values

`honest_score`: the HONEST score, representing the average number of hurtful completions across all groups
`honest_score_per_group`: the HONEST score of each group separately.

### Values from popular papers
In the [original HONEST paper](https://aclanthology.org/2021.naacl-main.191.pdf), the following scores were calculated for models, with Top K referring to the number of model completions that were evaluated:


| Model Name       | Top K =1 | Top K =5 |Top K =20 |
| ---------------- | -------- | -------- | -------- |
| UmBERTo (OSCAR)  | 5.24     | 8.19     |  7.14    |
| UmBERTo (Wiki)   | 5.48     | 7.19     |  5.14    |
| GilBERTo         | 7.14     | 11.57    |  8.68    |
| ItalianBERT XXL  | 9.05     | 10.67    |  9.12    |
| FlauBERT         | 4.76     | 3.29     |  2.43    |
| CamemBERT (OSCAR)| 18.57    | 9.62     |  7.07    |
| CamemBERT (Wiki) | 7.62     | 4.90     |  4.19    |
| BETO             | 4.29     |  5.95    |  6.88    |
| BERTimbau        | 4.05     | 6.00     |  5.04    |
| RomanianBERT     | 4.76     | 3.90     |  4.61    |
| BERT-base        | 1.19     | 2.67     |  3.55    |
| BERT-large       | 3.33     | 3.43     |  4.30    |
| RoBERTa-base     | 2.38     | 5.38     |  5.74    |
| RoBERTa-large    | 2.62     | 2.33     |  3.05    |
| DistilBERT-base  | 1.90     | 3.81     |  3.96    |
| GPT-2 (IT)       | 12.86    | 11.76    |  12.56   |
| GPT-2 (FR)       | 19.76    | 19.67    |  17.81   |
| GPT-2 (PT)       | 9.52     | 10.71    | 10.29    |
| GPT-2 (EN)       | 17.14    | 12.81    |  13.00   |


## Examples

Example 1: Calculating HONEST without groups

```python
>>> honest = evaluate.load('honest', 'en')
>>> completions = [['CEO', 'businessman', 'politician'], ['good', 'excellent', 'smart'], ['secretary', 'prostitute', 'maid'], ['beautiful', 'smart', 'tall']]
>>> result = honest.compute(predictions=completions)
>>> round(result['honest_score'],3)
0.167
```

Example 2: Calculating HONEST with 2 groups (e.g. male/female)
```python
>>> honest = evaluate.load('honest', 'en')
>>> completions = [['CEO', 'businessman', 'father'], ['good', 'excellent', 'smart'], ['secretary', 'prostitute', 'maid'], ['beautiful', 'smart', 'tall']]
>>> groups = ["male", "male", "female", "female"]
>>> result = honest.compute(predictions=completions, groups = groups)
>>> {g:round(s,3) for g,s in result['honest_score_per_group'].items()}
{'male': 0.0, 'female': 0.167}
```

Example 2: Calculating HONEST in another language (French)
```python
>>> honest = evaluate.load('honest', 'fr')
>>> completions = [['PDG', 'avocat', 'père'], ['bon','excellent', 'intelligent'], ['secrétaire', 'prostituée', 'bonne'], ['belle', 'intelligente', 'grande']]
>>> result = honest.compute(predictions=completions)
>>> round(result['honest_score'],3)
0.083
```

## Citation

```bibtex
@inproceedings{nozza-etal-2021-honest,
    title = {"{HONEST}: Measuring Hurtful Sentence Completion in Language Models"},
    author = "Nozza, Debora and Bianchi, Federico  and Hovy, Dirk",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.naacl-main.191",
    doi = "10.18653/v1/2021.naacl-main.191",
    pages = "2398--2406",
}
```

```bibtex
@inproceedings{nozza-etal-2022-measuring,
    title = {Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals},
    author = "Nozza, Debora and Bianchi, Federico and Lauscher, Anne and Hovy, Dirk",
    booktitle = "Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion",
    publisher = "Association for Computational Linguistics",
    year={2022}
}
```

## Further References
- Bassignana, Elisa, Valerio Basile, and Viviana Patti. ["Hurtlex: A multilingual lexicon of words to hurt."](http://ceur-ws.org/Vol-2253/paper49.pdf) 5th Italian Conference on Computational Linguistics, CLiC-it 2018. Vol. 2253. CEUR-WS, 2018.