Spaces:

evaluate-measurement
/

honest

Running

App Files Files Community

honest / README.md

lvwerra HF staff

Update Space (evaluate main: 8e762637)

1b4eee7 over 1 year ago

preview code

raw history blame contribute delete

No virus

5.43 kB

	---
	title: Honest
	emoji: 🤗
	colorFrom: blue
	colorTo: green
	sdk: gradio
	sdk_version: 3.0.2
	app_file: app.py
	pinned: false
	tags:
	- evaluate
	- measurement
	description: >-
	The HONEST score is a multilingual score that aims to compute how likely each language model is to produce hurtful completions based on a predefined set of prompts.
	---

	# Measurement Card for HONEST

	## Measurement description
	The HONEST score aims to measure hurtful sentence completions in language models.
	The score uses HurtLex, a multilingual lexicon of hurtful language, to evaluate the completions.
	It aims to quantify how often sentences are completed with a hurtful word, and if there is a difference between
	groups (e.g. genders, sexual orientations, etc.).

	## How to use

	When loading the model, specify the language of the prompts and completions.
	The available languages are: 'it' (Italian), 'fr' (French), 'es' (Spanish), 'pt' (Portuguese), 'ro' (Romanian), 'en' (English).

	```python
	>>> honest = evaluate.load('honest', 'en')
	```

	Arguments:
	predictions (list of list of `str`): a list of completions to [HONEST prompts](https://huggingface.co/datasets/MilaNLProc/honest)
	groups (list of `str`) (optional): a list of the identity groups each list of completions belongs to.


	## Output values

	`honest_score`: the HONEST score, representing the average number of hurtful completions across all groups
	`honest_score_per_group`: the HONEST score of each group separately.

	### Values from popular papers
	In the [original HONEST paper](https://aclanthology.org/2021.naacl-main.191.pdf), the following scores were calculated for models, with Top K referring to the number of model completions that were evaluated:


	\| Model Name \| Top K =1 \| Top K =5 \|Top K =20 \|
	\| ---------------- \| -------- \| -------- \| -------- \|
	\| UmBERTo (OSCAR) \| 5.24 \| 8.19 \| 7.14 \|
	\| UmBERTo (Wiki) \| 5.48 \| 7.19 \| 5.14 \|
	\| GilBERTo \| 7.14 \| 11.57 \| 8.68 \|
	\| ItalianBERT XXL \| 9.05 \| 10.67 \| 9.12 \|
	\| FlauBERT \| 4.76 \| 3.29 \| 2.43 \|
	\| CamemBERT (OSCAR)\| 18.57 \| 9.62 \| 7.07 \|
	\| CamemBERT (Wiki) \| 7.62 \| 4.90 \| 4.19 \|
	\| BETO \| 4.29 \| 5.95 \| 6.88 \|
	\| BERTimbau \| 4.05 \| 6.00 \| 5.04 \|
	\| RomanianBERT \| 4.76 \| 3.90 \| 4.61 \|
	\| BERT-base \| 1.19 \| 2.67 \| 3.55 \|
	\| BERT-large \| 3.33 \| 3.43 \| 4.30 \|
	\| RoBERTa-base \| 2.38 \| 5.38 \| 5.74 \|
	\| RoBERTa-large \| 2.62 \| 2.33 \| 3.05 \|
	\| DistilBERT-base \| 1.90 \| 3.81 \| 3.96 \|
	\| GPT-2 (IT) \| 12.86 \| 11.76 \| 12.56 \|
	\| GPT-2 (FR) \| 19.76 \| 19.67 \| 17.81 \|
	\| GPT-2 (PT) \| 9.52 \| 10.71 \| 10.29 \|
	\| GPT-2 (EN) \| 17.14 \| 12.81 \| 13.00 \|


	## Examples

	Example 1: Calculating HONEST without groups

	```python
	>>> honest = evaluate.load('honest', 'en')
	>>> completions = [['CEO', 'businessman', 'politician'], ['good', 'excellent', 'smart'], ['secretary', 'prostitute', 'maid'], ['beautiful', 'smart', 'tall']]
	>>> result = honest.compute(predictions=completions)
	>>> round(result['honest_score'],3)
	0.167
	```

	Example 2: Calculating HONEST with 2 groups (e.g. male/female)
	```python
	>>> honest = evaluate.load('honest', 'en')
	>>> completions = [['CEO', 'businessman', 'father'], ['good', 'excellent', 'smart'], ['secretary', 'prostitute', 'maid'], ['beautiful', 'smart', 'tall']]
	>>> groups = ["male", "male", "female", "female"]
	>>> result = honest.compute(predictions=completions, groups = groups)
	>>> {g:round(s,3) for g,s in result['honest_score_per_group'].items()}
	{'male': 0.0, 'female': 0.167}
	```

	Example 2: Calculating HONEST in another language (French)
	```python
	>>> honest = evaluate.load('honest', 'fr')
	>>> completions = [['PDG', 'avocat', 'père'], ['bon','excellent', 'intelligent'], ['secrétaire', 'prostituée', 'bonne'], ['belle', 'intelligente', 'grande']]
	>>> result = honest.compute(predictions=completions)
	>>> round(result['honest_score'],3)
	0.083
	```

	## Citation

	```bibtex
	@inproceedings{nozza-etal-2021-honest,
	title = {"{HONEST}: Measuring Hurtful Sentence Completion in Language Models"},
	author = "Nozza, Debora and Bianchi, Federico and Hovy, Dirk",
	booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
	month = jun,
	year = "2021",
	address = "Online",
	publisher = "Association for Computational Linguistics",
	url = "https://aclanthology.org/2021.naacl-main.191",
	doi = "10.18653/v1/2021.naacl-main.191",
	pages = "2398--2406",
	}
	```

	```bibtex
	@inproceedings{nozza-etal-2022-measuring,
	title = {Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals},
	author = "Nozza, Debora and Bianchi, Federico and Lauscher, Anne and Hovy, Dirk",
	booktitle = "Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion",
	publisher = "Association for Computational Linguistics",
	year={2022}
	}
	```

	## Further References
	- Bassignana, Elisa, Valerio Basile, and Viviana Patti. ["Hurtlex: A multilingual lexicon of words to hurt."](http://ceur-ws.org/Vol-2253/paper49.pdf) 5th Italian Conference on Computational Linguistics, CLiC-it 2018. Vol. 2253. CEUR-WS, 2018.