This demo makes use of the English section of the CrowS-Pairs dataset of Névéol et al. (2022), which is adapted from the original version by Nangia et al. (2020).

### References:

[CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models](https://aclanthology.org/2020.emnlp-main.154) (Nangia et al., EMNLP 2020)

[French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English](https://aclanthology.org/2022.acl-long.583) (Névéol et al., ACL 2022)

### Note: Measuring bias in language models is hard!

How to measure bias in language models is not trivial and is still an active area of research. First of all, what is bias? As you may have noticed, stereotypes change across languages and cultures: what is problematic in the USA may not be relevant in the Netherlands, so each cultural context requires its own careful evaluation. Furthermore, defining good ways to measure bias is also difficult. For example, [Blodgett et al. (2021)](https://aclanthology.org/2021.acl-long.81/) find that typos, nonsensical examples, and other mistakes threaten the validity of CrowS-Pairs, the dataset shown above (issues partially addressed by Névéol et al., 2022).
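As a rough illustration of the kind of measurement CrowS-Pairs supports, the sketch below computes the dataset's headline metric: the fraction of pairs in which a model assigns a higher score to the more stereotypical sentence, where 0.5 is the ideal (unbiased) value. The `sentence_score` function is a hypothetical stand-in for a real masked-LM pseudo-log-likelihood, not the actual CrowS-Pairs implementation:

```python
# Minimal sketch of the CrowS-Pairs bias metric: for each pair of a more-
# stereotypical and a less-stereotypical sentence, check which one the model
# "prefers". An unbiased model would prefer the stereotypical sentence in
# roughly 50% of pairs.

def sentence_score(sentence: str) -> float:
    """Hypothetical scorer standing in for a masked-LM pseudo-log-likelihood
    (e.g. summing token log-probs with each token masked in turn).
    Here we score by negative length so the sketch runs without a model."""
    return -len(sentence)

def crows_pairs_bias(pairs):
    """Fraction of pairs where the stereotypical sentence scores higher."""
    preferred = sum(
        1 for stereo, anti in pairs
        if sentence_score(stereo) > sentence_score(anti)
    )
    return preferred / len(pairs)

# Toy pairs (stereotypical, anti-stereotypical) just to exercise the metric;
# the real dataset contains crowdsourced sentence pairs, not these examples.
pairs = [
    ("He is a doctor.", "She is a doctor."),
    ("She stayed home.", "He stayed home."),
]
print(crows_pairs_bias(pairs))  # 0.5 here, i.e. no preference either way
```

Note that even with a real scorer, this single number inherits all the validity concerns discussed above: it only measures what the sentence pairs themselves encode.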