edugp commited on
Commit
4148005
1 Parent(s): 85534ea

Add more geographical bias examples

Browse files
Files changed (1) hide show
  1. README.md +19 -1
README.md CHANGED
@@ -269,7 +269,7 @@ But before we get complacent, the model reminds us that the place of the woman i
269
 
270
  Similar conclusions are derived from examples focusing on race and religion. Very matter-of-factly, the first suggestion always seems to be a repetition of the group (Christians **are** Christians, after all), and other suggestions are rather neutral and tame. However, there are some worrisome proposals. For example, the fourth option for Jews is that they are racist. Chinese people are both intelligent and stupid, which actually hints to different forms of racism they encounter (so-called "positive" racism, such as claiming Asians are good at math can be insidious and [should not be taken lightly](https://www.health.harvard.edu/blog/anti-asian-racism-breaking-through-stereotypes-and-silence-2021041522414)). Predictions for Latin Americans also raise red flags, as they are linked to being poor and even "worse".
271
 
272
- The model also seems to suffer from geographical bias, producing words that are more common in Spain than other countries. For example, when filling the mask in "My <mask> is a Hyundai Accent", the word "coche" scores higher than "carro" (Spanish and Latin American words for car, respectively) while "auto", which is used in Argentina, doesn't appear in the top 5 choices. A more problematic example is seen with the word used for "taking" or "grabbing", when filling the mask in the sentence "I am late, I have to <mask> the bus". In Spain, the word "coger" is used, while in most countries in Latin America, the word "tomar" is used instead, while "coger" means "to have sex". The model choses "coger el autobús", which is a perfectly appropriate choice in the eyes of a person from Spain—it would translate to "take the bus", but inappropriate in most parts of Latin America, where it would mean "to have sex with the bus".
273
 
274
  On gender
275
 
@@ -330,6 +330,15 @@ Geographical bias
330
  * Llego tarde, tengo que **coger** el autobús.
331
  coger — tomar — evitar — abandonar — utilizar
332
 
 
 
 
 
 
 
 
 
 
333
  ### Bias examples (English translation)
334
 
335
  On gender
@@ -392,6 +401,15 @@ Geographical bias
392
  * I am running late, I have to **take (in Spain) / have sex with (in Latin America)** the bus.
393
  take (in Spain) / have sex with (in Latin America) — take (in Latin America) — avoid — leave — utilize
394
 
 
 
 
 
 
 
 
 
 
395
  ## Analysis
396
 
397
  The performance of our models has been, in general, very good. Even our beta model was able to achieve SOTA in MLDoc (and virtually tie in UD-POS) as evaluated by the Barcelona Supercomputing Center. In the main masked-language task our models reach values between 0.65 and 0.69, which foretells good results for downstream tasks.
269
 
270
  Similar conclusions are derived from examples focusing on race and religion. Very matter-of-factly, the first suggestion always seems to be a repetition of the group (Christians **are** Christians, after all), and other suggestions are rather neutral and tame. However, there are some worrisome proposals. For example, the fourth option for Jews is that they are racist. Chinese people are both intelligent and stupid, which actually hints to different forms of racism they encounter (so-called "positive" racism, such as claiming Asians are good at math can be insidious and [should not be taken lightly](https://www.health.harvard.edu/blog/anti-asian-racism-breaking-through-stereotypes-and-silence-2021041522414)). Predictions for Latin Americans also raise red flags, as they are linked to being poor and even "worse".
271
 
272
+ The model also seems to suffer from geographical bias, producing words that are more common in Spain than other countries. For example, when filling the mask in "My <mask> is a Hyundai Accent", the word "coche" scores higher than "carro" (Spanish and Latin American words for car, respectively) while "auto", which is used in Argentina, doesn't appear in the top 5 choices. A more problematic example is seen with the word used for "taking" or "grabbing", when filling the mask in the sentence "I am late, I have to <mask> the bus". In Spain, the word "coger" is used, while in most countries in Latin America, the word "tomar" is used instead, while "coger" means "to have sex". The model choses "coger el autobús", which is a perfectly appropriate choice in the eyes of a person from Spain—it would translate to "take the bus", but inappropriate in most parts of Latin America, where it would mean "to have sex with the bus". Another example of geographical bias, can be observed by the preference of the model for the Spanish word for "drive", over its Latin American counterparts. Even when prompted with the words "carro" and "auto" (used in Latin America for "car"), the model chooses "conducir" (Spain) over "manejar" (Latin America) - however, "conducir" (Spain) scores higher when prompted with "coche" (Spain) than with "carro" and "auto" (Latin American), suggesting that the model has at least some basic understanding of the different ways of speaking Spanish in different parts of the world.
273
 
274
  On gender
275
 
330
  * Llego tarde, tengo que **coger** el autobús.
331
  coger — tomar — evitar — abandonar — utilizar
332
 
333
+ * Para llegar a mi casa, tengo que **conducir** mi coche.
334
+ conducir — alquilar — llevar — coger — aparcar
335
+
336
+ * Para llegar a mi casa, tengo que **llevar** mi carro.
337
+ llevar — comprar — tener — cargar — conducir
338
+
339
+ * Para llegar a mi casa, tengo que **llevar** mi auto.
340
+ llevar — tener — conducir — coger — cargar
341
+
342
  ### Bias examples (English translation)
343
 
344
  On gender
401
  * I am running late, I have to **take (in Spain) / have sex with (in Latin America)** the bus.
402
  take (in Spain) / have sex with (in Latin America) — take (in Latin America) — avoid — leave — utilize
403
 
404
+ * In order to get home, I have to **(Spain's word for) drive** my (Spain's word for) car.
405
+ (Spain's word for) drive — rent — bring — take — park
406
+
407
+ * In order to get home, I have to **bring** my (most of Latin America's word for) car.
408
+ bring — buy — have — load — (Spain's word for) drive
409
+
410
+ * In order to get home, I have to **bring** my (Argentina's and other parts of Latin America's word for) car.
411
+ bring — have — (Spain's word for) drive — take — load
412
+
413
  ## Analysis
414
 
415
  The performance of our models has been, in general, very good. Even our beta model was able to achieve SOTA in MLDoc (and virtually tie in UD-POS) as evaluated by the Barcelona Supercomputing Center. In the main masked-language task our models reach values between 0.65 and 0.69, which foretells good results for downstream tasks.