yjernite committed on
Commit
cfd13fd
1 Parent(s): 8024608

moving on to section 2

Files changed (1)
  1. app.py +36 -29
app.py CHANGED
@@ -110,22 +110,22 @@ with gr.Blocks() as demo:
110
  <br/>
111
 
112
  Diffusion Models are commonly used in text-conditioned image generation systems, such as Stable Diffusion or Dall-E 2.
113
- In those systems, a user writes a "*prompt*" as input, and receives an image that corresponds to what the prompt is describing as output.
114
- For example, if the user asks for a "*Photo portrait of a **scientist***", they expect to get an image that looks photorealistic,
115
  prominently features at least one person, and this person might be wearing a lab coat or safety goggles.
116
- A "*Photo portrait of a **carpenter***", on the other hand, might be set against a background depicting wooden scaffolding or a workshop (see pictures above).
117
 
118
- At the start of this project, we found that while systems do make good use of background and context cues to represent different professions,
119
  there were also some concerning trends about the perceived genders and ethnicities of the people depicted in these professional situations.
120
  After trying a few such prompts, we were left asking: why do all the people depicted in these pictures **look like white men**?
121
  Why do the only exceptions appear to be fast food workers and other lower wage professions?
122
  And finally, what could be the **consequences of such a lack of diversity** in the system outputs?
123
 
124
- **Look like** is the operative phrase here as the people depicted in the pictures are synthetic and so do not belong to socially-constructed groups.
125
  Consequently, since we cannot assign a gender or ethnicity label to each data point,
126
  we instead focus on dataset-level trends in visual features that are correlated with social variation in the text prompts.
127
  We do this through *controlled prompting* and *hierarchical clustering*: for each system,
128
- we obtain a dataset of images corresponding to prompts of the format "*Photo portrait of a **(identity terms)** person at work*",
129
  where ***(identity terms)*** jointly enumerate phrases describing ethnicities and phrases denoting gender.
130
  We then cluster these images by similarity and create an [Identity Representation Demo](https://hf.co/spaces/society-ethics/DiffusionFaceClustering)
131
  to showcase the visual trends encoded in these clusters - as well as their relation to the social variables under consideration.
@@ -141,15 +141,14 @@ with gr.Blocks() as demo:
141
  Our goal with this strategy is to trigger variation in the images that viewers will associate with social markers.
142
  We use three phrases denoting genders (*man*, *woman*, *non-binary*), and 18 phrases describing ethnicities -
143
  some of which are sometimes understood as similar in the US context, such as First Nations and Indigenous American.
144
- We also left the gender and ethnicity phrases unspecified in some prompts.
145
-
146
- This approach places visual features on a multidimensional spectrum without ascribing a prior number of distinct values for social categories or their intersection.
147
  It is also limited by the training data of the models under consideration and the set of identity terms used,
148
  which in our application are more relevant to the North American context than to other regions.
149
 
150
  How then can we use these clusters in practice?
151
  Let's look for example at **cluster 2 in the 24-cluster setting**:
152
- we see that most prompts for images in the cluster used the word *woman* and one of the words denoting *Hispanic* origin.
153
  This tells us that images that are similar to the ones in this cluster will **likely look like** Hispanic women to viewers.
154
  You can cycle through [a few other examples right](https://hf.co/spaces/society-ethics/DiffusionFaceClustering "or even better, visualize them in the app"),
155
  such as cluster 19 which mostly features the words *Caucasian* and *man*, different gender term distributions for *African American* in 0 and 6,
@@ -185,8 +184,8 @@ with gr.Blocks() as demo:
185
  """
186
  #### [Stereotypical Representations and Associations](https://hf.co/spaces/society-ethics/DiffusionFaceClustering "Select cluster to visualize to the left or go straight to the interactive demo")
187
 
188
- Even before we start leveraging these clusters to analyze system behaviors in other settings,
189
- they already provide useful insights about some of the bias dynamics in the system.
190
  In particular, they help us understand which groups are more susceptible to representation harms when using these systems,
191
  especially when the models are more likely to associate them with **stereotypical representations**.
192
 
@@ -209,25 +208,25 @@ with gr.Blocks() as demo:
209
  """
210
  #### [Specification, Markedness, and Bias](https://hf.co/spaces/society-ethics/DiffusionFaceClustering "Select cluster to visualize to the right or go straight to the interactive demo")
211
 
212
- The last phenomenon we study through the lens of our clustering is that of **markedness**,
213
  or *default behavior* of the models when neither gender nor ethnicity is specified.
214
- This corresponds to asking the question: "If I **don't** explicitly tell my model to generate a person of specific genders or ethnicities,
215
  will I still see diverse outcomes?"
216
 
217
  Unsurprisingly, we find that not to be the case.
218
- The clusters with the most examples of both prompts with unspecified gender and ethnicity terms are **clusters 5 and 19**,
219
  and both are also strongly associated with the words *man*, *White*, and *Caucasian*.
220
- This association holds across genders (as showcased by **cluster 15**, which has a majority of *woman* and *White* prompts)
221
- and ethnicities (comparing the proportions of unspecified genders in **clusters 0 and 6**).
222
 
223
  This provides the beginning of an answer to our motivating question: since users rarely specify an explicit gender or ethnicity when using
224
- these systems to generate images of people, the high likelihood of defaulting to *Whiteness* and *masculinity* is likely to at least partially explain the observed lack of diversity.
225
  We compare these behaviors across systems and professions in the next section.
226
  """
227
  )
228
  with gr.Column(scale=1):
229
  id_cl_id_3 = gr.Dropdown(
230
- choices=[5, 19, 0, 6, 15],
231
  value=6,
232
  show_label=False,
233
  )
@@ -250,19 +249,27 @@ with gr.Blocks() as demo:
250
 
251
  gr.Markdown(
252
  """
253
- ### Exploring Biases
254
- """
255
- )
256
- gr.Markdown(
 
 
 
 
 
257
  """
258
- Machine Learning models encode and amplify biases that are represented in the data that they are trained on - this can include, for instance, stereotypes around the appearances of members of different professions. In our study, we prompted the 3 text-to-image models with texts pertaining to 150 different professions and analyzed the presence of different identity groups in the images generated. We found evidence of many societal stereotypes in the images generated, such as the fact that people in positions of power (e.g. director, CEO) are often White- and male-appearing, while the images generated for other professions are more diverse. Read more about our findings in the accordion below or directly via the [Diffusion Cluster Explorer](https://huggingface.co/spaces/society-ethics/DiffusionClustering) tool.
259
- """
260
  )
261
- with gr.Accordion("Exploring Biases", open=False):
262
- gr.HTML(
 
 
 
 
 
 
 
263
  """
264
- <p style="margin-bottom: 14px; font-size: 100%"> We also explore the correlations between the professions that are used in our prompts and the different identity clusters that we identified. <br> Using both the <a href='https://huggingface.co/spaces/society-ethics/DiffusionClustering' style='text-decoration: underline;' target='_blank'> Diffusion Cluster Explorer </a> and the <a href='https://huggingface.co/spaces/society-ethics/DiffusionFaceClustering' style='text-decoration: underline;' target='_blank'> Identity Representation Demo </a>, we can see which clusters are most correlated with each profession and what identities are in these clusters.</p>
265
- """
266
  )
267
  with gr.Row():
268
  with gr.Column():
 
110
  <br/>
111
 
112
  Diffusion Models are commonly used in text-conditioned image generation systems, such as Stable Diffusion or Dall-E 2.
113
+ In those systems, a user writes a *"prompt"* as input, and receives an image that corresponds to what the prompt is describing as output.
114
+ For example, if the user asks for a *"Photo portrait of a **scientist**"*, they expect to get an image that looks photorealistic,
115
  prominently features at least one person, and this person might be wearing a lab coat or safety goggles.
116
+ A *"Photo portrait of a **carpenter**"*, on the other hand, might be set against a background depicting wooden scaffolding or a workshop (see pictures above).
117
 
118
+ At the start of this project, we found that while systems do make use of background and context cues to represent different professions,
119
  there were also some concerning trends about the perceived genders and ethnicities of the people depicted in these professional situations.
120
  After trying a few such prompts, we were left asking: why do all the people depicted in these pictures **look like white men**?
121
  Why do the only exceptions appear to be fast food workers and other lower wage professions?
122
  And finally, what could be the **consequences of such a lack of diversity** in the system outputs?
123
 
124
+ **Look like** is the operative phrase here: the people depicted in the pictures are synthetic and so do not belong to socially-constructed groups.
125
  Consequently, since we cannot assign a gender or ethnicity label to each data point,
126
  we instead focus on dataset-level trends in visual features that are correlated with social variation in the text prompts.
127
  We do this through *controlled prompting* and *hierarchical clustering*: for each system,
128
+ we obtain a dataset of images corresponding to prompts of the format *"Photo portrait of a **(identity terms)** person at work"*,
129
  where ***(identity terms)*** jointly enumerate phrases describing ethnicities and phrases denoting gender.
130
  We then cluster these images by similarity and create an [Identity Representation Demo](https://hf.co/spaces/society-ethics/DiffusionFaceClustering)
131
  to showcase the visual trends encoded in these clusters - as well as their relation to the social variables under consideration.
 
141
  Our goal with this strategy is to trigger variation in the images that viewers will associate with social markers.
142
  We use three phrases denoting genders (*man*, *woman*, *non-binary*), and 18 phrases describing ethnicities -
143
  some of which are sometimes understood as similar in the US context, such as First Nations and Indigenous American.
144
+ We also leave the gender and ethnicity phrases unspecified in some prompts.
145
+ This approach has the advantage of placing visual features on a multidimensional spectrum without ascribing a prior number of distinct values for social categories or their intersection.
 
146
  It is also limited by the training data of the models under consideration and the set of identity terms used,
147
  which in our application are more relevant to the North American context than to other regions.
148
 
149
  How then can we use these clusters in practice?
150
  Let's look for example at **cluster 2 in the 24-cluster setting**:
151
+ we see that most prompts for images in the cluster use the word *woman* and one of the words denoting *Hispanic* origin.
152
  This tells us that images that are similar to the ones in this cluster will **likely look like** Hispanic women to viewers.
153
  You can cycle through [a few other examples right](https://hf.co/spaces/society-ethics/DiffusionFaceClustering "or even better, visualize them in the app"),
154
  such as cluster 19 which mostly features the words *Caucasian* and *man*, different gender term distributions for *African American* in 0 and 6,
 
184
  """
185
  #### [Stereotypical Representations and Associations](https://hf.co/spaces/society-ethics/DiffusionFaceClustering "Select cluster to visualize to the left or go straight to the interactive demo")
186
 
187
+ Even before we start leveraging these clusters to analyze system behaviors in other applications,
188
+ they already provide useful insights into some of the bias dynamics in the system.
189
  In particular, they help us understand which groups are more susceptible to representation harms when using these systems,
190
  especially when the models are more likely to associate them with **stereotypical representations**.
191
 
 
208
  """
209
  #### [Specification, Markedness, and Bias](https://hf.co/spaces/society-ethics/DiffusionFaceClustering "Select cluster to visualize to the right or go straight to the interactive demo")
210
 
211
+ The second phenomenon we study through the lens of our clustering is that of **markedness**,
212
  or *default behavior* of the models when neither gender nor ethnicity is specified.
213
+ This corresponds to asking the question: "If I **don't explicitly** tell my model to generate people with diverse identity characteristics,
214
  will I still see diverse outcomes?"
215
 
216
  Unsurprisingly, we find that not to be the case.
217
+ The clusters with the most examples of prompts with unspecified gender and unspecified ethnicity terms are **clusters 5 and 19**,
218
  and both are also strongly associated with the words *man*, *White*, and *Caucasian*.
219
+ This association holds across genders (as showcased by **cluster 15**, which has a majority of *woman* and *White* prompts along with unspecified ethnicity)
220
+ and across ethnicities (comparing the proportions of unspecified genders in **clusters 0 and 6**: 18% and 38% for the clusters with more *woman* and more *man* prompts, respectively, both alongside the *African American* phrases).
221
 
222
  This provides the beginning of an answer to our motivating question: since users rarely specify an explicit gender or ethnicity when using
223
+ these systems to generate images of people, the high likelihood of defaulting to *Whiteness* and *masculinity* is likely to at least partially explain the observed lack of diversity in the outputs.
224
  We compare these behaviors across systems and professions in the next section.
225
  """
226
  )
227
  with gr.Column(scale=1):
228
  id_cl_id_3 = gr.Dropdown(
229
+ choices=[5, 19, 15, 0, 6],
230
  value=6,
231
  show_label=False,
232
  )
 
249
 
250
  gr.Markdown(
251
  """
252
+ ### Quantifying Social Biases in Image Generations of Professions
253
+
254
+ Machine Learning models encode and amplify biases that are represented in the data that they are trained on -
255
+ this can include, for instance, stereotypes around the appearances of members of different professions.
256
+ In our study, we prompted the 3 text-to-image models with texts pertaining to 150 different professions
257
+ and analyzed the presence of different identity groups in the images generated. We found evidence of many societal stereotypes in the images generated,
258
+ such as the fact that people in positions of power (e.g. director, CEO) are often White- and male-appearing,
259
+ while the images generated for other professions are more diverse.
260
+ Read more about our findings in the accordion below or directly via the [Diffusion Cluster Explorer](https://hf.co/spaces/society-ethics/DiffusionClustering) tool.
261
  """
 
 
262
  )
263
+ with gr.Accordion("Quantifying Social Biases in Image Generations of Professions", open=False):
264
+ gr.Markdown(
265
+ """
266
+ <br/>
267
+ We also explore the correlations between the professions that are used in our prompts and the different identity clusters that we identified.
268
+
269
+ Using both the [Diffusion Cluster Explorer](https://hf.co/spaces/society-ethics/DiffusionClustering)
270
+ and the [Identity Representation Demo](https://hf.co/spaces/society-ethics/DiffusionFaceClustering),
271
+ we can see which clusters are most correlated with each profession and what identities are in these clusters.
272
  """
 
 
273
  )
274
  with gr.Row():
275
  with gr.Column():
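
The *controlled prompting* step described in the Markdown text above can be sketched roughly as follows. This is a minimal illustration and not the code used for the project: the function name `build_prompt`, the way gender and ethnicity phrases are combined into one noun phrase, and the "a"/"an" handling are all assumptions. The ethnicity list is only a small subset taken from phrases mentioned in the text (the text states that 18 were used), and the empty string stands for an unspecified term.

```python
from itertools import product

# Gender phrases named in the text; "" stands for "left unspecified".
GENDER_TERMS = ["", "man", "woman", "non-binary"]

# Illustrative subset only: the text mentions 18 ethnicity phrases,
# of which only a few are named. "" stands for "left unspecified".
ETHNICITY_TERMS = ["", "Caucasian", "Hispanic", "African American", "First Nations"]


def build_prompt(ethnicity: str, gender: str) -> str:
    """Fill the '(identity terms)' slot of the prompt template.

    Assumption: 'man' and 'woman' replace the noun 'person', while
    'non-binary' is used as a modifier of 'person'.
    """
    noun = gender if gender in ("man", "woman") else "person"
    modifiers = [ethnicity]
    if noun == "person":
        modifiers.append(gender)
    # Drop unspecified (empty) terms before joining.
    subject = " ".join([m for m in modifiers if m] + [noun])
    # Simplistic article choice based on the first letter (assumption).
    article = "an" if subject[0].lower() in "aeiou" else "a"
    return f"Photo portrait of {article} {subject} at work"


# Jointly enumerate ethnicity and gender phrases, including unspecified ones.
PROMPTS = [build_prompt(e, g) for e, g in product(ETHNICITY_TERMS, GENDER_TERMS)]
```

With the subset above this yields 5 x 4 = 20 distinct prompts, including the fully unspecified "Photo portrait of a person at work". Each prompt would then be sent to the image generation system several times to build the dataset that is later clustered.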