emilylearning committed on
Commit d906d0f · 1 Parent(s): 302a127

Adding examples to iface. Returning year field to df earlier.

Files changed (1)
  1. app.py +37 -19
app.py CHANGED
@@ -241,15 +241,15 @@ def predict_gender_pronouns(
 
     results = pd.concat(dfs, axis=1).set_index("year")
 
-    female_df = results.filter(regex=".*f_")
+    female_df = results.filter(regex=".*f_").reset_index()  # Gradio doesn't 'see' index?
     female_df_for_plot = (
-        female_df.reset_index()
-    )  # Gradio timeseries requires x-axis as column?
+        female_df
+    )
 
-    male_df = results.filter(regex=".*m_")
+    male_df = results.filter(regex=".*m_").reset_index()  # Gradio doesn't 'see' index?
     male_df_for_plot = (
-        male_df.reset_index()
-    )  # Gradio timeseries requires x-axis as column?
+        male_df
+    )
 
     return (
        target_text_w_masks,
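As the commit message says, `reset_index()` now runs right after the regex filter, so the `year` index is returned to the dataframe as an ordinary column before plotting. A minimal sketch of that pattern (the column names and values here are invented for illustration; Gradio's timeseries plotting appears to need the x-axis as an explicit column rather than as the index):

```
import pandas as pd

# Hypothetical per-model prediction columns, indexed by year.
results = pd.DataFrame(
    {"f_prob_model_a": [0.2, 0.4, 0.6], "m_prob_model_a": [0.8, 0.6, 0.4]},
    index=pd.Index([1920, 1960, 2000], name="year"),
)

# Keep only the female/male columns, then move 'year' back into a column
# so the plotting component can use it as the x-axis.
female_df = results.filter(regex=".*f_").reset_index()
male_df = results.filter(regex=".*m_").reset_index()
print(female_df.columns.tolist())  # ['year', 'f_prob_model_a']
```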
@@ -263,24 +263,17 @@ def predict_gender_pronouns(
 title = "Changing Gender Pronouns"
 description = """
 <h2> Intro </h2>
-
 This is a demo for a project exploring possible spurious correlations that have been learned by our models. We first examined the training datasets and learning tasks to hypothesize what spurious correlations may exist. Below we can condition on these variables to determine what effect they may have on the prediction outcomes.
-
 Specifically in this demo: in a user-provided sentence with at least one reference to a `DATE` and one gender pronoun, we will see how sweeping through a range of `DATE` values can change the predicted pronouns. This effect can be observed in BERT base models and in our fine-tuned models (with a specific pronoun-predicting task on the [wiki-bio](https://huggingface.co/datasets/wiki_bio) dataset).
-
 One way to explain this phenomenon is to look at a likely data-generating process for biographical-like data in both the main BERT training dataset and the `wiki_bio` dataset, in the form of a causal DAG.
 
 <h2> Causal DAG </h2>
-
 In the DAG, we can see that `birth_place`, `birth_date` and `gender` are all independent elements that have no common cause with the other covariates in the DAG. However, `birth_place`, `birth_date` and `gender` may all have a role in causing one's `access_to_resources`, with the general trend that `access_to_resources` has become less gender-dependent over time, but not in every `birth_place`, with recent events in Afghanistan providing a stark counterexample to this trend.
-
 Importantly, `access_to_resources` determines how, **if at all**, you may appear in the dataset's `context_words`.
-
 We argue that although there are complex causal interactions between each word in any given sentence, the `context_words` are more likely to cause the `gender_pronouns`, rather than vice versa. For example, if the subject is a famous doctor and the object is her wealthy father, these context words will determine which person is being referred to, and thus which gendered pronoun to use.
 
 
 In this graph, arrow heads are intended to show the assumed direction of causation. E.g. as described above, we are claiming that `context_words` cause the `gender_pronouns`. While causation follows the direction of the arrows, statistical correlation can flow in any direction (it is cause-agnostic).
-
 In the case of this graph, any pink path between `context_words` and `gender_pronouns` will allow the flow of statistical correlation, inviting confounding and thus spurious correlations into the trained model.
 
 <center>
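To make the effect described in the intro concrete, here is an illustrative sketch (not the app's own code) of sweeping `DATE` through a few years and asking an off-the-shelf BERT fill-mask model which pronoun it prefers. The template sentence and year range are invented for the example:

```
from transformers import pipeline

# Plain bert-base-uncased; the demo's fine-tuned models are not used here.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
template = "Born in DATE, [MASK] was a teacher."

for year in [1880, 1920, 1960, 2000]:
    text = template.replace("DATE", str(year))
    # Restrict the fill-mask candidates to the two pronouns of interest.
    preds = unmasker(text, targets=["he", "she"])
    scores = {p["token_str"]: round(p["score"], 3) for p in preds}
    print(year, scores)
```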
@@ -291,7 +284,6 @@ In the case of this graph, any pink path between `context_words` and `gender_pronouns`
 Those familiar with causal DAGs may note that we can simply condition on `gender` to block any confounding between the `context_words` and the `gender_pronouns`. However, this is not always possible, particularly in generative or mask-filling tasks where gender may be unknown, as is common in language modeling and in the demo below.
 
 <h2> How to use this demo </h2>
-
 In this demo, a user can add any sentence that contains at least one gender pronoun and the capitalized word `DATE`. We then sweep through a range of `date` values in the place of `DATE`, while masking (for prediction) the gender pronouns (included in the list below).
 ```
 gendered_lists = [
@@ -304,26 +296,27 @@ gendered_lists = [
     ["husband", "wife"],
 ]
 ```
-
 In addition to choosing the test sentence, we ask that you pick how the fine-tuned model was trained:
 - conditioning variable: which, if any, conditioning variable from the three noted above in the DAG was included in the text at train time.
 - loss function weight: the weight assigned to the minority class (female pronouns in this fine-tuning dataset) during training.
-
 You can also optionally pick a bert-like model for comparison.
 
-<h2> What are the results</h2>
 
+Some notes:
+- Gradio currently only supports 6 plotting colors (but there are [plans](https://github.com/gradio-app/gradio/issues/1088) to support more!), so it is best not to select too many models at once for now.
+- If the dataframes appear not to update with new fields, it may help to 'Clear' the fields before 'Submitting' new inputs.
+
+
+<h2> What are the results?</h2>
 In the resulting plots, we can look for a dose-response relationship between:
 - our treatment: the sample text,
 - and our outcome: the predicted gender of pronouns in the text.
 
 Specifically, we are checking whether 1) a larger-magnitude intervention (an older `DATE` in the text) 2) produces a larger-magnitude effect on the outcome (a higher percentage of predicted female pronouns).
-
 Some trends that appear in the test sentences I have tried:
 - Conditioning on `birth_date` metadata in both training and inference text has the largest dose-response relationship. This seems reasonable, as the fine-tuned model is able to 'stratify' a learned relationship between gender pronouns and dates when both are present in the text.
 - Conditioning on either no metadata or `birth_place` metadata in training has similar middle-ground effects for this inference task.
 - Finally, conditioning on `name` metadata in training (while again conditioning on `date` in inference) has almost no dose-response relationship. It appears the learning of a `name -> gender pronouns` relationship was sufficiently successful to overwhelm any potentially more nuanced learning, such as that driven by `birth_date` or `place`.
-
 Please feel free to ping me on the Hugging Face discord (I'm 'emily_learner' there) with any feedback/comments/concerns or interesting findings!
 """
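The "How to use this demo" section above says the gender pronouns (drawn from the gendered word pairs) are masked before prediction. A hedged sketch of that idea (the pair list is abbreviated and the helper name is made up; the real app's implementation may differ):

```
# Replace any word that appears in the gendered pairs with a mask token.
gendered_lists = [
    ["he", "she"],
    ["his", "her"],
    ["husband", "wife"],
]  # abbreviated; the demo's full list is longer

def mask_gendered_words(sentence, mask_token="[MASK]"):
    gendered = {w.lower() for pair in gendered_lists for w in pair}
    return " ".join(
        mask_token if word.lower().strip(".,") in gendered else word
        for word in sentence.split()
    )

print(mask_gendered_words("Born in 1920, she was a CEO. Her work was respected."))
# Born in 1920, [MASK] was a CEO. [MASK] work was respected.
```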
 
@@ -331,6 +324,30 @@ Please feel free to ping me on the Hugging Face discord (I'm 'emily_learner' there), with any feedback/comments/concerns or interesting findings!
 article = "Check out [main colab notebook](https://colab.research.google.com/drive/14ce4KD6PrCIL60Eng-t79tEI1UP-DHGz?usp=sharing#scrollTo=Mg1tUeHLRLaG) \
 with a lot more details about this method and implementation."
 
+ceo_example = [
+    20,
+    ["none", "birth_date", "name"],
+    FEMALE_WEIGHTS,
+    [],
+    'Born in DATE, she was a CEO. Her work was greatly respected, and she was well-regarded in her field.',
+]
+
+death_date_example = [
+    10,
+    ['birth_date'],
+    [1.5],
+    BERT_LIKE_MODELS,
+    'Died in DATE, she was recognized for her great accomplishments in the field of teaching.'
+]
+
+building_date_example = [
+    30,
+    ['birth_date'],
+    [1.5],
+    BERT_LIKE_MODELS,
+    'Built in DATE, her building provided the perfect environment for her job as a teacher.'
+]
+
 gr.Interface(
     fn=predict_gender_pronouns,
     inputs=[
@@ -385,4 +402,5 @@ gr.Interface(
     title=title,
     description=description,
     article=article,
+    examples=[ceo_example, death_date_example, building_date_example]
 ).launch()
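For readers unfamiliar with the `examples` argument being wired in here, a minimal self-contained sketch of the same mechanism (the function, components, and example values are invented for illustration and use current Gradio component names, not the app's own inputs):

```
import gradio as gr

def greet(name, year):
    return f"{name}, born in {year}"

# Each example row lists input values in the same order as `inputs`.
example_rows = [["Ada", 1815], ["Grace", 1906]]

gr.Interface(
    fn=greet,
    inputs=[gr.Textbox(label="name"), gr.Number(label="year")],
    outputs="text",
    examples=example_rows,
).launch()
```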
 
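Finally, a hedged sketch of the "loss function weight" option described in the demo text: up-weighting the minority (female-pronoun) class in a cross-entropy training loss. The class encoding, weight value, and tensors below are assumptions for illustration only:

```
import torch
import torch.nn as nn

# Assumed encoding: class 0 = male pronoun, class 1 = female pronoun (minority class).
female_weight = 1.5
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, female_weight]))

logits = torch.tensor([[2.0, 0.5], [0.3, 1.2]])  # fake model outputs for two tokens
labels = torch.tensor([0, 1])                    # fake gold labels
print(loss_fn(logits, labels))
```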