Avijit Ghosh committed
Commit 0c7d699
1 Parent(s): 981ea1d

removed csv, added support for datasets

DemoData.csv DELETED
@@ -1,26 +0,0 @@
-Group,Modality,Type,Metaname,Suggested Evaluation,What it is evaluating,Considerations,Link,URL,Screenshots,Applicable Models ,Datasets,Hashtags,Abstract,Authors
-BiasEvals,Text,Model,weat,Word Embedding Association Test (WEAT),Associations and word embeddings based on Implicit Associations Test (IAT),"Although based in human associations, general societal attitudes do not always represent subgroups of people and cultures.",Semantics derived automatically from language corpora contain human-like biases,https://researchportal.bath.ac.uk/en/publications/semantics-derived-automatically-from-language-corpora-necessarily,"['Images/WEAT1.png', 'Images/WEAT2.png']",,,"['Bias', 'Word Association', 'Embeddings', 'NLP']","Artificial intelligence and machine learning are in a period of astounding growth. However, there are concerns that these
-technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately
-characterizes many human institutions. Here we show for the first time that human-like semantic biases result from the
-application of standard machine learning to ordinary language—the same sort of language humans are exposed to every
-day. We replicate a spectrum of standard human biases as exposed by the Implicit Association Test and other well-known
-psychological studies. We replicate these using a widely used, purely statistical machine-learning model—namely, the GloVe
-word embedding—trained on a corpus of text from the Web. Our results indicate that language itself contains recoverable and
-accurate imprints of our historic biases, whether these are morally neutral as towards insects or flowers, problematic as towards
-race or gender, or even simply veridical, reflecting the status quo for the distribution of gender with respect to careers or first
-names. These regularities are captured by machine learning along with the rest of semantics. In addition to our empirical
-findings concerning language, we also contribute new methods for evaluating bias in text, the Word Embedding Association
-Test (WEAT) and the Word Embedding Factual Association Test (WEFAT). Our results have implications not only for AI and
-machine learning, but also for the fields of psychology, sociology, and human ethics, since they raise the possibility that mere
-exposure to everyday language can account for the biases we replicate here.","Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan"
-BiasEvals,Text,Dataset,stereoset,StereoSet,Protected class stereotypes,Automating stereotype detection makes distinguishing harmful stereotypes difficult. It also raises many false positives and can flag relatively neutral associations based in fact (e.g. population x has a high proportion of lactose intolerant people).,StereoSet: Measuring stereotypical bias in pretrained language models,https://arxiv.org/abs/2004.09456,,,,,,
-BiasEvals,Text,Dataset,crowspairs,Crow-S Pairs,Protected class stereotypes,Automating stereotype detection makes distinguishing harmful stereotypes difficult. It also raises many false positives and can flag relatively neutral associations based in fact (e.g. population x has a high proportion of lactose intolerant people).,CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models,https://arxiv.org/abs/2010.00133,,,,,,
-BiasEvals,Text,Output,honest,HONEST: Measuring Hurtful Sentence Completion in Language Models,Protected class stereotypes and hurtful language,Automating stereotype detection makes distinguishing harmful stereotypes difficult. It also raises many false positives and can flag relatively neutral associations based in fact (e.g. population x has a high proportion of lactose intolerant people).,HONEST: Measuring Hurtful Sentence Completion in Language Models,https://aclanthology.org/2021.naacl-main.191.pdf,,,,,,
-BiasEvals,Image,Model,ieat,Image Embedding Association Test (iEAT),Embedding associations,"Although based in human associations, general societal attitudes do not always represent subgroups of people and cultures.","Image Representations Learned With Unsupervised Pre-Training Contain Human-like Biases | Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency",https://dl.acm.org/doi/abs/10.1145/3442188.3445932,,,,,,
-BiasEvals,Image,Dataset,imagedataleak,Dataset leakage and model leakage,Gender and label bias,,Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations,https://arxiv.org/abs/1811.08489,,,,,,
-BiasEvals,Image,Output,stablebias,Characterizing the variation in generated images,,,Stable bias: Analyzing societal representations in diffusion models,https://arxiv.org/abs/2303.11408,,,,,,
-BiasEvals,Image,Output,homoglyphbias,Effect of different scripts on text-to-image generation,"It evaluates generated images for cultural stereotypes, when using different scripts (homoglyphs). It somewhat measures the suceptibility of a model to produce cultural stereotypes by simply switching the script",,Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis,https://arxiv.org/pdf/2209.08891.pdf,,,,,,
-BiasEvals,Audio,Taxonomy,notmyvoice,Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators,Lists harms of audio/speech generators,Not necessarily evaluation but a good source of taxonomy. We can use this to point readers towards high-level evaluations,Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators,https://arxiv.org/pdf/2402.01708.pdf,,,,,,
-BiasEvals,Video,Output,videodiversemisinfo,Diverse Misinformation: Impacts of Human Biases on Detection of Deepfakes on Networks,Human led evaluations of deepfakes to understand susceptibility and representational harms (including political violence),"Repr. harm, incite violence","Diverse Misinformation: Impacts of Human Biases on Detection of Deepfakes on Networks
-",https://arxiv.org/abs/2210.10026,,,,,,
-Privacy,,,,,,,,,,,,,,
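
With DemoData.csv removed, each evaluation entry now lives in its own YAML file under configs/ (see the config diffs below), including the Datasets field that app.py now surfaces. The loader itself is not part of this diff, so the following is only a minimal sketch, assuming the table is rebuilt from configs/*.yaml with PyYAML and pandas; the function name and the columns printed are illustrative.

```python
# Minimal sketch (not part of this commit): rebuild the DataFrame that DemoData.csv
# used to provide from the per-entry YAML files in configs/. Assumes PyYAML and pandas.
import glob

import pandas as pd
import yaml


def load_eval_configs(pattern: str = "configs/*.yaml") -> pd.DataFrame:
    """Collect one record per YAML config into a single table."""
    records = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            # YAML parses the ".nan" placeholders as float NaN, so the pd.notnull()
            # checks in app.py behave the same as they did with the CSV.
            records.append(yaml.safe_load(f))
    return pd.DataFrame(records)


if __name__ == "__main__":
    df = load_eval_configs()
    print(df[["Suggested Evaluation", "Modality", "Type", "Datasets"]])
```
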
Images/CrowsPairs2.png ADDED
__pycache__/css.cpython-312.pyc CHANGED
Binary files a/__pycache__/css.cpython-312.pyc and b/__pycache__/css.cpython-312.pyc differ
 
app.py CHANGED
@@ -55,6 +55,7 @@ def showmodal(evt: gr.SelectData):
     authormd = gr.Markdown("",visible=False)
     tagsmd = gr.Markdown("",visible=False)
     abstractmd = gr.Markdown("",visible=False)
+    datasetmd = gr.Markdown("",visible=False)
     gallery = gr.Gallery([],visible=False)
     if evt.index[1] == 5:
         modal = Modal(visible=True)
@@ -73,13 +74,16 @@ def showmodal(evt: gr.SelectData):
 
         if pd.notnull(itemdic['Abstract']):
             abstractmd = gr.Markdown(itemdic['Abstract'],visible=True)
+
+        if pd.notnull(itemdic['Datasets']):
+            datasetmd = gr.Markdown('#### [Dataset]('+itemdic['Datasets']+')',visible=True)
 
         screenshots = itemdic['Screenshots']
         if isinstance(screenshots, list):
             if len(screenshots) > 0:
                 gallery = gr.Gallery(screenshots, visible=True)
 
-    return [modal, titlemd, authormd, tagsmd, abstractmd, gallery]
+    return [modal, titlemd, authormd, tagsmd, abstractmd, datasetmd, gallery]
 
 with gr.Blocks(title = "Social Impact Measurement V2", css=custom_css) as demo: #theme=gr.themes.Soft(),
     # create tabs for the app, moving the current table to one titled "rewardbench" and the benchmark_text to a tab called "About"
@@ -124,13 +128,15 @@ The following categories are high-level, non-exhaustive, and present a synthesis
     modality_filter.change(filter_modality, inputs=[biastable_filtered, modality_filter], outputs=biastable_filtered)
     type_filter.change(filter_type, inputs=[biastable_filtered, type_filter], outputs=biastable_filtered)
 
+
     with Modal(visible=False) as modal:
         titlemd = gr.Markdown(visible=False)
         authormd = gr.Markdown(visible=False)
         tagsmd = gr.Markdown(visible=False)
         abstractmd = gr.Markdown(visible=False)
+        datasetmd = gr.Markdown(visible=False)
         gallery = gr.Gallery(visible=False)
-    biastable_filtered.select(showmodal, None, [modal, titlemd, authormd, tagsmd, abstractmd, gallery])
+    biastable_filtered.select(showmodal, None, [modal, titlemd, authormd, tagsmd, abstractmd, datasetmd, gallery])
 
 
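
The app.py change threads a new datasetmd component through showmodal and the select() outputs so that a non-null Datasets value is rendered as a Markdown link in the modal. A sketch of that logic in isolation follows; the helper name is mine rather than the commit's.

```python
# Sketch of the new Datasets handling, factored out for clarity; dataset_markdown()
# is illustrative and does not exist in app.py.
import gradio as gr
import pandas as pd


def dataset_markdown(dataset_url) -> gr.Markdown:
    """Show a '#### [Dataset](...)' link when a URL is present, otherwise stay hidden."""
    if pd.notnull(dataset_url):
        return gr.Markdown('#### [Dataset](' + dataset_url + ')', visible=True)
    return gr.Markdown("", visible=False)
```

Note that the component order in showmodal's return list has to match the outputs list passed to biastable_filtered.select(), which is why datasetmd is inserted before gallery in both places in the diff above.
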
configs/crowspairs.yaml CHANGED
@@ -1,10 +1,10 @@
 Abstract: "Pretrained language models, especially masked language models (MLMs) have seen success across many NLP tasks. However, there is ample evidence that they use the cultural biases that are undoubtedly present in the corpora they are trained on, implicitly creating harm with biased representations. To measure some forms of social bias in language models against protected demographic groups in the US, we introduce the Crowdsourced Stereotype Pairs benchmark (CrowS-Pairs). CrowS-Pairs has 1508 examples that cover stereotypes dealing with nine types of bias, like race, religion, and age. In CrowS-Pairs a model is presented with two sentences: one that is more stereotyping and another that is less stereotyping. The data focuses on stereotypes about historically disadvantaged groups and contrasts them with advantaged groups. We find that all three of the widely-used MLMs we evaluate substantially favor sentences that express stereotypes in every category in CrowS-Pairs. As work on building less biased models advances, this dataset can be used as a benchmark to evaluate progress."
-'Applicable Models ': .nan
+Applicable Models: .nan
 Authors: Nikita Nangia, Clara Vania, Rasika Bhalerao, Samuel R. Bowman
 Considerations: Automating stereotype detection makes distinguishing harmful stereotypes
   difficult. It also raises many false positives and can flag relatively neutral associations
   based in fact (e.g. population x has a high proportion of lactose intolerant people).
-Datasets: .nan
+Datasets: https://huggingface.co/datasets/crows_pairs
 Group: BiasEvals
 Hashtags: .nan
 Link: 'CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language
@@ -12,6 +12,7 @@ Link: 'CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked La
 Modality: Text
 Screenshots:
 - Images/CrowsPairs1.png
+- Images/CrowsPairs2.png
 Suggested Evaluation: Crow-S Pairs
 Type: Dataset
 URL: https://arxiv.org/abs/2010.00133
configs/homoglyphbias.yaml CHANGED
@@ -1,5 +1,5 @@
 Abstract: .nan
-'Applicable Models ': .nan
+Applicable Models: .nan
 Authors: .nan
 Considerations: .nan
 Datasets: .nan
configs/honest.yaml CHANGED
@@ -1,5 +1,5 @@
 Abstract: .nan
-'Applicable Models ': .nan
+Applicable Models: .nan
 Authors: .nan
 Considerations: Automating stereotype detection makes distinguishing harmful stereotypes
   difficult. It also raises many false positives and can flag relatively neutral associations
configs/ieat.yaml CHANGED
@@ -1,5 +1,5 @@
 Abstract: .nan
-'Applicable Models ': .nan
+Applicable Models: .nan
 Authors: .nan
 Considerations: Although based in human associations, general societal attitudes do
   not always represent subgroups of people and cultures.
configs/imagedataleak.yaml CHANGED
@@ -1,5 +1,5 @@
 Abstract: .nan
-'Applicable Models ': .nan
+Applicable Models: .nan
 Authors: .nan
 Considerations: .nan
 Datasets: .nan
configs/notmyvoice.yaml CHANGED
@@ -1,5 +1,5 @@
 Abstract: .nan
-'Applicable Models ': .nan
+Applicable Models: .nan
 Authors: .nan
 Considerations: Not necessarily evaluation but a good source of taxonomy. We can use
   this to point readers towards high-level evaluations
configs/stablebias.yaml CHANGED
@@ -1,5 +1,5 @@
 Abstract: .nan
-'Applicable Models ': .nan
+Applicable Models: .nan
 Authors: .nan
 Considerations: .nan
 Datasets: .nan
configs/stereoset.yaml CHANGED
@@ -1,5 +1,5 @@
 Abstract: .nan
-'Applicable Models ': .nan
+Applicable Models: .nan
 Authors: .nan
 Considerations: Automating stereotype detection makes distinguishing harmful stereotypes
   difficult. It also raises many false positives and can flag relatively neutral associations
configs/videodiversemisinfo.yaml CHANGED
@@ -1,5 +1,5 @@
 Abstract: .nan
-'Applicable Models ': .nan
+Applicable Models: .nan
 Authors: .nan
 Considerations: Repr. harm, incite violence
 Datasets: .nan
configs/weat.yaml CHANGED
@@ -19,7 +19,7 @@ Abstract: "Artificial intelligence and machine learning are in a period of astou
   machine learning, but also for the fields of psychology, sociology, and human ethics,\
   \ since they raise the possibility that mere\nexposure to everyday language can\
   \ account for the biases we replicate here."
-'Applicable Models ': .nan
+Applicable Models: .nan
 Authors: Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan
 Considerations: Although based in human associations, general societal attitudes do
   not always represent subgroups of people and cultures.
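
The same edit, dropping the stray trailing space from the 'Applicable Models ' key, is applied in every config above. A sketch of how such a rename could be automated across configs/ (not part of the commit; assumes PyYAML, and re-dumping may reflow long strings):

```python
# Hypothetical one-off script, not part of the commit: normalize the
# "'Applicable Models '" key (trailing space) to "Applicable Models" in all configs.
import glob

import yaml

for path in glob.glob("configs/*.yaml"):
    with open(path) as f:
        data = yaml.safe_load(f)
    if "Applicable Models " in data:
        data["Applicable Models"] = data.pop("Applicable Models ")
        with open(path, "w") as f:
            # sort_keys keeps the alphabetical key order used in these configs;
            # NaN values round-trip back to ".nan".
            yaml.safe_dump(data, f, sort_keys=True, allow_unicode=True)
```
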