Avijit Ghosh committed on
Commit
b0965ee
1 Parent(s): 0d25da0

added yaml fields to all files

configs/crowspairs.yaml CHANGED
@@ -17,4 +17,6 @@ Suggested Evaluation: Crow-S Pairs
  Level: Dataset
  URL: https://arxiv.org/abs/2010.00133
  What it is evaluating: Protected class stereotypes
- Metrics: .nan
+ Metrics: .nan
+ Affiliations: .nan
+ Methodology: .nan
configs/honest.yaml CHANGED
@@ -15,3 +15,5 @@ Level: Output
  URL: https://aclanthology.org/2021.naacl-main.191.pdf
  What it is evaluating: Protected class stereotypes and hurtful language
  Metrics: .nan
+ Affiliations: .nan
+ Methodology: .nan
configs/ieat.yaml CHANGED
@@ -14,4 +14,6 @@ Suggested Evaluation: Image Embedding Association Test (iEAT)
  Level: Model
  URL: https://dl.acm.org/doi/abs/10.1145/3442188.3445932
  What it is evaluating: Embedding associations
- Metrics: .nan
+ Metrics: .nan
+ Affiliations: .nan
+ Methodology: .nan
configs/imagedataleak.yaml CHANGED
@@ -13,4 +13,6 @@ Suggested Evaluation: Dataset leakage and model leakage
  Level: Dataset
  URL: https://arxiv.org/abs/1811.08489
  What it is evaluating: Gender and label bias
- Metrics: .nan
+ Metrics: .nan
+ Affiliations: .nan
+ Methodology: .nan
configs/measuringforgetting.yaml CHANGED
@@ -17,4 +17,6 @@ Suggested Evaluation: Measuring forgetting of training examples
  Level: Model
  URL: https://arxiv.org/pdf/2207.00099.pdf
  What it is evaluating: Measure whether models forget training examples over time, over different types of models (image, audio, text) and how order of training affects privacy attacks
- Metrics: .nan
+ Metrics: .nan
+ Affiliations: .nan
+ Methodology: .nan
configs/notmyvoice.yaml CHANGED
@@ -14,4 +14,6 @@ Suggested Evaluation: Not My Voice! A Taxonomy of Ethical and Safety Harms of Sp
  Level: Taxonomy
  URL: https://arxiv.org/pdf/2402.01708.pdf
  What it is evaluating: Lists harms of audio/speech generators
- Metrics: .nan
+ Metrics: .nan
+ Affiliations: .nan
+ Methodology: .nan
configs/palms.yaml CHANGED
@@ -12,4 +12,6 @@ Suggested Evaluation: Human and Toxicity Evals of Cultural Value Categories
  Level: Output
  URL: http://arxiv.org/abs/2106.10328
  What it is evaluating: Adherence to defined norms for a set of cultural categories
- Metrics: .nan
+ Metrics: .nan
+ Affiliations: .nan
+ Methodology: .nan
configs/safelatentdiff.yaml CHANGED
@@ -15,4 +15,6 @@ Suggested Evaluation: Evaluating text-to-image models for safety
  Level: Output
  URL: https://arxiv.org/pdf/2211.05105.pdf
  What it is evaluating: Generating images for diverse set of prompts (novel I2P benchmark) and investigating how often e.g. violent/nude images will be generated. There is a distinction between implicit and explicit safety, i.e. unsafe results with “normal” prompts.
- Metrics: .nan
+ Metrics: .nan
+ Affiliations: .nan
+ Methodology: .nan
configs/stablebias.yaml CHANGED
@@ -13,3 +13,5 @@ Level: Output
  URL: https://arxiv.org/abs/2303.11408
  What it is evaluating: .nan
  Metrics: .nan
+ Affiliations: .nan
+ Methodology: .nan
configs/tango.yaml CHANGED
@@ -17,4 +17,6 @@ Suggested Evaluation: Human and Toxicity Evals of Cultural Value Categories
  Level: Output
  URL: http://arxiv.org/abs/2106.10328
  What it is evaluating: Bias measurement for trans and nonbinary community via measuring gender non-affirmative language, specifically 1) misgendering 2), negative responses to gender disclosure
- Metrics: .nan
+ Metrics: .nan
+ Affiliations: .nan
+ Methodology: .nan
configs/videodiversemisinfo.yaml CHANGED
@@ -14,4 +14,6 @@ Level: Output
  URL: https://arxiv.org/abs/2210.10026
  What it is evaluating: Human led evaluations of deepfakes to understand susceptibility
  and representational harms (including political violence)
- Metrics: .nan
+ Metrics: .nan
+ Affiliations: .nan
+ Methodology: .nan
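
Every file in this commit gets the same two placeholder keys, so the change lends itself to a script. The following is a minimal sketch of one way to apply it, not the method the author actually used. The names `add_missing_fields` and `update_configs` are hypothetical. It assumes the configs are flat `key: value` files and uses plain text handling rather than a YAML parser, so the existing lines stay byte-for-byte unchanged. It also adds a trailing newline when the last line lacks one, which is one possible reason a diff shows an unchanged-looking `Metrics: .nan` line as removed and re-added.

```python
from pathlib import Path

# Assumption: the two keys added in this commit, with the same `.nan` placeholder.
NEW_FIELDS = {"Affiliations": ".nan", "Methodology": ".nan"}


def add_missing_fields(text: str, fields: dict[str, str] = NEW_FIELDS) -> str:
    """Return `text` with any missing top-level `key: value` lines appended."""
    # Treat unindented, non-comment lines containing a colon as top-level keys.
    present = {
        line.split(":", 1)[0]
        for line in text.splitlines()
        if ":" in line and not line.startswith((" ", "\t", "#"))
    }
    missing = [f"{key}: {value}" for key, value in fields.items() if key not in present]
    if not missing:
        return text
    # Terminate the existing last line before appending new ones.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + "\n".join(missing) + "\n"


def update_configs(config_dir: str = "configs") -> None:
    """Rewrite every *.yaml file under `config_dir` that lacks the new keys."""
    for path in sorted(Path(config_dir).glob("*.yaml")):
        original = path.read_text(encoding="utf-8")
        updated = add_missing_fields(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
```

Calling `update_configs()` from the repository root would touch only files that are missing a key, and running it a second time changes nothing.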