<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="description"
        content="Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI">
  <meta name="keywords" content="Generative AI, Evaluation, Social Impact, NeurIPS, Workshop, AI Ethics">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Evaluating Evaluations: NeurIPS Workshop 2024</title>

  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
  <link href="https://fonts.googleapis.com/css2?family=Libre+Franklin:wght@400;600&display=swap" rel="stylesheet">

  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
        rel="stylesheet">

  <link rel="stylesheet" href="./static/css/bulma.min.css">
  <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
  <link rel="stylesheet"
        href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="./static/css/index.css">
  <link rel="icon" href="./static/images/favicon.svg">

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script defer src="./static/js/fontawesome.all.min.js"></script>
  <script src="./static/js/bulma-carousel.min.js"></script>
  <script src="./static/js/bulma-slider.min.js"></script>
  <script src="./static/js/index.js"></script>
</head>
<body>

<section class="hero">
  <div class="hero-body">
    <div class="container is-max-desktop">
      <div class="columns is-centered">
        <div class="column has-text-centered">
          <h1 class="title is-1 publication-title">Evaluating Evaluations (2024)</h1>
          <h2 class="subtitle is-3 publication-subtitle">Examining Best Practices for Measuring Broader Impacts of Generative AI</h2>
          <div class="is-size-5 publication-authors">
            <span class="author-block">A NeurIPS Workshop</span>
          </div>
        </div>
      </div>
    </div>
  </div>
</section>

<section class="section">
  <div class="container is-max-desktop">
    <div class="columns is-centered has-text-centered">
      <div class="column is-four-fifths">
        <h2 class="title is-3">Workshop Overview</h2>
        <div class="content has-text-justified">
          <p>
            Generative AI systems are becoming increasingly prevalent in society, producing content such as text, images, audio, and video with far-reaching implications. While the NeurIPS Broader Impact statement has notably shifted norms so that AI publications consider potential negative societal impacts, no standard exists for how to approach these impact assessments. This workshop aims to address this critical gap by bringing together experts on evaluation science and practitioners who develop and analyze technical systems.
          </p>
          <p>
            This workshop builds on our previous initiatives, including the FAccT 2023 CRAFT session "Assessing the Impacts of Generative AI Systems Across Modalities and Society" and our initial "Evaluating the Social Impact of Generative AI Systems" report. Through these efforts, we collaboratively developed an evaluation framework and guidance for assessing generative systems across modalities. We have since crowdsourced evaluations and analyzed gaps in the literature, as well as systemic issues in how evaluations are designed and selected.
          </p>
          <p>
            The goal of this workshop is to share our existing findings with the NeurIPS community and collectively develop future directions for effective community-built evaluations. By fostering collaboration between experts and practitioners, we aim to create more comprehensive evaluations and develop urgently needed policy recommendations for governments and AI safety organizations.
          </p>
        </div>
      </div>
    </div>
  </div>
</section>

<section class="section">
  <div class="container is-max-desktop">
    <h2 class="title is-3">Call for Papers (CFP)</h2>
    <div class="content has-text-justified">
      <p>We are soliciting tiny papers (up to 2 pages long) in the following formats:</p>
      <ol>
        <li>Extended Abstracts: Short but complete research papers presenting original or interesting results around social impact evaluation for generative AI.</li>
        <li>"Provocations": Novel perspectives or challenges to conventional wisdom around social impact evaluation for generative AI.</li>
      </ol>
      <h3 class="title is-4">Submission Guidelines</h3>
      <ul>
        <li>Paper Length: Maximum 2 pages, including references</li>
        <li>Format: PDF file, using the NeurIPS conference format</li>
        <li>Submission Portal: [Insert submission portal link here]</li>
        <li>Anonymity: Submissions should be anonymous for blind review</li>
      </ul>
      <h3 class="title is-4">Themes for Submissions</h3>
      <p>We welcome submissions addressing, but not limited to, the following themes:</p>
      <ol>
        <li>Conceptualization and operationalization issues in evaluations of:
          <ul>
            <li>Bias, stereotypes, and representational harms</li>
            <li>Cultural values and sensitive content</li>
            <li>Community-centered definitions of disparate performance and privacy</li>
            <li>Documentation frameworks for financial and environmental costs of evaluations</li>
          </ul>
        </li>
        <li>Ethical or consequential validity considerations for:
          <ul>
            <li>Data protection</li>
            <li>Data and content moderation labor</li>
            <li>Historical implications of evaluation data or practices for evaluation validity</li>
          </ul>
        </li>
        <li>Interrogating or critiquing the theoretical basis of existing evaluations</li>
        <li>Novel methodologies for evaluating social impact across different AI modalities</li>
        <li>Comparative analyses of existing evaluation frameworks and their effectiveness</li>
        <li>Case studies of social impact evaluations in real-world AI applications</li>
      </ol>
      <h3 class="title is-4">Important Dates</h3>
      <ul>
        <li>Submission Deadline: August 1, 2024</li>
        <li>Notification of Acceptance: September 1, 2024</li>
        <li>Workshop Date: [Insert workshop date here]</li>
      </ul>
    </div>
  </div>
</section>

<section class="section">
  <div class="container is-max-desktop">
    <h2 class="title is-3">Workshop Structure</h2>
    <div class="content">
      <p>Total Duration: 8 hours (9:00 AM - 6:00 PM, excluding the lunch break)</p>
      <table class="table is-fullwidth">
        <thead>
          <tr>
            <th>Time</th>
            <th>Session</th>
            <th>Description</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>9:00 AM - 9:30 AM</td>
            <td>Welcome and Introduction</td>
            <td>
              <ul>
                <li>Opening remarks</li>
                <li>Overview of workshop structure and objectives</li>
              </ul>
            </td>
          </tr>
          <tr>
            <td>9:30 AM - 11:00 AM</td>
            <td>Reflections on the Landscape</td>
            <td>
              <ul>
                <li>Collaborative reflection on the existing landscape</li>
                <li>Talks, panels, and breakouts by modality (text, images, audio, video, and multimodal data)</li>
                <li>Topics: Underlying frameworks, Contextualization challenges, Defining robust evaluations, Incentive structures</li>
              </ul>
            </td>
          </tr>
          <tr>
            <td>11:00 AM - 11:15 AM</td>
            <td>Break</td>
            <td></td>
          </tr>
          <tr>
            <td>11:15 AM - 12:45 PM</td>
            <td>Talks + Provocations</td>
            <td>
              <ul>
                <li>Invited speakers present on current technical evaluations for base models across all modalities</li>
                <li>Key social impact categories covered: Bias and stereotyping, Cultural values, Performance disparities, Privacy, Financial and environmental costs, Data moderator labor</li>
                <li>Presentations of accepted provocations</li>
              </ul>
            </td>
          </tr>
          <tr>
            <td>12:45 PM - 1:45 PM</td>
            <td>Lunch Break</td>
            <td></td>
          </tr>
          <tr>
            <td>1:45 PM - 3:45 PM</td>
            <td>Group Activity</td>
            <td>
              <ul>
                <li>Participants break into groups focusing on key social impact categories</li>
                <li>Activities include: choosing evaluations; reviewing tools and datasets; and examining construct reliability, validity, and ranking methodologies</li>
              </ul>
            </td>
          </tr>
          <tr>
            <td>3:45 PM - 4:00 PM</td>
            <td>Break</td>
            <td></td>
          </tr>
          <tr>
            <td>4:00 PM - 5:45 PM</td>
            <td>What's Next? Documentation + Resources</td>
            <td>
              <ul>
                <li>Develop policy guidance highlighting impact categories, subcategories, and modalities requiring further investment</li>
                <li>Discussions on: Documenting Methods, Developing Shareable Resources, Underlying Frameworks, Contextualization Challenges, Defining Robust Evaluations</li>
              </ul>
            </td>
          </tr>
          <tr>
            <td>5:45 PM - 6:00 PM</td>
            <td>Closing Remarks</td>
            <td></td>
          </tr>
        </tbody>
      </table>
    </div>
  </div>
</section>

<section class="section">
  <div class="container is-max-desktop">
    <h2 class="title is-3">Invited Speakers</h2>
    <div class="content">
      <h3 class="title is-4">Confirmed Speakers:</h3>
      <ol>
        <li>
          <strong>Abigail Jacobs</strong>
          <ul>
            <li>Assistant Professor, School of Information</li>
            <li>Assistant Professor of Complex Systems, College of Literature, Science, and the Arts</li>
            <li>University of Michigan</li>
          </ul>
        </li>
        <li>
          <strong>Nitarshan Rajkumar</strong>
          <ul>
            <li>Co-founder of the UK AI Safety Institute</li>
            <li>Adviser to the Secretary of State, UK Department for Science, Innovation and Technology</li>
          </ul>
        </li>
        <li>
          <strong>Su Lin Blodgett</strong>
          <ul>
            <li>Senior Researcher, Microsoft Research Montreal</li>
          </ul>
        </li>
      </ol>
      <h3 class="title is-4">Tentative Speaker:</h3>
      <ol start="4">
        <li>
          <strong>Abeba Birhane</strong>
          <ul>
            <li>Adjunct Lecturer/Assistant Professor, Trinity College Dublin</li>
            <li>Senior Fellow in Trustworthy AI at Mozilla Foundation</li>
          </ul>
        </li>
      </ol>
    </div>
  </div>
</section>

<section class="section">
  <div class="container is-max-desktop">
    <h2 class="title is-3">Expected Outcomes</h2>
    <div class="content has-text-justified">
      <p>Three months after the workshop, we aim to achieve the following outcomes:</p>
      <ol>
        <li>
          <strong>Evaluation Report and Resources/Repository:</strong>
          <ul>
            <li>Publish a comprehensive summary of the workshop findings</li>
            <li>Update resources including:
              <ul>
                <li>Documentation framework for standardizing evaluation practices</li>
                <li>Open source repository addressing identified barriers to broader adoption of social impact evaluation of Generative AI systems</li>
              </ul>
            </li>
          </ul>
        </li>
        <li>
          <strong>Policy Recommendations:</strong>
          <ul>
            <li>Share detailed policy recommendations for investment in future directions for social impact evaluations based on group discussions and workshop outcomes</li>
          </ul>
        </li>
        <li>
          <strong>Knowledge Sharing:</strong>
          <ul>
            <li>Foster a more systematic and effective approach to evaluating the social impact of generative AI systems by disseminating lessons and findings to the broader AI research community</li>
          </ul>
        </li>
      </ol>
    </div>
  </div>
</section>

<section class="section">
  <div class="container is-max-desktop">
    <h2 class="title is-3">Contact Information</h2>
    <div class="content has-text-justified">
      <p>For any queries regarding the workshop or submission process, please contact:</p>
      <p>[Insert contact information for workshop organizers]</p>
    </div>
  </div>
</section>

<footer class="footer">
  <div class="container">
    <div class="content has-text-centered">
      <p>
        Workshop on Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI
      </p>
      <p>
        Website template borrowed from the <a href="https://github.com/nerfies/nerfies.github.io">nerfies</a> project page.
      </p>
    </div>
  </div>
</footer>

</body>
</html>