	<!DOCTYPE html>
	<html lang="en">
	<head>
	<meta charset="utf-8">
	<meta name="description"
	content="Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI">
	<meta name="keywords" content="Generative AI, Evaluation, Social Impact, NeurIPS, Workshop, AI Ethics">
	<meta name="viewport" content="width=device-width, initial-scale=1">
	<title>Evaluating Evaluations: NeurIPS Workshop 2024</title>

	<link rel="preconnect" href="https://fonts.googleapis.com">
	<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
	<link href="https://fonts.googleapis.com/css2?family=Libre+Franklin:wght@400;600&display=swap" rel="stylesheet">

	<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
	rel="stylesheet">

	<link rel="stylesheet" href="./static/css/bulma.min.css">
	<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
	<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
	<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
	<link rel="stylesheet"
	href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
	<link rel="stylesheet" href="./static/css/index.css">
	<link rel="icon" href="./static/images/favicon.svg">

	<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
	<script defer src="./static/js/fontawesome.all.min.js"></script>
	<script src="./static/js/bulma-carousel.min.js"></script>
	<script src="./static/js/bulma-slider.min.js"></script>
	<script src="./static/js/index.js"></script>
	</head>
	<body>

	<section class="hero">
	<div class="hero-body">
	<div class="container is-max-desktop">
	<div class="columns is-centered">
	<div class="column has-text-centered">
	<h1 class="title is-1 publication-title">Evaluating Evaluations (2024)</h1>
	<h2 class="subtitle is-3 publication-subtitle">Examining Best Practices for Measuring Broader Impacts of Generative AI</h2>
	<div class="is-size-5 publication-authors">
	<span class="author-block">A NeurIPS Workshop</span>
	</div>
	</div>
	</div>
	</div>
	</div>
	</section>

	<section class="section">
	<div class="container is-max-desktop">
	<div class="columns is-centered has-text-centered">
	<div class="column is-four-fifths">
	<h2 class="title is-3">Workshop Overview</h2>
	<div class="content has-text-justified">
	<p>
	Generative AI systems are becoming increasingly prevalent in society, producing content such as text, images, audio, and video with far-reaching implications. While the NeurIPS Broader Impact statement has notably shifted norms for AI publications to consider negative societal impact, no standard exists for how to approach these impact assessments. This workshop aims to address this critical gap by bringing together experts on evaluation science and practitioners who develop and analyze technical systems.
	</p>
	<p>
	Building upon our previous initiatives, including the FAccT 2023 CRAFT session "Assessing the Impacts of Generative AI Systems Across Modalities and Society" and our initial "Evaluating the Social Impact of Generative AI Systems" report, we have collaboratively developed an evaluation framework and guidance for assessing generative systems across modalities. We have since crowdsourced evaluations and analyzed gaps in the literature, as well as systemic issues in how evaluations are designed and selected.
	</p>
	<p>
	The goal of this workshop is to share our existing findings with the NeurIPS community and collectively develop future directions for effective community-built evaluations. By fostering collaboration between experts and practitioners, we aim to create more comprehensive evaluations and develop urgently needed policy recommendations for governments and AI safety organizations.
	</p>
	</div>
	</div>
	</div>
	</div>
	</section>

	<section class="section">
	<div class="container is-max-desktop">
	<h2 class="title is-3">Call for Papers (CFP)</h2>
	<div class="content has-text-justified">
	<p>We are soliciting tiny papers (up to 2 pages long) in the following formats:</p>
	<ol>
	<li>Extended Abstracts: Short but complete research papers presenting original or interesting results around social impact evaluation for generative AI.</li>
	<li>"Provocations": Novel perspectives or challenges to conventional wisdom around social impact evaluation for generative AI.</li>
	</ol>
	<h3 class="title is-4">Submission Guidelines</h3>
	<ul>
	<li>Paper Length: Maximum 2 pages, including references</li>
	<li>Format: PDF file, using the NeurIPS conference format</li>
	<li>Submission Portal: [Insert submission portal link here]</li>
	<li>Anonymity: Submissions should be anonymous for blind review</li>
	</ul>
	<h3 class="title is-4">Themes for Submissions</h3>
	<p>We welcome submissions addressing, but not limited to, the following themes:</p>
	<ol>
	<li>Conceptualization and operationalization issues in evaluations of:
	<ul>
	<li>Bias, stereotypes, and representational harms</li>
	<li>Cultural values and sensitive content</li>
	<li>Community-centered definitions of disparate performance and privacy</li>
	<li>Documentation frameworks for financial and environmental costs of evaluations</li>
	</ul>
	</li>
	<li>Ethical or consequential validity considerations for:
	<ul>
	<li>Data protection</li>
	<li>Data and content moderation labor</li>
	<li>Historical implications of evaluation data or practices for evaluation validity</li>
	</ul>
	</li>
	<li>Interrogating or critiquing the theoretical basis of existing evaluations</li>
	<li>Novel methodologies for evaluating social impact across different AI modalities</li>
	<li>Comparative analyses of existing evaluation frameworks and their effectiveness</li>
	<li>Case studies of social impact evaluations in real-world AI applications</li>
	</ol>
	<h3 class="title is-4">Important Dates</h3>
	<ul>
	<li>Submission Deadline: August 1, 2024</li>
	<li>Notification of Acceptance: September 1, 2024</li>
	<li>Workshop Date: [Insert workshop date here]</li>
	</ul>
	</div>
	</div>
	</section>

	<section class="section">
	<div class="container is-max-desktop">
	<h2 class="title is-3">Workshop Structure</h2>
	<div class="content">
	<p>Total Duration: 8 hours of sessions and short breaks (9:00 AM - 6:00 PM, with a one-hour lunch break)</p>
	<table class="table is-fullwidth">
	<thead>
	<tr>
	<th>Time</th>
	<th>Session</th>
	<th>Description</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>9:00 AM - 9:30 AM</td>
	<td>Welcome and Introduction</td>
	<td>
	<ul>
	<li>Opening remarks</li>
	<li>Overview of workshop structure and objectives</li>
	</ul>
	</td>
	</tr>
	<tr>
	<td>9:30 AM - 11:00 AM</td>
	<td>Reflections on the Landscape</td>
	<td>
	<ul>
	<li>Collaborative reflection on the existing landscape</li>
	<li>Talks, panels, and breakouts by modality (text, images, audio, video, and multimodal data)</li>
	<li>Topics: Underlying frameworks, Contextualization challenges, Defining robust evaluations, Incentive structures</li>
	</ul>
	</td>
	</tr>
	<tr>
	<td>11:00 AM - 11:15 AM</td>
	<td>Break</td>
	<td></td>
	</tr>
	<tr>
	<td>11:15 AM - 12:45 PM</td>
	<td>Talks + Provocations</td>
	<td>
	<ul>
	<li>Invited speakers present on current technical evaluations for base models across all modalities</li>
	<li>Key social impact categories covered: Bias and stereotyping, Cultural values, Performance disparities, Privacy, Financial and environmental costs, Data and content moderation labor</li>
	<li>Presentations of accepted provocations</li>
	</ul>
	</td>
	</tr>
	<tr>
	<td>12:45 PM - 1:45 PM</td>
	<td>Lunch Break</td>
	<td></td>
	</tr>
	<tr>
	<td>1:45 PM - 3:45 PM</td>
	<td>Group Activity</td>
	<td>
	<ul>
	<li>Participants break into groups focusing on key social impact categories</li>
	<li>Activities include choosing evaluations, reviewing tools and datasets, and examining construct reliability, validity, and ranking methodologies</li>
	</ul>
	</td>
	</tr>
	<tr>
	<td>3:45 PM - 4:00 PM</td>
	<td>Break</td>
	<td></td>
	</tr>
	<tr>
	<td>4:00 PM - 5:45 PM</td>
	<td>What's Next? Documentation + Resources</td>
	<td>
	<ul>
	<li>Develop policy guidance highlighting impact categories, subcategories, and modalities requiring further investment</li>
	<li>Discussions on: Documenting methods, Developing shareable resources, Underlying frameworks, Contextualization challenges, Defining robust evaluations</li>
	</ul>
	</td>
	</tr>
	<tr>
	<td>5:45 PM - 6:00 PM</td>
	<td>Closing Remarks</td>
	<td></td>
	</tr>
	</tbody>
	</table>
	</div>
	</div>
	</section>

	<section class="section">
	<div class="container is-max-desktop">
	<h2 class="title is-3">Invited Speakers</h2>
	<div class="content">
	<h3 class="title is-4">Confirmed Speakers:</h3>
	<ol>
	<li>
	<strong>Abigail Jacobs</strong>
	<ul>
	<li>Assistant Professor, School of Information</li>
	<li>Assistant Professor of Complex Systems, College of Literature, Science, and the Arts</li>
	<li>University of Michigan</li>
	</ul>
	</li>
	<li>
	<strong>Nitarshan Rajkumar</strong>
	<ul>
	<li>Co-founder of the UK AI Safety Institute</li>
	<li>Adviser to the Secretary of State at the UK Department for Science, Innovation and Technology</li>
	</ul>
	</li>
	<li>
	<strong>Su Lin Blodgett</strong>
	<ul>
	<li>Senior Researcher, Microsoft Research Montreal</li>
	</ul>
	</li>
	</ol>
	<h3 class="title is-4">Tentative Speaker:</h3>
	<ol start="4">
	<li>
	<strong>Abeba Birhane</strong>
	<ul>
	<li>Adjunct Lecturer/Assistant Professor, Trinity College Dublin</li>
	<li>Senior Fellow in Trustworthy AI at Mozilla Foundation</li>
	</ul>
	</li>
	</ol>
	</div>
	</div>
	</section>

	<section class="section">
	<div class="container is-max-desktop">
	<h2 class="title is-3">Expected Outcomes</h2>
	<div class="content has-text-justified">
	<p>Three months after the workshop, we aim to achieve the following outcomes:</p>
	<ol>
	<li>
	<strong>Evaluation Report and Resources/Repository:</strong>
	<ul>
	<li>Publish a comprehensive summary of the workshop findings</li>
	<li>Update resources including:
	<ul>
	<li>Documentation framework for standardizing evaluation practices</li>
	<li>Open-source repository addressing identified barriers to broader adoption of social impact evaluation of generative AI systems</li>
	</ul>
	</li>
	</ul>
	</li>
	<li>
	<strong>Policy Recommendations:</strong>
	<ul>
	<li>Share detailed policy recommendations for investment in future directions for social impact evaluations based on group discussions and workshop outcomes</li>
	</ul>
	</li>
	<li>
	<strong>Knowledge Sharing:</strong>
	<ul>
	<li>Foster a more systematic and effective approach to evaluating the social impact of generative AI systems by disseminating lessons and findings to the broader AI research community</li>
	</ul>
	</li>
	</ol>
	</div>
	</div>
	</section>

	<section class="section">
	<div class="container is-max-desktop">
	<h2 class="title is-3">Contact Information</h2>
	<div class="content has-text-justified">
	<p>For any queries regarding the workshop or submission process, please contact:</p>
	<p>[Insert contact information for workshop organizers]</p>
	</div>
	</div>
	</section>

	<footer class="footer">
	<div class="container">
	<div class="content has-text-centered">
	<p>
	Workshop on Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI
	</p>
	<p>
	Website template borrowed from the <a href="https://github.com/nerfies/nerfies.github.io">nerfies</a> project page.
	</p>
	</div>
	</div>
	</footer>

	</body>
	</html>