<!--
@license
Copyright 2020 Google. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="apple-touch-icon" sizes="180x180" href="https://pair.withgoogle.com/images/favicon/apple-touch-icon.png">
<link rel="icon" type="image/png" sizes="32x32" href="https://pair.withgoogle.com/images/favicon/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="https://pair.withgoogle.com/images/favicon/favicon-16x16.png">
<link rel="mask-icon" href="https://pair.withgoogle.com/images/favicon/safari-pinned-tab.svg" color="#00695c">
<link rel="shortcut icon" href="https://pair.withgoogle.com/images/favicon.ico">
<script>
!(function(){
var url = window.location.href
if (url.split('#')[0].split('?')[0].slice(-1) != '/' && !url.includes('.html')) window.location = url + '/'
})()
</script>
<title>Hidden Bias</title>
<meta property="og:title" content="Hidden Bias">
<meta property="og:url" content="https://pair.withgoogle.com/explorables/hidden-bias/">
<meta property="og:description" content="Models trained on real-world data can encode real-world bias. Hiding information about protected classes doesn't always fix things; sometimes it can even hurt.">
<meta property="og:image" content="https://pair.withgoogle.com/explorables/images/hidden-bias.png">
<meta name="twitter:card" content="summary_large_image">
<link rel="stylesheet" type="text/css" href="../style.css">
<link href='https://fonts.googleapis.com/css?family=Roboto+Slab:400,500,700|Roboto:700,500,300' rel='stylesheet' type='text/css'>
<link href="https://fonts.googleapis.com/css?family=Google+Sans:400,500,700" rel="stylesheet">
</head>
<body>
<div class='header'>
<div class='header-left'>
<a href='https://pair.withgoogle.com/'>
<img src='../images/pair-logo.svg' style='width: 100px' alt='PAIR'>
</a>
<a href='../'>Explorables</a>
</div>
</div>
<h1 class='headline'>Hidden Bias</h1>
<div class="post-summary">Models trained on real-world data can encode real-world bias. Hiding information about protected classes doesn't always fix things — sometimes it can even hurt.</div>
<link rel="stylesheet" href="style.css">
<div id='container' class='container-1'>
<div id='graph'></div>
<div id='sections'>
<div>
<h3>Modeling College GPA</h3>
<p>Let's pretend we're college admissions officers trying to predict the GPA students will have in college (in these examples we'll use simulated data).
<p>One simple approach: predict that students will have the same GPA in college as they did in high school.
</div>
<div class='img-slide'>
<p>This is at best a very rough approximation, and it misses a key feature of this data set: students usually have better grades in high school than in college.
<p>We're <img src='over.png'><span class='xhighlight blue'>over-predicting</span> college grades more often than we <img src='over.png'><span class='xhighlight orange'>under-predict.</span>
</div>
<div>
<h3>Predicting with ML</h3>
<p>If we switched to using a machine learning model and entered these student grades, it would recognize this pattern and adjust the prediction.
<p>The model does this without knowing anything about the real-life context of grading in high school versus college.
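<p>As a rough illustration of that adjustment, here is a minimal sketch that fits a least-squares line to a handful of made-up grades. The data, the <code>fitLine</code> helper, and the 0.4-point gap are assumptions chosen for illustration; they are not the simulation behind the chart.

```javascript
// Hypothetical simulated students: college GPA runs about 0.4 below high school GPA.
const students = [
  {highSchool: 2.8, college: 2.5},
  {highSchool: 3.0, college: 2.5},
  {highSchool: 3.2, college: 2.85},
  {highSchool: 3.4, college: 2.95},
  {highSchool: 3.6, college: 3.3},
  {highSchool: 3.8, college: 3.3},
  {highSchool: 4.0, college: 3.6},
];

const mean = values => values.reduce((sum, v) => sum + v, 0) / values.length;

// Ordinary least squares for one feature: college ≈ intercept + slope * highSchool.
function fitLine(xs, ys) {
  const meanX = mean(xs);
  const meanY = mean(ys);
  let covariance = 0;
  let variance = 0;
  xs.forEach((x, i) => {
    covariance += (x - meanX) * (ys[i] - meanY);
    variance += (x - meanX) ** 2;
  });
  const slope = covariance / variance;
  return {slope, intercept: meanY - slope * meanX};
}

// Average signed error of a prediction rule: positive means over-predicting.
const meanError = predict =>
  mean(students.map(d => predict(d.highSchool) - d.college));

const line = fitLine(students.map(d => d.highSchool), students.map(d => d.college));

// Rule 1: assume college GPA equals high school GPA.
const naiveBias = meanError(hs => hs);
// Rule 2: use the fitted line.
const fittedBias = meanError(hs => line.intercept + line.slope * hs);

console.log({naiveBias, fittedBias});
```

<p>On this toy data the "same GPA" rule over-predicts by about 0.4 points on average, while the fitted line's average error is essentially zero. The line picked up the offset from the numbers alone.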
</div>
<div>
<p>Giving the model <span class='highlight blue'>more information</span> about students increases accuracy more...
</div>
<div>
<p>...and more.
</div>
<div>
<h3>Models can encode previous bias</h3>
<p>All of this sensitive information about students is just a long list of numbers to the model.
<p>If a sexist college culture has historically led to lower grades for <span class='f circle'> </span> female students, the model will pick up on that correlation and predict lower grades for women.
<p>Training on historical data bakes in historical biases. Here the sexist culture has improved, but the model learned from the past correlation and still predicts higher grades for <span class='m circle'> </span> men.
</div>
<div>
<h3>Hiding protected classes from the model might not stop discrimination</h3>
<p>Even if we don't tell the model students' genders, it might still score <span class='f circle'> </span> female students poorly.
<p>With detailed enough information about every student, the model can still synthesize a proxy for gender out of other <span class='highlight yellow'>variables.</span>
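<p>A minimal sketch of how a proxy can leak a hidden attribute, using made-up data. The <code>inClub</code> flag is a hypothetical stand-in for any variable that happens to correlate with gender, and the "model" here is just a per-group average rather than a real learning algorithm.

```javascript
// Hypothetical historical data. Gender is recorded here only so we can audit
// the predictions afterwards; the model itself never sees it.
const history = [
  {gender: 'f', inClub: true,  college: 2.8},
  {gender: 'f', inClub: true,  college: 2.8},
  {gender: 'f', inClub: true,  college: 2.8},
  {gender: 'f', inClub: false, college: 2.8},
  {gender: 'm', inClub: true,  college: 3.2},
  {gender: 'm', inClub: false, college: 3.2},
  {gender: 'm', inClub: false, college: 3.2},
  {gender: 'm', inClub: false, college: 3.2},
];

const mean = values => values.reduce((sum, v) => sum + v, 0) / values.length;

// "Training": the model only sees inClub and learns the average
// historical GPA for each value of that flag.
const gpaByClub = new Map(
  [true, false].map(flag => [
    flag,
    mean(history.filter(d => d.inClub === flag).map(d => d.college)),
  ])
);
const predict = student => gpaByClub.get(student.inClub);

// Audit: average prediction for each gender, which the model never saw.
const avgPrediction = gender =>
  mean(history.filter(d => d.gender === gender).map(d => predict(d)));

const predictedForWomen = avgPrediction('f');
const predictedForMen = avgPrediction('m');

console.log({predictedForWomen, predictedForMen});
```

<p>Even with gender withheld, the average prediction for women in this toy data comes out lower than for men, because the flag carries part of the same signal.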
</div>
<div>
<h3>Including a protected attribute may even <i>decrease</i> discrimination</h3>
<p>Let's look at a simplified model, one only taking into account the recommendation of an alumni interviewer.
</div>
<div>
<p>The interviewer is quite accurate, except that they're biased against students with a <span class='l circle'> </span> low household income.
<p>In our toy model, students' grades don't depend on their income once they're in college. In other words, we have biased inputs and unbiased outcomes—the opposite of the previous example, where the inputs weren't biased, but the toxic culture biased the outcomes.
</div>
<div>
<p>If we also tell the model each student's <span class='highlight blue'>household income</span>, it will naturally correct for the interviewer's overrating of <span class='h circle'> </span> high-income students, just like it corrected for the difference between high school and college GPAs.
<p>By carefully considering and accounting for bias, we've made the model fairer and more accurate. This isn't always easy to do, especially in circumstances like the historically toxic college culture where unbiased data is limited.
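<p>Here is a small sketch of that correction under stated assumptions: the data is made up, the interviewer's bias is a flat 0.3 points, and the "model" is simplified to the rating plus a learned offset rather than a full regression.

```javascript
// Hypothetical toy data: the interviewer's rating matches the true college GPA
// for high-income students but runs 0.3 too low for low-income students.
const applicants = [
  {income: 'low',  rating: 2.3, college: 2.6},
  {income: 'low',  rating: 2.7, college: 3.0},
  {income: 'low',  rating: 3.1, college: 3.4},
  {income: 'high', rating: 2.6, college: 2.6},
  {income: 'high', rating: 3.0, college: 3.0},
  {income: 'high', rating: 3.4, college: 3.4},
];

const mean = values => values.reduce((sum, v) => sum + v, 0) / values.length;

// Simplified "model": prediction = rating + a learned offset (slope fixed at 1).
const learnOffset = rows => mean(rows.map(d => d.college - d.rating));

// Model A sees only the rating, so it learns one offset for everyone.
const sharedOffset = learnOffset(applicants);
const predictBlind = d => d.rating + sharedOffset;

// Model B also sees income, so it can learn one offset per income group.
const offsetByIncome = {
  low: learnOffset(applicants.filter(d => d.income === 'low')),
  high: learnOffset(applicants.filter(d => d.income === 'high')),
};
const predictWithIncome = d => d.rating + offsetByIncome[d.income];

// Average signed error for one income group: negative means under-predicting.
const groupError = (predict, income) =>
  mean(applicants.filter(d => d.income === income).map(d => predict(d) - d.college));

const blindErrors = {
  low: groupError(predictBlind, 'low'),
  high: groupError(predictBlind, 'high'),
};
const awareErrors = {
  low: groupError(predictWithIncome, 'low'),
  high: groupError(predictWithIncome, 'high'),
};

console.log({blindErrors, awareErrors});
```

<p>In this sketch the income-blind model under-predicts low-income students and over-predicts high-income students by the same amount, while the income-aware model's average error is essentially zero for both groups.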
<p>And there are fundamental fairness trade-offs that have to be made. Check out the <a href='../measuring-fairness/'>Measuring Fairness explorable</a> to see how those trade-offs work.<a href='../measuring-fairness/'><br><img style='width: 100%; max-width: 391px; margin-left: -8px' src='../images/medical-fairness.gif'></a>
<br><br>
<p>Adam Pearce // May 2020
<p>Thanks to Carey Radebaugh, Dan Nanas, David Weinberger, Emily Denton, Emily Reif, Fernanda Viégas, Hal Abelson, James Wexler, Kristen Olson, Lucas Dixon, Mahima Pushkarna, Martin Wattenberg, Michael Terry, Rebecca Salois, Timnit Gebru, Tulsee Doshi, Yannick Assogba, Yoni Halpern, Zan Armstrong, and my other colleagues at Google for their help with this piece.
</div>
</div>
</div>
<div id='end'></div>
<link rel="stylesheet" href="../measuring-fairness/graph-scroll.css">
<script src="../third_party/seedrandom.min.js"></script>
<script src='../third_party/d3_.js'></script>
<script src='../third_party/swoopy-drag.js'></script>
<script src='../third_party/misc.js'></script>
<script src='annotations.js'></script>
<script src='script.js'></script>
</body>
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-138505774-1"></script>
<script>
if (window.location.origin === 'https://pair.withgoogle.com'){
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-138505774-1');
}
</script>
<script>
// Tweaks for displaying in an iframe
if (window !== window.parent){
// Open links in a new tab
Array.from(document.querySelectorAll('a'))
.forEach(e => {
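// skip anchor links (e.href is resolved to an absolute URL, so check the raw attribute)
var rawHref = e.getAttribute('href')
if (rawHref && rawHref[0] == '#') return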
e.setAttribute('target', '_blank')
e.setAttribute('rel', 'noopener noreferrer')
})
// Remove recirc h3
Array.from(document.querySelectorAll('h3'))
.forEach(e => {
if (e.textContent != 'More Explorables') return
e.parentNode.removeChild(e)
})
// Remove recirc container, if this page has one
var recircEl = document.querySelector('#recirc')
if (recircEl) recircEl.parentNode.removeChild(recircEl)
}
</script>
</html>