<!--
@license
Copyright 2020 Google. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<!DOCTYPE html>

<html>
<head>
	<meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">

  <link rel="apple-touch-icon" sizes="180x180" href="https://pair.withgoogle.com/images/favicon/apple-touch-icon.png">
  <link rel="icon" type="image/png" sizes="32x32" href="https://pair.withgoogle.com/images/favicon/favicon-32x32.png">
  <link rel="icon" type="image/png" sizes="16x16" href="https://pair.withgoogle.com/images/favicon/favicon-16x16.png">
  <link rel="mask-icon" href="https://pair.withgoogle.com/images/favicon/safari-pinned-tab.svg" color="#00695c">
  <link rel="shortcut icon" href="https://pair.withgoogle.com/images/favicon.ico">

  <script>
    // Redirect to the trailing-slash version of the URL so relative links and assets resolve correctly
    !(function(){
      var url = window.location.href
      if (url.split('#')[0].split('?')[0].slice(-1) != '/' && !url.includes('.html')) window.location = url + '/'
    })()
  </script>

  <title>Hidden Bias</title>
  <meta property="og:title" content="Hidden Bias">
  <meta property="og:url" content="https://pair.withgoogle.com/explorables/hidden-bias/">

  <meta name="og:description" content="Models trained on real-world data can encode real-world bias. Hiding information about protected classes doesn't always fix things — sometimes it can even hurt.">
  <meta property="og:image" content="https://pair.withgoogle.com/explorables/images/hidden-bias.png">
  <meta name="twitter:card" content="summary_large_image">
  
	<link rel="stylesheet" type="text/css" href="../style.css">

  <link href='https://fonts.googleapis.com/css?family=Roboto+Slab:400,500,700|Roboto:700,500,300' rel='stylesheet' type='text/css'>  
  <link href="https://fonts.googleapis.com/css?family=Google+Sans:400,500,700" rel="stylesheet">

	<meta name="viewport" content="width=device-width">
</head>
<body>
  <div class='header'>
    <div class='header-left'>
      <a href='https://pair.withgoogle.com/'>
        <img src='../images/pair-logo.svg' style='width: 100px'>
      </a>
      <a href='../'>Explorables</a> 
    </div>
  </div>
  
  <h1 class='headline'>Hidden Bias</h1>
  <div class="post-summary">Models trained on real-world data can encode real-world bias. Hiding information about protected classes doesn't always fix things — sometimes it can even hurt.</div>
  <link rel="stylesheet" href="style.css">

<div id='container' class='container-1'>
<div id='graph'></div>
<div id='sections'>


<div>
<h3>Modeling College GPA</h3>

<p>Let's pretend we're college admissions officers trying to predict the GPA students will have in college (in these examples we'll use simulated data).

<p>One simple approach: predict that students will have the same GPA in college as they did in high school. 
</div>


<div class='img-slide'>
<p>This is at best a very rough approximation, and it misses a key feature of this data set: students usually have better grades in high school than in college.

<p>We're <img src='over.png'><span class='xhighlight blue'>over-predicting</span> college grades more often than we <img src='over.png'><span class='xhighlight orange'>under-predict.</span>
</div>


<div>
<h3>Predicting with ML</h3>
<p>If we switched to using a machine learning model and entered these student grades, it would recognize this pattern and adjust the prediction.

<p>The model does this without knowing anything about the real-life context of grading in high school versus college.
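
<p>To make the mechanics concrete, here's a minimal sketch of that idea in plain JavaScript. The numbers and the one-variable least-squares fit are made up for illustration; this is not the simulation that drives the chart.

<pre>
// Illustrative only: the data generation and the 0.4-point GPA gap below
// are assumptions for this sketch, not the values used in the visualization.
var mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length

var students = Array.from({length: 200}, () => {
  var hs = 2.5 + Math.random() * 1.5                     // high school GPA
  var college = hs - 0.4 + (Math.random() - 0.5) * 0.6   // college GPA runs lower, plus noise
  return {hs, college}
})

// Ordinary least squares: predictedCollege = slope*hs + intercept
var mHs = mean(students.map(d => d.hs))
var mCo = mean(students.map(d => d.college))
var slope = mean(students.map(d => (d.hs - mHs) * (d.college - mCo))) /
            mean(students.map(d => (d.hs - mHs) * (d.hs - mHs)))
var intercept = mCo - slope * mHs

// The fitted intercept is negative: the model predicts lower college grades
// than high school grades without being told anything about grading culture.
console.log('slope', slope.toFixed(2), 'intercept', intercept.toFixed(2))
</pre>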
</div>


<div>
<p>Giving the model <span class='highlight blue'>more information</span> about students increases accuracy more...
</div>


<div>
<p>...and more.
</div>


<div>
<h3>Models can encode previous bias</h3>
<p>All of this sensitive information about students is just a long list of numbers to the model.

<p>If a sexist college culture has historically led to lower grades for <span class='f circle'>&nbsp;</span> female students, the model will pick up on that correlation and predict lower grades for women.  

<p>Training on historical data bakes in historical biases. Here the sexist culture has improved, but the model learned from the past correlation and still predicts higher grades for <span class='m circle'>&nbsp;</span> men.
</div>

<div>
<h3>Hiding protected classes from the model might not stop discrimination</h3>

<p>Even if we don't tell the model students' genders, it might still score <span class='f circle'>&nbsp;</span> female students poorly.

<p>With detailed enough information about every student, the model can still synthesize a proxy for gender out of other <span class='highlight yellow'>variables.</span>
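
<p>Here's a minimal sketch of that proxy effect using made-up data: a hypothetical "clubs" score that happens to be distributed differently for the two groups. Again, this isn't the simulation behind the chart.

<pre>
// Illustrative only. Gender is never given to the fit, but a correlated
// feature lets it reproduce the historical gap anyway.
var mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length

var students = Array.from({length: 1000}, () => {
  var isFemale = Math.random() > 0.5
  // hypothetical proxy feature, distributed differently for the two groups
  var clubs = (isFemale ? 0.7 : 0.3) + (Math.random() - 0.5) * 0.4
  // historically biased label: female students were graded lower
  var gpa = 3.2 - (isFemale ? 0.3 : 0) + (Math.random() - 0.5) * 0.4
  return {isFemale, clubs, gpa}
})

// One-variable least squares of GPA on the proxy feature only.
var mx = mean(students.map(d => d.clubs))
var my = mean(students.map(d => d.gpa))
var slope = mean(students.map(d => (d.clubs - mx) * (d.gpa - my))) /
            mean(students.map(d => (d.clubs - mx) * (d.clubs - mx)))
var predict = x => slope * (x - mx) + my

// Average prediction is still lower for female students: the model has
// recovered the protected attribute, and the bias, through the proxy.
var f = students.filter(d => d.isFemale)
var m = students.filter(d => !d.isFemale)
console.log(mean(f.map(d => predict(d.clubs))).toFixed(2),
            mean(m.map(d => predict(d.clubs))).toFixed(2))
</pre>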
</div>


<div>
<h3>Including a protected attribute may even <i>decrease</i> discrimination</h3>

<p>Let's look at a simplified model, one only taking into account the recommendation of an alumni interviewer. 
</div>


<div>
<p>The interviewer is quite accurate, except that they're biased against students with a <span class='l circle'>&nbsp;</span> low household income. 

<p>In our toy model, students' grades don't depend on their income once they're in college. In other words, we have biased inputs and unbiased outcomes—the opposite of the previous example, where the inputs weren't biased, but the toxic culture biased the outcomes. 
</div>


<div>
<p>If we also tell the model each student's <span class='highlight blue'>household income</span>, it will naturally correct for the interviewer's overrating of <span class='h circle'>&nbsp;</span> high-income students just like it corrected for the difference between high school and college GPAs. 
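
<p>Here's a minimal sketch of that correction, once more with made-up numbers rather than the chart's simulation. With a binary low/high income flag, the correction a regression learns reduces to a separate offset for each group:

<pre>
// Illustrative only: the 0.5-point interviewer penalty and the GPA range
// below are assumptions for this sketch.
var mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length

var students = Array.from({length: 1000}, () => {
  var lowIncome = Math.random() > 0.5
  var gpa = 2.8 + Math.random()                          // outcome doesn't depend on income
  // the interviewer underrates low-income students by half a point
  var interview = gpa - (lowIncome ? 0.5 : 0) + (Math.random() - 0.5) * 0.3
  return {lowIncome, interview, gpa}
})

// Once the model sees the income flag it can learn a separate offset for
// each group, which is what a dummy variable in a regression provides.
var offset = group => mean(group.map(d => d.gpa - d.interview))
var lowOffset  = offset(students.filter(d => d.lowIncome))
var highOffset = offset(students.filter(d => !d.lowIncome))
var predict = d => d.interview + (d.lowIncome ? lowOffset : highOffset)

// lowOffset comes out near +0.5 and highOffset near 0, so predict()
// undoes the interviewer's bias against low-income students.
console.log('offsets', lowOffset.toFixed(2), highOffset.toFixed(2))
console.log('corrected score for one student', predict(students[0]).toFixed(2))
</pre>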

<p>By carefully considering and accounting for bias, we've made the model fairer and more accurate. This isn't always easy to do, especially in circumstances like the historically toxic college culture where unbiased data is limited. 

<p>And there are fundamental fairness trade-offs that have to be made. Check out the <a href='../measuring-fairness/'>Measuring Fairness explorable</a> to see how those trade-offs work.<a href='../measuring-fairness/'><br><img style='width: 100%; max-width: 391px; margin-left: -8px' src='../images/medical-fairness.gif'></a>


<br><br>

<p>Adam Pearce // May 2020

<p>Thanks to Carey Radebaugh, Dan Nanas, David Weinberger, Emily Denton, Emily Reif, Fernanda Viégas, Hal Abelson, James Wexler, Kristen Olson, Lucas Dixon, Mahima Pushkarna, Martin Wattenberg, Michael Terry, Rebecca Salois, Timnit Gebru, Tulsee Doshi, Yannick Assogba, Yoni Halpern, Zan Armstrong, and my other colleagues at Google for their help with this piece.
</div>

</div>
</div>
<div id='end'></div>


<link rel="stylesheet" href="../measuring-fairness/graph-scroll.css">

<script src="../third_party/seedrandom.min.js"></script>
<script src='../third_party/d3_.js'></script>
<script src='../third_party/swoopy-drag.js'></script>
<script src='../third_party/misc.js'></script>
<script src='annotations.js'></script>
<script src='script.js'></script>
</body>

<script async src="https://www.googletagmanager.com/gtag/js?id=UA-138505774-1"></script>
<script>
  if (window.location.origin === 'https://pair.withgoogle.com'){
    window.dataLayer = window.dataLayer || [];
    function gtag(){dataLayer.push(arguments);}
    gtag('js', new Date());
    gtag('config', 'UA-138505774-1');
  }
</script>

<script>
  // Tweaks for displaying in an iframe
  if (window !== window.parent){
    
    // Open links in a new tab
    Array.from(document.querySelectorAll('a'))
      .forEach(e => {
        // skip in-page anchor links (check the raw attribute; e.href is the resolved absolute URL)
        var href = e.getAttribute('href')
        if (!href || href[0] == '#') return

        e.setAttribute('target', '_blank')
        e.setAttribute('rel', 'noopener noreferrer')
      })

    // Remove recirc h3
    Array.from(document.querySelectorAll('h3'))
      .forEach(e => {
        if (e.textContent != 'More Explorables') return

        e.parentNode.removeChild(e)
      })

    // Remove recirc container
    var recircEl = document.querySelector('#recirc')
    if (recircEl) recircEl.parentNode.removeChild(recircEl)
  }
</script>

</html>