Dalia Gala committed on
Commit
ddd1328
1 Parent(s): 98eec2c

removing cached git files

.Rhistory DELETED
File without changes
images/.ipynb_checkpoints/pie_charts-checkpoint.png DELETED
Binary file (108 kB)
 
images/.ipynb_checkpoints/tests-checkpoint.png DELETED
Binary file (140 kB)
 
pages/.ipynb_checkpoints/1_📊_Define_Target_Variables-checkpoint.py DELETED
@@ -1,266 +0,0 @@
- #!/usr/bin/env python3
- # -*- coding: utf-8 -*-
- """
- Created on Thu May 25 15:16:50 2023
-
- @author: daliagala
- """
-
- ### LIBRARIES ###
- import streamlit as st
- import pandas as pd
- from sklearn.model_selection import train_test_split
- from utils import assign_labels_by_probabilities, drop_data, train_and_predict
-
- ### PAGE CONFIG ###
- st.set_page_config(page_title='EquiVar', page_icon=':robot_face:', layout='wide')
-
- hide_st_style = """
- <style>
- #GithubIcon {visibility: hidden;}
- #MainMenu {visibility: hidden;}
- footer {visibility: hidden;}
- header {visibility: hidden;}
- </style>
- """
- st.markdown(hide_st_style, unsafe_allow_html=True)
-
- ### IMPORT DATA FILES ###
- dataframe = pd.read_csv('./data/dataframe.csv')
- dataframe = dataframe.drop(["Unnamed: 0"], axis = 1)
- dataframe = dataframe.rename(columns={"education_level": "education level"})
-
- ### DICTIONARIES AND CONSTANTS ###
- groups = ["attention", "reasoning", "memory", "behavioural restraint", "information processing speed"]
-
- groups_dict = {
-     'Divided Visual Attention' : 'attention',
-     'Forward Memory Span' : 'memory',
-     'Arithmetic Reasoning' : 'reasoning',
-     'Grammatical Reasoning' : 'reasoning',
-     'Go/No go' : 'behavioural restraint',
-     'Reverse_Memory_Span' : 'memory',
-     'Verbal List Learning' : 'memory',
-     'Delayed Verbal List Learning' : 'memory',
-     'Digit Symbol Coding' : 'information processing speed',
-     'Trail Making Part A' : 'information processing speed',
-     'Trail Making Part B' : 'information processing speed'
- }
-
- education_dict = {
-     1: 'Some high school',
-     2: 'High school diploma / GED',
-     3: 'Some college',
-     4: 'College degree',
-     5: 'Professional degree',
-     6: "Master's degree",
-     7: 'Ph.D.',
-     8: "Associate's degree",
-     99: 'Other'
- }
-
- df_keys_dict = {
-     'Divided Visual Attention' : 'divided_visual_attention',
-     'Forward Memory Span' : 'forward_memory_span',
-     'Arithmetic Reasoning' : 'arithmetic_problem_solving',
-     'Grammatical Reasoning' : 'logical_reasoning',
-     'Go/No go' : 'adaptive_behaviour_response_inhibition',
-     'Reverse_Memory_Span' : 'reverse_memory_span',
-     'Verbal List Learning' : 'episodic_verbal_learning',
-     'Delayed Verbal List Learning' : 'delayed_recall',
-     'Digit Symbol Coding' : 'abstract_symbol_processing_speed',
-     'Trail Making Part A' : 'numerical_info_processing_speed',
-     'Trail Making Part B' : 'numerical_and_lexical_info_processing_speed'
- }
-
-
- ### CREATE THE "TARGET VARIABLE DEFINITION" PAGE ###
- st.title('Target variable definition')
-
- st.markdown('''On this page, we invite you to imagine that you are hiring for a certain role. Using the sliders below, you will specify two different notions of a “good employee” for that role—two different target variables. Once you’re done, the simulator will build two models, one for each of your target variable definitions. You can visualize these datasets and models—and their effects on fairness and overall features of the models and data—in the Visualize the Results page.''')
-
- st.markdown('''You specify these notions of a good employee by assigning weights of importance to the cognitive characteristics that a “good” employee would have for the role. The cognitive test data that you’ll be working with comes from real-world people, mirroring an increasing number of hiring algorithms that are based on cognitive tests ([Wilson, et al. 2021](https://dl.acm.org/doi/10.1145/3442188.3445928)).''')
-
- st.markdown('''We have pre-set the weights below to reflect two different conceptions of a “good” employee: one conception that emphasizes attentiveness and numerical skills (A) and another conception that emphasizes interpersonal skills and memory (B). If you want to, you can change the slider values as you see fit.''')
-
- st.markdown('''After you’ve set the slider values, click :red[“Assign labels and train your models”]. The simulator will then label certain individuals as "good employees"—in other words, it will assign class label "1" to good employees and "0" to the rest, based on your selections. Then, the simulator will build the two models.''')
-
- st.markdown('''To learn more about target variable definition and hiring models based on cognitive tests, click :green[“See explanation”] below.''')
-
- with st.expander("See explanation"):
-
-     st.markdown('''The models that the simulator builds are of a kind increasingly used in hiring software: companies will have applicants play games that test for different kinds of cognitive ability (like reasoning or memory). Then hiring software will be built to predict which applicants will be successful based on which cognitive characteristics they have. What cognitive characteristics make for a successful employee? This will depend on what role is being hired for. And it will also depend on how one defines “successful employee.”''')
-
-     st.markdown('''In the real world, “successful employee” is defined for these kinds of hiring models in the following way. Managers select a group of current employees that they consider to be successful; this group of employees plays the cognitive test games. A model is then trained to identify applicants who share cognitive characteristics with the current employees that are considered successful. The target variable of “successful employee” is thus defined in terms of comparison to certain people who are deemed successful.''')
-
-     st.markdown('''One will get different target variables if one deems different current employees as the successful ones. And, as we discussed on the Home page (and as we explain more on the Putting the Idea into Practice page), there will likely be disagreement between managers about which employees are successful. For instance, a manager who values attentiveness and numerical skills will deem different employees “successful” than a manager who values interpersonal skills and memory. Even when different managers roughly share their sensibilities in what characteristics make for a successful employee, there may still be different, equally good ways to “weight” the importance of the various characteristics.''')
-
-     st.markdown('''In the real world, the cognitive characteristics shared by those considered successful employees are implicit. Companies do not first identify the cognitive characteristics that make for a successful employee; rather, they identify employees who they consider successful, and then the hiring model works backwards to identify what characteristics these employees share.''')
-
-     st.markdown('''In our simulator, the cognitive characteristics shared by “good” employees are explicit. You assign different weights—using the sliders—to the cognitive characteristics you think are more or less important in a good employee (for the role you’re considering). To illustrate how different target variables have different effects on fairness and overall model attributes, you’ll define “good” employee in two ways. (We’ve made the cognitive characteristics explicit both so you can see the point of different target variable definitions more clearly, and because of limitations of the data that we’re working with.)''')
-
-     st.markdown('''The cognitive characteristics that the simulator works with are from one of the datasets of the [NeuroCognitive Performance Test](https://www.nature.com/articles/s41597-022-01872-8). This dataset has eleven different tests, which we have grouped into five categories:''')
-
-     st.markdown(
-         """
-         - **Memory**: Forward Memory Span, Reverse Memory Span, Verbal List Learning, Delayed Verbal List Learning
-         - **Information Processing Speed**: Digit Symbol Coding, Trail Making Part A, Trail Making Part B
-         - **Reasoning**: Arithmetic Reasoning, Grammatical Reasoning
-         - **Attention**: Divided Visual Attention
-         - **Behavioral Restraint**: Go/No go
-         """)
-
-     st.markdown('''After you’ve set the weights for these five characteristics using the sliders, you can see which weights are assigned to each test (e.g. Forward Memory Span or Digit Symbol Coding) by ticking the checkbox beneath the sliders.''')
-
- col1, col2 = st.columns(2)
-
- # Initialise slider values
- list_values_A = (9, 10, 2, 1, 5)
- list_values_B = (1, 2, 10, 9, 3)
-
- selectionsA = {}
- selectionsB = {}
- results_dict_A = groups_dict
- results_dict_B = groups_dict
-
- with col1:
-     st.subheader("Define target variable for model A")
-
-     if "slider_values_A" not in st.session_state:
-         for count, value in enumerate(groups):
-             selectionsA[value] = list_values_A[count]
-         st.session_state["slider_values_A"] = selectionsA
-     else:
-         selectionsA = st.session_state["slider_values_A"]
-
-     for i in groups:
-         nameA = f"{i} importance, model A"
-         value = selectionsA[i]
-         slider = st.slider(nameA, min_value=0, max_value=10, value = value)
-         selectionsA[i] = slider
-
-     results_dict_A = {k: selectionsA.get(v, v) for k, v in results_dict_A.items()}
-     total = sum(results_dict_A.values())
-     for (key, u) in results_dict_A.items():
-         if total != 0:
-             w = (u/total)
-             results_dict_A[key] = w
-
-     if st.checkbox("Show target variable A weights per subtest", key="A"):
-         for (key, u) in results_dict_A.items():
-             txt = key.replace("_", " ")
-             st.markdown("- " + txt + " : " + f":green[{str(round((u*100), 2))}]")
-
-
-     st.session_state["slider_values_A"] = selectionsA
-
-
- with col2:
-     st.subheader("Define target variable for model B")
-
-     if "slider_values_B" not in st.session_state:
-         for count, value in enumerate(groups):
-             selectionsB[value] = list_values_B[count]
-         st.session_state["slider_values_B"] = selectionsB
-     else:
-         selectionsB = st.session_state["slider_values_B"]
-
-     for i in groups:
-         nameB = f"{i} importance, model B"
-         value = selectionsB[i]
-         slider = st.slider(nameB, min_value=0, max_value=10, value = value)
-         selectionsB[i] = slider
-
-     results_dict_B = {k: selectionsB.get(v, v) for k, v in results_dict_B.items()}
-     total = sum(results_dict_B.values())
-     for (key, u) in results_dict_B.items():
-         if total != 0:
-             w = (u/total)
-             results_dict_B[key] = w
-
-     if st.checkbox("Show target variable B weights per subtest", key = "B"):
-         for (key, u) in results_dict_B.items():
-             txt = key.replace("_", " ")
-             st.markdown("- " + txt + " : " + f":green[{str(round((u*100), 2))}]")
-
-     st.session_state["slider_values_B"] = selectionsB
-
- if st.button("Assign labels and train your models", type = "primary", use_container_width = True):
-     if 'complete_df' in st.session_state:
-         del st.session_state['complete_df']
-     if 'clean_df' in st.session_state:
-         del st.session_state['clean_df']
-     if 'cm_A' in st.session_state:
-         del st.session_state['cm_A']
-     if 'cm_B' in st.session_state:
-         del st.session_state['cm_B']
-     scoreA = pd.DataFrame()
-     scoreB = pd.DataFrame()
-     test1 = all(value == 0 for value in results_dict_A.values())
-     test2 = all(value == 0 for value in results_dict_B.values())
-     if test1 or test2:
-         st.error('Cannot train the models if you do not define the target variables. Make your selections for both models first!', icon="🚨")
-     else:
-         for (key, u) in results_dict_A.items():
-             scoreA[df_keys_dict[key]] = u * dataframe[df_keys_dict[key]]
-         scoresA = scoreA.sum(axis=1)
-         dataframe['model_A_scores'] = scoresA
-         for (key, u) in results_dict_B.items():
-             scoreB[df_keys_dict[key]] = u * dataframe[df_keys_dict[key]]
-         scoresB = scoreB.sum(axis=1)
-         dataframe['model_B_scores'] = scoresB
-
-         new_annotated = assign_labels_by_probabilities(dataframe, "model_A_scores", "Model_A_label", "Model_A_probabilities", quantile=0.85, num_samples=100)
-         new_annotated = assign_labels_by_probabilities(new_annotated, "model_B_scores", "Model_B_label", "Model_B_probabilities", quantile=0.85, num_samples=100)
-         new_annotated = new_annotated.reset_index()
-
-
-         clean_data = drop_data(new_annotated)
-         # specify the columns of interest
-         selected_cols = ['Model_A_label', 'Model_B_label']
-
-         # count the number of rows where both selected columns have a value of 1
-         num_rows_with_all_flags_1 = len(new_annotated[new_annotated[selected_cols].sum(axis=1) == len(selected_cols)])
-
-         # print the result
-         st.write(f"Shared candidates between your target variables: :green[{num_rows_with_all_flags_1}].")
-         with st.spinner('Please wait... The models will be trained now.'):
-
-             X_data, Y_data_A, Y_data_B = clean_data.iloc[:, :-2], clean_data.iloc[:, [-2]], clean_data.iloc[:, [-1]]
-             X_data = X_data.drop(["index"], axis = 1)
-             Y_data_B = Y_data_B.reset_index()
-             X_train, X_test, y_train_A, y_test_A = train_test_split(X_data, Y_data_A, test_size=0.2)
-             y_train_A = y_train_A.reset_index()
-             y_test_A = y_test_A.reset_index()
-             y_train_B = pd.merge(y_train_A, Y_data_B[['index', 'Model_B_label']], on='index', how='left')
-             y_test_B = pd.merge(y_test_A, Y_data_B[['index', 'Model_B_label']], on='index', how='left')
-             y_train_B = y_train_B.drop(labels='Model_A_label', axis = 1)
-             y_test_B = y_test_B.drop(labels='Model_A_label', axis = 1)
-             y_train_A = y_train_A.set_index("index")
-             y_train_B = y_train_B.set_index("index")
-             y_test_A = y_test_A.set_index("index")
-             y_test_B = y_test_B.set_index("index")
-
-             accuracy_A, precision_A, recall_A, X_full_A, cm_A, baseline_accuracy_A = train_and_predict("A", X_train, X_test, y_train_A, y_test_A)
-             accuracy_B, precision_B, recall_B, X_full_B, cm_B, baseline_accuracy_B = train_and_predict("B", X_train, X_test, y_train_B, y_test_B)
-             full = pd.merge(X_full_A, X_full_B[['index', 'Predicted_B', 'Prob_0_B', "Prob_1_B"]], on='index', how='left')
-             complete = pd.merge(full, new_annotated[['index', 'age', 'gender', 'education level', 'country', 'Model_A_label', 'Model_B_label', 'model_A_scores', 'model_B_scores']], on='index', how='left')
-             complete = complete.replace({"education level": education_dict})
-             complete = complete.rename(columns={"index": "Candidate ID"})
-
-             if 'complete_df' not in st.session_state:
-                 st.session_state['complete_df'] = complete
-             if 'clean_df' not in st.session_state:
-                 st.session_state['clean_df'] = clean_data
-             if 'cm_A' not in st.session_state:
-                 st.session_state['cm_A'] = cm_A
-             if 'cm_B' not in st.session_state:
-                 st.session_state['cm_B'] = cm_B
-
-             row1_space1, row1_1, row1_space2, row1_2, row1_space3 = st.columns((0.1, 3, 0.1, 3, 0.1))
-             with row1_1:
-                 st.write(f"Model A accuracy: :green[{baseline_accuracy_A}].")
-             with row1_2:
-                 st.write(f"Model B accuracy: :green[{baseline_accuracy_B}].")
-
-             st.success('''Success! You have defined the target variables and trained your models. Head to "Visualize the Results" in the sidebar.''')
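The labelling step above hinges on `assign_labels_by_probabilities`, which lives in `utils.py` and is not touched by this commit. As a rough illustration of the quantile-then-sample behaviour its call signature suggests — keep only scores above the chosen quantile, then draw positive labels with probability proportional to score — a minimal sketch might look like the following; this is an assumption for readers of the diff, and the actual helper in `utils.py` may differ:

```python
# Illustrative sketch only: utils.py is not part of this commit, and the real
# assign_labels_by_probabilities may differ. Assumes non-negative scores.
import numpy as np
import pandas as pd

def assign_labels_by_probabilities(df, score_col, label_col, prob_col,
                                   quantile=0.85, num_samples=100):
    df = df.copy()
    cutoff = df[score_col].quantile(quantile)        # e.g. the 85th percentile
    eligible = df[df[score_col] >= cutoff]           # top scorers only
    probs = eligible[score_col] / eligible[score_col].sum()  # normalise to 1
    chosen = np.random.choice(eligible.index,
                              size=min(num_samples, len(eligible)),
                              replace=False, p=probs.values)
    df[prob_col] = 0.0
    df.loc[eligible.index, prob_col] = probs         # sampling probabilities
    df[label_col] = 0
    df.loc[chosen, label_col] = 1                    # positive class labels
    return df
```

Under this reading, the `quantile=0.85, num_samples=100` arguments used on this page would mean: only the top 15% of weighted scores are eligible, and roughly 100 of them receive the "1" label.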
pages/.ipynb_checkpoints/2_📈_Visualize_the_Results-checkpoint.py DELETED
@@ -1,593 +0,0 @@
- #!/usr/bin/env python3
- # -*- coding: utf-8 -*-
- """
- Created on Thu May 25 15:30:56 2023
-
- @author: daliagala
- """
-
- ### LIBRARIES ###
- import streamlit as st
- import numpy as np
- import pandas as pd
- import matplotlib.pyplot as plt
- import plotly.express as px
- import plotly.graph_objects as go
- from matplotlib_venn import venn2
- from sklearn.metrics import confusion_matrix
- from utils import display_proportional, plot_data, run_PCA, create_confusion_matrix_heatmap, plot_conf_rates
-
- ### PAGE CONFIG ###
- st.set_page_config(page_title='EquiVar', page_icon=':robot_face:', layout='wide')
-
- hide_st_style = """
- <style>
- #GithubIcon {visibility: hidden;}
- #MainMenu {visibility: hidden;}
- footer {visibility: hidden;}
- header {visibility: hidden;}
- </style>
- """
- st.markdown(hide_st_style, unsafe_allow_html=True)
-
- ### DICTIONARIES AND CONSTANTS ###
-
- colours_education = {
-     'Some high school' : 'indigo',
-     'High school diploma / GED' : '#7ae99e',
-     'Some college' : '#0a68c9',
-     'College degree' : '#80c4fa',
-     'Professional degree' : '#f48508',
-     "Master's degree" : '#2fa493',
-     'Ph.D.' : '#f2a3a1',
-     "Associate's degree" : '#fbcc66',
-     'Other' : '#fa322f'
- }
-
- colours_country = {
-     'AU' : '#7ae99e',
-     'US' : '#80c4fa',
-     'NZ' : '#2fa493',
-     'CA' : '#fbcc66'
- }
-
- colours_gender = {
-     'f' : '#83C9FF',
-     'm' : '#0067C9'
- }
-
- characteristic_dict = {
-     'gender' : colours_gender,
-     'education level' : colours_education,
-     'country' : colours_country,
-     'age' : 'indigo'
- }
-
- pred_dict = {
-     'Model A' : 'Predicted_A',
-     'Model B' : 'Predicted_B'
- }
-
- prob_dict = {
-     'Model A' : 'Prob_1_A',
-     'Model B' : 'Prob_1_B'
- }
-
- model_dict = {
-     'Model A' : 'Model_A_label',
-     'Model B' : 'Model_B_label'
- }
-
- df_keys_dict = {
-     'Divided Visual Attention' : 'divided_visual_attention',
-     'Forward Memory Span' : 'forward_memory_span',
-     'Arithmetic Reasoning' : 'arithmetic_problem_solving',
-     'Grammatical Reasoning' : 'logical_reasoning',
-     'Go/No go' : 'adaptive_behaviour_response_inhibition',
-     'Reverse_Memory_Span' : 'reverse_memory_span',
-     'Verbal List Learning' : 'episodic_verbal_learning',
-     'Delayed Verbal List Learning' : 'delayed_recall',
-     'Digit Symbol Coding' : 'abstract_symbol_processing_speed',
-     'Trail Making Part A' : 'numerical_info_processing_speed',
-     'Trail Making Part B' : 'numerical_and_lexical_info_processing_speed'
- }
-
- ### DEFINE ALL SUB-PAGES AS FUNCTIONS TO CALL ###
-
- def mod_prop(cmA, cmB):
-     st.markdown('''This section contains model confusion matrices, which tell us a great deal about how good the models we produced really are. We obtain the confusion matrices by plotting "Actual" labels on one axis, and "Predicted" labels on the other. For both models, in each square of each confusion matrix, we can see which group of candidates it represents, the number of candidates in this group, and what percentage of all candidates assessed they represent.''')
-     st.markdown('''
-     - **True Negative (TN)**: candidates predicted to have label "0" whose actual label was "0".
-     - **False Positive (FP)**: candidates predicted to have label "1" whose actual label was "0".
-     - **False Negative (FN)**: candidates predicted to have label "0" whose actual label was "1".
-     - **True Positive (TP)**: candidates predicted to have label "1" whose actual label was "1".
-     ''')
-
-     row1_space1, row1_1, row1_space2, row1_2, row1_space3 = st.columns((0.1, 3, 0.1, 3, 0.1))
-     with row1_1:
-         create_confusion_matrix_heatmap(cmA, "Model A")
-
-     with row1_2:
-         create_confusion_matrix_heatmap(cmB, "Model B")
-
-     st.subheader("Model Accuracy Rates")
-     st.markdown('''We can also represent each model in terms of its accuracy rates, calculated using the numbers of candidates in each group shown in the confusion matrix:''')
-     st.markdown('''
-     - True Positive Rate (TPR), also called recall, sensitivity or hit rate, is the probability of a positive test result when the case is indeed positive.
-     - True Negative Rate (TNR), or specificity/selectivity, is the probability of a negative test result when the case is indeed negative.
-     - Positive Predictive Value (PPV), or precision, is the ratio of truly positive results to all positive results.
-     - Negative Predictive Value (NPV) is the ratio of truly negative results to all negative results.
-     - False Positive Rate (FPR), or fall-out, is the probability of assigning a falsely positive result when the case is negative.
-     - False Negative Rate (FNR), or miss rate, is the probability that a truly positive case is falsely assigned a negative result.
-     - False Discovery Rate (FDR) is the ratio of false positive results to the total number of positive results.
-     ''')
-     measures_A = plot_conf_rates(cmA)
-     measures_B = plot_conf_rates(cmB)
-     fig = go.Figure()
-     fig.add_trace(go.Bar(
-         x=measures_A["Measure"],
-         y=measures_A["Score"],
-         name='Model A rates',
-         marker_color='rgb(55, 83, 109)'
-     ))
-     fig.add_trace(go.Bar(
-         x=measures_B["Measure"],
-         y=measures_B["Score"],
-         name='Model B rates',
-         marker_color='rgb(26, 118, 255)'
-     ))
-     fig.update_layout(
-         title='Model measures comparison',
-         xaxis_tickfont_size=14,
-         xaxis=dict(
-             title='Measure'),
-         yaxis=dict(
-             title='Score',
-             titlefont_size=16,
-             tickfont_size=14,
-         ),
-         legend=dict(
-             bgcolor='rgba(255, 255, 255, 0)',
-             bordercolor='rgba(255, 255, 255, 0)'
-         ),
-         barmode='group',
-         bargap=0.15, # gap between bars of adjacent location coordinates
-         bargroupgap=0.1 # gap between bars of the same location coordinate
-     )
-     st.plotly_chart(fig, use_container_width = True)
-     if st.checkbox("Show tables"):
-         row1_space1, row1_1, row1_space2, row1_2, row1_space3 = st.columns((0.1, 3, 0.1, 3, 0.1))
-         # CSS to inject contained in a string
-         hide_table_row_index = """
-             <style>
-             thead tr th:first-child {display:none}
-             tbody th {display:none}
-             </style> """
-
-         # Inject CSS with Markdown
-         st.markdown(hide_table_row_index, unsafe_allow_html=True)
-         with row1_1:
-             st.markdown("Accuracy rates for model A")
-             st.table(measures_A)
-         with row1_2:
-             st.markdown("Accuracy rates for model B")
-             st.table(measures_B)
-
- def model_scores(dataframe):
-     st.markdown('''This section displays the distribution of scores assigned to each hypothetical employee, according to the values you set on the sliders, compared with the distribution of protected characteristics. These scores are then used to assign labels to the hypothetical employees, creating two distinct datasets to train the models. In essence, these scores explicitly and numerically mimic the viewpoints of hypothetical hiring managers A and B when deciding who to label as "top employees."''')
-     st.markdown('''Just as in the original NCPT dataset analysis, the scores obtained by participants in the cognitive games decline with age. This observation provides insight into the potential issue of ageism inherent in the use of gamified hiring processes.''')
-     # Create a selectbox to choose a protected characteristic to explore
-     plot_radio = st.selectbox('Characteristic to explore', characteristic_dict.keys())
-     row2_space1, row2_1, row2_space2 = st.columns((0.1, 5, 0.1))
-
-     with row2_1:
-         data = dataframe[["model_A_scores", "model_B_scores", plot_radio]]
-
-         if plot_radio == "age":
-             bins = [18,20,30,40,50,60,70,80,90]
-             labels = ['18-20','21-30','31-40','41-50','51-60','61-70','71-80','81-90']
-             data['age_bins'] = pd.cut(data['age'], bins=bins, labels=labels, right=False)
-             plot_radio = 'age_bins'
-         colours = ['rgba(93, 164, 214, 0.5)', 'rgba(255, 144, 14, 0.5)']
-         c1, c2, c3, c4, c5 = st.columns((0.1, 3, 0.1, 3, 0.1))
-         with c2:
-             fig = px.box(data, x = plot_radio, y="model_A_scores", labels={"model_A_scores":"Dataset A Input Scores", 'age_bins':"Age"}, title="Input scores in dataset A")
-             fig.update_layout(showlegend=False)
-             fig.update_traces(marker_color='rgba(93, 164, 214, 0.5)')
-             st.plotly_chart(fig, use_container_width=True)
-         with c4:
-             fig = px.box(data, x = plot_radio, y="model_B_scores", labels={"model_B_scores":"Dataset B Input Scores", 'age_bins':"Age"}, title="Input scores in dataset B")
-             fig.update_traces(marker_color = 'rgba(255, 144, 14, 0.5)')
-             fig.update_layout(showlegend=False)
-             st.plotly_chart(fig, use_container_width=True)
-
- def PCA_general(full_df, dataframe_PCA):
-     st.markdown('''On this page, you can see the distribution of the dataset labels which were assigned based on the scores calculated from the slider values you selected previously. Principal Component Analysis, or PCA, is a technique often used to analyse and subsequently visualize datasets where there are many features per single example. This is the case with the NCPT dataset used in our simulator. Specifically, the battery which we used has 11 features per single example, the example being the player of cognitive games, and, in our metaphor, a hypothetical employee or job candidate. It is impossible to plot 11 dimensions, and PCA allows for the visualisation of multidimensional data, while also preserving as much information as possible.''')
-     choice = st.radio("What would you like to explore?", ("PCAs", "Components loading"), horizontal = True)
-     pcaA, dfA, labelsA, coeffA, componentsA = run_PCA(dataframe_PCA, 'Model_B_label', 'Model_A_label', 2)
-     pcaB, dfB, labelsB, coeffB, componentsB = run_PCA(dataframe_PCA, 'Model_A_label', 'Model_B_label', 2)
-     loadings = pcaB.components_.T * np.sqrt(pcaB.explained_variance_)
-     total_var = pcaA.explained_variance_ratio_.sum() * 100
-     dfA = dfA.rename(columns={'target': 'Dataset A'}).reset_index()
-     dfB = dfB.rename(columns={'target': 'Dataset B'}).reset_index()
-     df_all = pd.merge(dfA, dfB[['index', 'Dataset B']], on='index', how='left')
-
-     conditions = [
-         (df_all['Dataset A'] == 1) & (df_all['Dataset B'] == 0),
-         (df_all['Dataset B'] == 1) & (df_all['Dataset A'] == 0),
-         (df_all['Dataset A'] == 1) & (df_all['Dataset B'] == 1),
-         (df_all['Dataset A'] == 0) & (df_all['Dataset B'] == 0)]
-
-     values = ['Selected A', 'Selected B', 'Selected both', 'Not selected']
-     df_all['All'] = np.select(conditions, values)
-
-     df_all = df_all.drop(["index"], axis = 1)
-     df_all.All = pd.Categorical(df_all.All, categories=['Not selected', 'Selected A', 'Selected B', 'Selected both'])
-     df_all = df_all.sort_values('All')
-
-     selections_dict = {0: 'Not selected', 1: 'Selected'}
-     df_all = df_all.replace({"Dataset A": selections_dict, "Dataset B": selections_dict})
-
-     color_dict_sel = {'Not selected': '#3366CC', 'Selected': 'grey'}
-
-     if "pca_df" not in st.session_state:
-         st.session_state.pca_df = df_all
-
-     if choice == "PCAs":
-         c1, c2 = st.columns(2)
-         with c1:
-             fig = px.scatter(st.session_state.pca_df,
-                              x=st.session_state.pca_df['principal component 1'].astype(str),
-                              y=st.session_state.pca_df['principal component 2'].astype(str),
-                              title='Dataset A PCA',
-                              labels={"x": 'PC 1', "y": 'PC 2'},
-                              color=st.session_state.pca_df['Dataset A'],
-                              color_discrete_map=color_dict_sel)
-             fig.update_traces(marker_size = 8)
-             st.plotly_chart(fig, use_container_width=True)
-         with c2:
-             fig = px.scatter(st.session_state.pca_df,
-                              x=st.session_state.pca_df['principal component 1'].astype(str),
-                              y=st.session_state.pca_df['principal component 2'].astype(str),
-                              title='Dataset B PCA',
-                              labels={"x": 'PC 1', "y": 'PC 2'},
-                              color=st.session_state.pca_df['Dataset B'],
-                              color_discrete_map=color_dict_sel)
-             fig.update_traces(marker_size = 8)
-             st.plotly_chart(fig, use_container_width=True)
-
-         st.markdown(f'''These plots show the reduction of 11 dimensions (11 subtest results) to 2 dimensions. Total variance for the data is {total_var:.2f}%. Both of the datasets have the same features, therefore they both have the same total variance. The total variance value indicates what percentage of information has been preserved when the dimensionality was reduced. Note that for both datasets, A and B, different points are labelled "1" or "0". This shows that the two datasets represent the two different target variable definitions which you created previously. The plots are interactive - zoom in to explore in detail.''')
-
-         fig = px.scatter(st.session_state.pca_df,
-                          x=st.session_state.pca_df['principal component 1'],
-                          y=st.session_state.pca_df['principal component 2'],
-                          title="PCA with labelled groups",
-                          color=st.session_state.pca_df["All"],
-                          width = 800, height = 800,
-                          color_discrete_sequence=px.colors.qualitative.Safe,
-                          opacity = 0.95)
-
-         fig.update_yaxes(
-             scaleanchor="x",
-             scaleratio=1,
-         )
-         fig.update_traces(marker_size = 10)
-         st.plotly_chart(fig)
-
-     if choice == "Components loading":
-         c1, c2 = st.columns(2)
-         loadings_df = pd.DataFrame(loadings, columns = ["PC1", "PC2"])
-         labels_A_proper = {v: k for k, v in df_keys_dict.items()}
-         loadings_df["Features"] = labels_A_proper.values()
-         with c1:
-             fig = px.bar(loadings_df, x="PC1", y="Features", orientation = 'h')
-             st.plotly_chart(fig, use_container_width = True)
-         with c2:
-             fig = px.bar(loadings_df, x="PC2", y="Features", orientation = 'h')
-             st.plotly_chart(fig, use_container_width = True)
-
-         # fig = go.Figure()
-         # fig.add_trace(go.Bar(
-         #     x=loadings_df["PC1"],
-         #     y=loadings_df["Features"],
-         #     name='Principal Component 1',
-         #     marker_color='rgb(55, 83, 109)',
-         #     orientation='h'
-         # ))
-         # fig.add_trace(go.Bar(
-         #     x=loadings_df["PC2"],
-         #     y=loadings_df["Features"],
-         #     name='Principal Component 2',
-         #     marker_color='rgb(26, 118, 255)',
-         #     orientation='h'
-         # ))
-         # fig.update_layout(
-         #     title='Component loadings',
-         #     xaxis_tickfont_size=14,
-         #     xaxis=dict(
-         #         title='Loading value'),
-         #     yaxis=dict(
-         #         title='Feature',
-         #         titlefont_size=16,
-         #         tickfont_size=14,
-         #     ),
-         #     legend=dict(
-         #         bgcolor='rgba(255, 255, 255, 0)',
-         #         bordercolor='rgba(255, 255, 255, 0)'
-         #     ),
-         #     barmode='group',
-         #     bargap=0.15, # gap between bars of adjacent location coordinates
-         #     bargroupgap=0.1 # gap between bars of the same location coordinate
-         # )
-         # st.plotly_chart(fig, use_container_width = True)
-
-         st.markdown('''On this plot, PCA component loadings can be explored. These facilitate the understanding of how much each variable (of which there are 11) contributes to a particular principal component. Here, the 11 variables were reduced to 2 components, which are labelled PC1 and PC2. The magnitude of the loading (here displayed as the size of the bar in the bar chart) indicates how strong the relationship between the variable and the component is. Therefore, the higher the bar, the stronger the relationship between that component and that variable. The loading's sign can be positive or negative. This indicates whether the principal component and that variable are positively or negatively correlated. We can see that multiple variables are positively correlated with PC2. Two variables, episodic verbal learning and delayed recall, are negatively correlated with both of the components.''')
-
-
- def model_out(full_df):
-     st.markdown('''This section highlights the discrepancies between your two models when presented with the same pool of new, previously unseen candidates to label. Specifically, you'll be investigating the candidates assigned a "1" label by both models. These individuals would be those considered for a job interview or chosen for the role, according to your defined target variable.''')
-     # Create a selectbox to choose a protected characteristic to explore
-     selectbox = st.selectbox('Characteristic to explore', characteristic_dict.keys())
-     representation = st.selectbox("Representation", ("absolute", "proportional"))
-     row1_space1, row1_1, row1_space2, row1_2, row1_space3 = st.columns((0.1, 3, 0.1, 3, 0.1))
-     with row1_1:
-         st.subheader("Candidates selected by model A")
-
-         if representation == "absolute":
-             # Select predicted data == 1
-             data = full_df.loc[full_df['Predicted_A'] == 1]
-
-             # Use function plot_data to plot selected data
-             plot_data(data, selectbox, characteristic_dict[selectbox])
-         else:
-             display_proportional(full_df, selectbox, 'Predicted_A')
-
-     with row1_2:
-         st.subheader("Candidates selected by model B")
-
-         if representation == "absolute":
-             # Select predicted data == 1
-             data = full_df.loc[full_df['Predicted_B'] == 1]
-
-             # Use function plot_data to plot selected data
-             plot_data(data, selectbox, characteristic_dict[selectbox])
-
-         else:
-             display_proportional(full_df, selectbox, 'Predicted_B')
-
-
-     st.markdown('''In this section, you're comparing the models' selections with respect to four protected characteristics: age, gender, education level, and country. You can visualize these differences in two ways: "Absolute" or "Proportional".''')
-     st.markdown('''"Absolute" representation gives you the raw numbers or percentages of each characteristic chosen. For instance, if the model labeled 5 female candidates and 5 male candidates as "1", the "Absolute" outcome will display as 50% for both genders. "Proportional" representation, on the other hand, shows the percentage of a group selected by the model relative to the total number of that group in the input data. For example, if the model evaluated 100 male candidates and selected 5, you will see a 5% representation. If it evaluated 200 female candidates and selected 5, it will show a 2.5% representation.''')
-     st.markdown('''If you encounter empty categories in the "Proportional" view, this indicates that while candidates from these categories were evaluated, none were labeled as "1". Hence, their proportional representation amounts to 0%.''')
-
- def dataframe_out(full_df):
-     selectbox_M = st.selectbox('Choose which model output to rank by', pred_dict.keys())
-
-     # Select data
-     data = full_df.loc[full_df[pred_dict[selectbox_M]] == 1]
-     data = data.sort_values(by = prob_dict[selectbox_M], ascending = False)
-     data = data[['Candidate ID', 'Prob_1_A', 'Prob_1_B', 'Predicted_A', 'Predicted_B']]
-     data = data.rename(columns={"Prob_1_A": "Ranking, model A", "Prob_1_B": "Ranking, model B", "Predicted_A": "Predicted label A", "Predicted_B": "Predicted label B"})
-     data.index = np.arange(1, len(data) + 1)
-
-     st.table(data.style.background_gradient(subset = ["Ranking, model A", "Ranking, model B"], axis=0, vmin=0.40).highlight_max(color = '#FFCD9B', subset = ["Predicted label A", "Predicted label B"], axis=0))
-
-     st.markdown("""In this section, you can review the data for all candidates labeled "1" by the selected model, found at the top of the page. Simultaneously, you can observe the labels assigned to these same candidates by the other model. It's likely that there will be instances where candidates chosen by one model weren't selected by the other. Candidates labeled "1" are highlighted in orange in the "Predicted label A" and "Predicted label B" columns.""")
-     st.markdown('''In addition to this, you can see the probability with which each candidate was labeled "1". The intensity of the blue color indicates the candidate's ranking position - a darker blue represents a higher ranking (with 1 being the maximum and 0 the minimum). You may notice that some candidates highly ranked by one model may be ranked significantly lower by the other model.''')
-
- def venn_diagram(full_df):
-     row2_space1, row2_1, row2_space2, row2_2, row2_space3 = st.columns((0.1, 1, 0.1, 1, 0.1))
-     with row2_1:
-         fig, ax = plt.subplots()
-
-         list_A = full_df.loc[full_df['Predicted_A'] == 1, 'Candidate ID'].astype(int)
-         list_B = full_df.loc[full_df['Predicted_B'] == 1, 'Candidate ID'].astype(int)
-         set1 = set(list_A)
-         set2 = set(list_B)
-
-         venn2([set1, set2], ('Model A', 'Model B'), ax=ax)
-         st.pyplot(fig)
-
-     with row2_2:
-         st.markdown('''This Venn Diagram visualizes the number of candidates chosen by both models. It's likely that some candidates will be selected by both models, while others may be chosen by only one model. If we consider Model A as the decision of one hiring manager and Model B as another's, it's easy to see how the selection outcome varies depending on the decision-maker. Some candidates may get the opportunity to be hired, while others might not. This serves as an illustration of the inherent arbitrariness in defining the target variable when dealing with highly subjective outcomes.''')
-         st.markdown('''For instance, it's straightforward to define a target variable in a classification problem like distinguishing dragonflies from butterflies, where there's little room for ambiguity. However, defining what makes a 'good' employee is far more challenging due to its subjective nature.''')
-
-
-
- def model_vis(full_df):
-     st.markdown('''In this section, you can visualize the demographics of the different subgroups of the data. Firstly, you can see the demographic characteristics of the candidates who have positive labels ("1") and negative labels ("0"), which were assigned based on the scores calculated from the slider values you selected previously. Then, you can visualize the demographic distributions of the data which was used for training and evaluation of the models.''')
-     choice = st.radio("**Select desired data:**", ("Positive and negative labels", "Training and evaluation data"), horizontal=True)
-     if choice == "Positive and negative labels":
-         # Create a selectbox to choose a label to explore
-         selectbox_Lab = st.selectbox('Label to visualize', ('positive labels', 'negative labels'))
-
-         # Create a selectbox to choose a protected characteristic to explore
-         selectbox_Char = st.selectbox('Protected characteristic', characteristic_dict.keys())
-
-         row2_space1, row2_1, row2_space2, row2_2, row2_space3 = st.columns((0.1, 3, 0.1, 3, 0.1))
-
-         with row2_1:
-             st.subheader("Dataset A")
-
-             # Select test data
-             if selectbox_Lab == 'positive labels':
-                 data = full_df.loc[full_df['Model_A_label'] == 1]
-             else:
-                 data = full_df.loc[full_df['Model_A_label'] == 0]
-
-             # Use function plot_data to plot selected data
-             plot_data(data, selectbox_Char, characteristic_dict[selectbox_Char])
-
-
-         with row2_2:
-             st.subheader("Dataset B")
-
-             # Select test data
-             if selectbox_Lab == 'positive labels':
-                 data = full_df.loc[full_df['Model_B_label'] == 1]
-             else:
-                 data = full_df.loc[full_df['Model_B_label'] == 0]
-
-             # Use function plot_data to plot selected data
-             plot_data(data, selectbox_Char, characteristic_dict[selectbox_Char])
-         st.markdown('''You are visualising the demographic composition of those hypothetical employees who were assigned labels "1" or "0" based on your definitions of the target variables. You might see differences in proportions of genders between the two models for the positive labels, as well as a major difference in age between the positive and negative labels. Visualising the labels in this manner before training the model can help you understand and mitigate differences in demographic representation in the modelling outcomes. Likely, if all candidates labelled "1" were in younger age groups, the candidates selected by the model at the deployment stage will also be in younger age groups. Moreover, target variable definition affects the proportional representation. Having defined two target variables, one can choose the dataset and the model which offers more proportional representation.''')
-
-
-     if choice == "Training and evaluation data":
-         # Create a selectbox to choose a protected characteristic to explore
-         selectbox = st.selectbox('Characteristic to explore', characteristic_dict.keys())
-         row1_space1, row1_1, row1_space2, row1_2, row1_space3 = st.columns((0.1, 1, 0.1, 1, 0.1))
-         # Plot training data
-         with row1_1:
-             st.subheader("Training data")
-
-             # Select train data
-             train = full_df.loc[full_df["Predicted_A"] == "train"]
-
-             # Use function plot_data to plot selected data
-             plot_data(train, selectbox, characteristic_dict[selectbox])
-
-         # Plot test data
-
-         with row1_2:
-             st.subheader("Test data")
-
-             # Select test data
-             test = full_df.loc[full_df["Predicted_A"] != "train"]
-
-             # Use function plot_data to plot selected data
-             plot_data(test, selectbox, characteristic_dict[selectbox])
-
-         st.markdown('''To train a machine learning model, the data has to be split into two different sets. The first set is the training data, which will be used to teach the model the relationships between the input features (11 subtest results) and the corresponding labels ("0" and "1", assigned based on your definitions of target variables and the values you chose for the sliders). The second set is the test data, or evaluation data, which is used to assess the performance of the model. This is the data which is used to plot the confusion matrices and calculate the model metrics which you saw at the bottom of the "Define Target Variables" page. This is also the data whose features you can explore in "Modelling outcomes". It is important that the training and testing data are balanced. Here, you can compare the demographic composition of the training and evaluation data. The training and evaluation dataset compositions were the same, containing the same candidates and the same features for both models A and B. However, the labels for each dataset were different, based on what you selected in "Define Target Variables".''')
-
- def filter_for_protected(data):
-     st.markdown('''Sometimes, the overall model metrics can be deceptive when it comes to predicting the results for different groups under consideration. Ideally, the various model metrics would be similar across different groups, which would indicate that the overall model performance is reflected in how the model performs for a given group. This is often not the case, and it is likely that you will see that models A and B perform differently when it comes to those metrics. Even the same model can have different metrics for different subgroups.''')
-     model = st.selectbox('Choose which model outputs to assess', pred_dict.keys())
-     test = data.loc[data[pred_dict[model]] != "train"]
-
-     selectbox_Char = st.selectbox('Protected characteristic', characteristic_dict.keys())
-     if selectbox_Char == 'age':
-         bins = [18,20,30,40,50,60,70,80,91]
-         labels = ['18-20','21-30','31-40','41-50','51-60','61-70','71-80','81-90']
-         test['age_bins'] = pd.cut(test['age'], bins=bins, labels=labels, right=False)
-         selectbox_Char = 'age_bins'
-     # which_group = st.selectbox('Which group?', test[selectbox_Char].unique())
-     df = pd.DataFrame({'Measure': ['True Positive Rate', 'True Negative Rate', 'Positive Predictive Value', 'Negative Predictive Value', 'False Positive Rate', 'False Negative Rate', 'False Discovery Rate']})
-     for group in test[selectbox_Char].unique():
-         rslt_df = test[test[selectbox_Char] == group]
-         y_true = [int(numeric_string) for numeric_string in rslt_df[model_dict[model]]]
-         y_pred = [int(numeric_string) for numeric_string in rslt_df[pred_dict[model]]]
-         cm = confusion_matrix(y_true, y_pred)
-         if cm.shape == (1,1):
-             cm = np.array([[cm[0, 0], 0], [0, 0]])
-         d = plot_conf_rates(cm)
-         df[f"{group}"] = d["Score"]
-
-     fig = go.Figure()
-     for group in test[selectbox_Char].unique():
-         fig.add_trace(go.Bar(
-             x=df["Measure"],
-             y=df[group],
-             name=group
-         ))
-
-     fig.update_layout(
-         title='Model metrics per group',
-         xaxis_tickfont_size=14,
-         xaxis=dict(
-             title='Metric'),
-         yaxis=dict(
-             title='Score',
-             titlefont_size=16,
-             tickfont_size=14,
-         ),
-         legend=dict(
-             bgcolor='rgba(255, 255, 255, 0)',
-             bordercolor='rgba(255, 255, 255, 0)'
-         ),
-         barmode='group',
-         bargap=0.15, # gap between bars of adjacent location coordinates
-         bargroupgap=0.1 # gap between bars of the same location coordinate
-     )
-     st.plotly_chart(fig, use_container_width = True)
-     if st.checkbox("Show table of scores"):
-         # CSS to inject contained in a string
-         hide_table_row_index = """
-             <style>
-             thead tr th:first-child {display:none}
-             tbody th {display:none}
-             </style> """
-
-         # Inject CSS with Markdown
-         st.markdown(hide_table_row_index, unsafe_allow_html=True)
-         st.markdown(f"Accuracy rates for {selectbox_Char}")
-         st.table(df)
-
- def data_plot(key1, key2, key3, key4):
-     st.title('''Visualize the Results''')
-     if key1 not in st.session_state:
-         st.error('Cannot train the models if you do not define the target variables. Go to "Define Target Variables"!', icon="🚨")
-     else:
-         tab1, tab2 = st.tabs(["Demographic", "Non-demographic"])
-         with tab1:
-             dataframe = st.session_state[key1]
-             clean_data = st.session_state[key2]
-             st.subheader('''**Select what to explore:**''')
-             data_choice = st.radio('''What to explore''', ("Modelling outcomes", "Input data"), horizontal = True, label_visibility = "collapsed")
-             if data_choice == "Modelling outcomes":
-                 st.subheader('''Demographics of the overall modelling outcomes''')
-                 model_out(dataframe)
-                 st.subheader('''Demographics of the selected protected groups''')
-                 filter_for_protected(dataframe)
-             else:
-                 st.subheader('''Demographics of the input scores''')
-                 model_scores(dataframe)
-                 st.subheader('''Demographics of the input labels''')
-                 model_vis(dataframe)
-         with tab2:
-             dataframe = st.session_state[key1]
-             clean_data = st.session_state[key2]
-             cmA = st.session_state[key3]
-             cmB = st.session_state[key4]
-             st.subheader('''**Select what to explore:**''')
-             data_choice = st.radio('''Select what to explore:''', ("Modelling outcomes", "Input data"), horizontal = True, label_visibility = "collapsed")
-             if data_choice == "Modelling outcomes":
-                 st.subheader('''Labelled dataframe''')
-                 dataframe_out(dataframe)
-                 st.subheader('''Venn Diagram''')
-                 venn_diagram(dataframe)
-                 st.subheader('''Model accuracy metrics''')
-                 mod_prop(cmA, cmB)
-             else:
-                 st.subheader('''Principal Component Analysis''')
-                 PCA_general(dataframe, clean_data)
-
- data_plot('complete_df', 'clean_df', 'cm_A', 'cm_B')
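All of the accuracy rates that `mod_prop` lists derive from the four cells of a 2×2 confusion matrix. `plot_conf_rates` is defined in `utils.py`, outside this diff; a minimal sketch consistent with how it is used on this page — returning a frame with `Measure` and `Score` columns, in the same order that `filter_for_protected` expects — might look like this (an assumption for illustration, not the actual helper):

```python
# Hypothetical sketch of the rate computation behind utils.plot_conf_rates
# (utils.py is not part of this commit; the real helper may differ).
# `cm` is a 2x2 sklearn confusion matrix laid out as [[TN, FP], [FN, TP]].
import pandas as pd

def plot_conf_rates(cm):
    tn, fp = cm[0]
    fn, tp = cm[1]
    safe = lambda n, d: n / d if d else 0.0  # guard for empty subgroups
    rates = {
        'True Positive Rate': safe(tp, tp + fn),
        'True Negative Rate': safe(tn, tn + fp),
        'Positive Predictive Value': safe(tp, tp + fp),
        'Negative Predictive Value': safe(tn, tn + fn),
        'False Positive Rate': safe(fp, fp + tn),
        'False Negative Rate': safe(fn, fn + tp),
        'False Discovery Rate': safe(fp, fp + tp),
    }
    # Same "Measure"/"Score" layout the page's charts and tables index into
    return pd.DataFrame({'Measure': list(rates), 'Score': list(rates.values())})
```

The zero-denominator guard mirrors the 1×1-matrix padding in `filter_for_protected`, where a demographic subgroup may contain only a single class.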
pages/.ipynb_checkpoints/3_💡_Put_the_Idea_into_Practice-checkpoint.py DELETED
@@ -1,47 +0,0 @@
- #!/usr/bin/env python3
- # -*- coding: utf-8 -*-
- """
- Created on Thu May 25 15:42:38 2023
-
- @author: daliagala
- """
-
- ### IMPORT LIBRARIES ###
- import streamlit as st
-
- ### PAGE CONFIG ###
- st.set_page_config(page_title='EquiVar', page_icon=':robot_face:', layout='wide')
-
- ### IDEA IN PRACTICE PAGE ###
-
- st.title("Different target variable definitions in practice")
- st.markdown('''This dashboard is designed to help you understand how the notion of a “good employee” is translated into machine learning models—by defining target variables—and how target variable definition affects fairness in hiring algorithms (and other aspects of such algorithms too). On this page, we describe how to put the dashboard, and the insights it affords, into practice.''')
-
- st.markdown('''How can this be done? The first step is to make two changes to the dashboard, because you cannot simply take it “off the shelf” and immediately put it into practice. This is because:''')
-
- st.markdown('''- The dashboard is not built using your data or your models.''')
-
- st.markdown('''- How you define a target variable in the dashboard is not how it’s done in practice. (Rather, the way you define it in the dashboard is a straightforward way for you to see the effects of target variable definition.)''')
- st.markdown('''Below we describe how to address these two issues.''')
-
- st.subheader('''Using your data and your models''')
-
- st.markdown('''The dashboard offers a starting point: if you want to build something like the dashboard for your data and your models, you now have a blueprint to work from.''')
-
- st.subheader('''Defining the target variable in practice''')
-
- st.markdown('''In the dashboard, you define the target variable by assigning weights to different cognitive characteristics. These weights determine the “positive label”: people in our dataset who perform best on the tests, given the weights you assign, are those assigned the positive label—that is, those that the model treats as “good.” Then, the model is trained to identify people whose cognitive characteristics match those with the positive label.''')
-
- st.markdown('''As we discussed on the Home page, a growing number of hiring algorithms use cognitive tests to identify promising job applicants. However, the way target variable definition works with these real-world algorithms is different from how it works in the dashboard. For example, consider Pymetrics, a leading developer of hiring software. In some cases, Pymetrics builds bespoke algorithms for a company that is hiring for a given role. Pymetrics will ask the company to identify a group of the client’s current employees in that role that the client considers “good.” Then, these “good” employees play cognitive test games similar to the ones used in our dashboard. It is these employees who are assigned the positive labels. From this point on, Pymetrics’ algorithmic development goes just as it does in our dashboard: a model is trained to identify job applicants whose cognitive characteristics are similar to those with the positive label.''')
-
- st.markdown('''So, for hiring algorithms like Pymetrics’, the target variable is defined not by assigning weights to cognitive attributes, but rather by directly identifying a certain group of current employees as “good.” In the dashboard, you can define different target variables by assigning different weights to the cognitive attributes. If you are in practice building an algorithm like Pymetrics’, you can define different target variables by identifying different groups of current employees as “good.”''')
-
- st.markdown('''How might this work? As we discussed on the Home page of the dashboard, reasonable minds may disagree about what makes for a good employee, and relatedly reasonable minds may disagree about which current employees are good employees. For example, within a company, two different managers—call them Manager A and Manager B—may not be perfectly aligned in who they consider to be good employees for a certain role. The managers may agree in some cases. We might imagine that there are 50 employees whom both Manager A and Manager B deem good. But the two managers might disagree about other employees. Imagine that there are 25 further employees whom Manager A thinks of as good but Manager B does not (this needn’t mean that Manager B thinks that these employees are bad, just that they are not the best). Likewise, there might be 25 further employees whom Manager B thinks of as good but Manager A does not.''')
-
- st.markdown('''In this case, there are two different (overlapping) groups of 75 employees, each corresponding to what Managers A and B think of as good employees. These two different groups of employees—and in turn, two different target variable definitions—could be used to train two different models.''')
-
- st.markdown('''Instead of constructing two groups of “good” employees directly from the judgments of Managers A and B, you could weight their judgments against one another. For example, you could have two groups of employees, X and Y. Both X and Y contain the 50 employees that Managers A and B agree on. But group X contains 20 of Manager A’s preferred employees and 5 of Manager B’s, while group Y contains 20 of Manager B’s preferred employees and 5 of Manager A’s. Here again we have different groups of “good” employees, and so two different target variables.''')
-
- st.markdown('''One could select different groups of good employees in other ways still. An employer might have different metrics to evaluate employee success. Different employees might be better than others according to one metric compared to another. Depending on what importance is assigned to the different metrics—depending on how you weight the different metrics against one another—different groups of employees may emerge as “good.”''')
-
- st.markdown('''Our focus in the dashboard has been on hiring algorithms that are based on cognitive test games. There are other kinds of algorithms used in hiring—for example, algorithms that identify promising job applicants on the basis of their resumés. In designing any such algorithm, the target variable must be defined, and the notion of a “good employee” must be translated into algorithmic terms. And so the insights of this dashboard apply, and can be put into practice, for almost any kind of hiring algorithm you’re working with.''')
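The Manager A / Manager B example above can be made concrete: each manager's judgments become one label column, in the same `Model_A_label` / `Model_B_label` shape that the simulator's other pages consume. A toy illustration (all identifiers here are invented for the sketch, not part of the EquiVar codebase):

```python
# Illustrative only: two target variables built from two managers'
# hypothetical judgments (50 shared endorsements, 25 unique to each).
import pandas as pd

employee_ids = range(100)
good_A = set(range(75))                         # Manager A endorses 0-74
good_B = set(range(50)) | set(range(75, 100))   # B agrees on 0-49, adds 75-99

labels = pd.DataFrame({'employee_id': list(employee_ids)})
labels['Model_A_label'] = labels['employee_id'].isin(good_A).astype(int)
labels['Model_B_label'] = labels['employee_id'].isin(good_B).astype(int)

# Two overlapping groups of 75 "good" employees -> two target variables
print(labels[['Model_A_label', 'Model_B_label']].sum())
```

Training one model per label column then reproduces, on real data, the divergence the dashboard demonstrates with slider weights.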