Commit 8eea1d4 by cassiomo
1 parent: 4794d06

Upload 7 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ data/players_22.csv filter=lfs diff=lfs merge=lfs -text
data/Qatar_group_stage.csv ADDED
@@ -0,0 +1,65 @@
1
+ country1,country2,group
2
+ Qatar,Ecuador,a
3
+ Senegal,Netherlands,a
4
+ England,IR Iran,b
5
+ USA,Wales,b
6
+ France,Australia,d
7
+ Denmark,Tunisia,d
8
+ Mexico,Poland,c
9
+ Argentina,Saudi Arabia,c
10
+ Belgium,Canada,f
11
+ Spain,Costa Rica,e
12
+ Germany,Japan,e
13
+ Morocco,Croatia,f
14
+ Switzerland,Cameroon,g
15
+ Uruguay,Korea Republic,h
16
+ Portugal,Ghana,h
17
+ Brazil,Serbia,g
18
+ Wales,IR Iran,b
19
+ Qatar,Senegal,a
20
+ Netherlands,Ecuador,a
21
+ England,USA,b
22
+ Tunisia,Australia,d
23
+ Poland,Saudi Arabia,c
24
+ France,Denmark,d
25
+ Argentina,Mexico,c
26
+ Japan,Costa Rica,e
27
+ Belgium,Morocco,f
28
+ Croatia,Canada,f
29
+ Spain,Germany,e
30
+ Cameroon,Serbia,g
31
+ Korea Republic,Ghana,h
32
+ Brazil,Switzerland,g
33
+ Portugal,Uruguay,h
34
+ Wales,England,b
35
+ IR Iran,USA,b
36
+ Ecuador,Senegal,a
37
+ Netherlands,Qatar,a
38
+ Australia,Denmark,d
39
+ Tunisia,France,d
40
+ Poland,Argentina,c
41
+ Saudi Arabia,Mexico,c
42
+ Croatia,Belgium,f
43
+ Canada,Morocco,f
44
+ Japan,Spain,e
45
+ Costa Rica,Germany,e
46
+ Ghana,Uruguay,h
47
+ Korea Republic,Portugal,h
48
+ Serbia,Switzerland,g
49
+ Cameroon,Brazil,g
50
+ 1a,2b,s1
51
+ 1c,2d,s2
52
+ 1e,2f,t1
53
+ 1g,2h,t2
54
+ 1b,2a,u1
55
+ 1d,2c,u2
56
+ 1f,2e,v1
57
+ 1h,2g,v2
58
+ s1,s2,w1
59
+ t1,t2,w2
60
+ u1,u2,x1
61
+ v1,v2,x2
62
+ w1,w2,y1
63
+ x1,x2,y2
64
+ y1,y2,f
65
+ z1,z2,f3
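
Note on this file: the last 16 rows are knockout fixtures whose team columns hold placeholders (`1a`, `s1`, and so on, presumably "winner of group a" and "winner of tie s1") rather than country names, and the final's label `f` is the same string as group f's label. The following is a minimal sketch of separating the two kinds of row. It assumes placeholders are always exactly two digit-or-lowercase characters; `split_fixtures` and `PLACEHOLDER` are hypothetical names, not part of this commit.

```python
import csv
import io
import re

# A few rows copied from data/Qatar_group_stage.csv above.
SAMPLE = """country1,country2,group
Qatar,Ecuador,a
Belgium,Canada,f
1a,2b,s1
s1,s2,w1
y1,y2,f
"""

# Assumption: a placeholder is exactly two characters, digits or lowercase letters.
PLACEHOLDER = re.compile(r"^[0-9a-z]{2}$")


def split_fixtures(text):
    """Return (group_stage_rows, knockout_rows) parsed from the fixture CSV text."""
    group_stage, knockout = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Classify on the team column, not on `group`, because "f" is ambiguous.
        if PLACEHOLDER.match(row["country1"]):
            knockout.append(row)
        else:
            group_stage.append(row)
    return group_stage, knockout


group_stage, knockout = split_fixtures(SAMPLE)
```

Classifying on `country1` rather than on `group` keeps the final (`y1,y2,f`) out of group f.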
data/international_matches.csv ADDED
The diff for this file is too large to render. See raw diff
 
data/last_team_scores.csv ADDED
@@ -0,0 +1,33 @@
1
+ team,date,rank,goalkeeper_score,defense_score,offense_score,midfield_score
2
+ Argentina,2022-06-05,4,84.0,82.0,89.0,84.0
3
+ Australia,2022-06-13,42,77.0,72.0,72.0,74.0
4
+ Belgium,2022-06-14,2,89.0,81.0,86.0,86.0
5
+ Brazil,2022-06-06,1,89.0,87.0,87.0,86.0
6
+ Cameroon,2022-06-09,37,67.0,77.0,78.0,75.0
7
+ Canada,2022-06-13,38,76.0,69.0,73.0,78.0
8
+ Costa Rica,2022-06-14,31,88.0,72.0,70.0,69.0
9
+ Croatia,2022-06-13,16,82.0,78.0,77.0,84.0
10
+ Denmark,2022-06-13,11,85.0,80.0,78.0,80.0
11
+ Ecuador,2022-06-11,46,71.0,74.0,76.0,74.0
12
+ England,2022-06-14,5,83.0,85.0,88.0,84.0
13
+ France,2022-06-13,3,87.0,84.0,87.0,86.0
14
+ Germany,2022-06-14,12,88.0,84.0,83.0,86.0
15
+ Ghana,2022-06-14,60,74.0,76.0,76.0,78.0
16
+ IR Iran,2022-06-12,21,73.0,69.0,75.0,69.0
17
+ Japan,2022-06-14,23,73.0,75.0,75.0,78.0
18
+ Korea Republic,2022-06-14,29,75.0,73.0,80.0,74.0
19
+ Mexico,2022-06-14,9,80.0,77.0,83.0,78.0
20
+ Morocco,2022-06-13,24,82.0,81.0,82.0,76.0
21
+ Netherlands,2022-06-14,10,81.0,85.0,83.0,84.0
22
+ Poland,2022-06-14,26,87.0,75.0,85.0,76.0
23
+ Portugal,2022-06-12,8,82.0,85.0,86.0,84.0
24
+ Qatar,2022-03-29,52,50.0,78.0,80.0,79.0
25
+ Saudi Arabia,2022-06-09,49,70.0,73.0,68.0,73.0
26
+ Senegal,2022-06-07,20,83.0,79.0,81.0,79.0
27
+ Serbia,2022-06-12,25,80.0,76.0,80.0,82.0
28
+ Spain,2022-06-12,7,84.0,86.0,85.0,86.0
29
+ Switzerland,2022-06-12,14,85.0,78.0,77.0,80.0
30
+ Tunisia,2022-06-14,35,64.0,71.0,72.0,74.0
31
+ USA,2022-06-14,15,77.0,76.0,78.0,76.0
32
+ Uruguay,2022-06-11,13,80.0,81.0,84.0,80.0
33
+ Wales,2022-06-14,18,74.0,75.0,73.0,78.0
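
Note on this file: the comments in `dataprep.py` describe this table as the input used to predict World Cup matches. The following is a hedged sketch of turning two of its rows into one feature row for a fixture. The feature names follow the column rename that appears (commented out) in `dataprep.py`; `load_scores` and `fixture_features` are hypothetical helpers, and how the actual model consumes the row is not shown in this commit.

```python
import csv
import io

# Two rows copied from data/last_team_scores.csv above.
SAMPLE = """team,date,rank,goalkeeper_score,defense_score,offense_score,midfield_score
Argentina,2022-06-05,4,84.0,82.0,89.0,84.0
Mexico,2022-06-14,9,80.0,77.0,83.0,78.0
"""


def load_scores(text):
    """Index the score table by team name, converting numeric fields."""
    scores = {}
    for row in csv.DictReader(io.StringIO(text)):
        scores[row["team"]] = {
            "rank": int(row["rank"]),
            "goalkeeper": float(row["goalkeeper_score"]),
            "defense": float(row["defense_score"]),
            "offense": float(row["offense_score"]),
            "midfield": float(row["midfield_score"]),
        }
    return scores


def fixture_features(scores, team1, team2):
    """Build one feature row for team1 vs team2 (hypothetical helper)."""
    a, b = scores[team1], scores[team2]
    return {
        "Team1_FIFA_RANK": a["rank"],
        "Team2_FIFA_RANK": b["rank"],
        "Team1_Goalkeeper_Score": a["goalkeeper"],
        "Team2_Goalkeeper_Score": b["goalkeeper"],
        "Team1_Defense": a["defense"],
        "Team1_Offense": a["offense"],
        "Team1_Midfield": a["midfield"],
        "Team2_Defense": b["defense"],
        "Team2_Offense": b["offense"],
        "Team2_Midfield": b["midfield"],
    }


features = fixture_features(load_scores(SAMPLE), "Argentina", "Mexico")
```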
data/players_22.csv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:08b8390285ea9c514e07ea2e32426932afc548cd77b90eede1764748cf6e2d7c
3
+ size 13617425
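
Note on this file: what is committed here is a Git LFS pointer, not the CSV itself. A checkout made without LFS content leaves these three lines on disk in place of the player data, so code that reads the path as a CSV would get the pointer text instead. The following is a small sketch of detecting that case before loading; `parse_lfs_pointer` is a hypothetical helper, not something this repository defines.

```python
# The pointer text committed for data/players_22.csv above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:08b8390285ea9c514e07ea2e32426932afc548cd77b90eede1764748cf6e2d7c
size 13617425
"""

LFS_VERSION = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text):
    """Return the pointer's fields, or None if the text is not an LFS pointer."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            # Pointer lines are always "key value"; anything else is real content.
            return None
        fields[key] = value
    if fields.get("version") != LFS_VERSION:
        return None
    if "oid" not in fields or "size" not in fields:
        return None
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


pointer = parse_lfs_pointer(POINTER)
not_pointer = parse_lfs_pointer("team,date,rank\nArgentina,2022-06-05,4\n")
```

If the check returns a dictionary, the file still needs to be fetched through LFS before it can be parsed as a CSV.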
data/squad_stats.csv ADDED
@@ -0,0 +1,33 @@
1
+ nationality_name,overall,potential
2
+ France,85.55,89.18
3
+ England,85.18,89.0
4
+ Spain,85.55,88.36
5
+ Brazil,85.64,88.09
6
+ Portugal,85.0,88.0
7
+ Germany,85.0,87.55
8
+ Netherlands,83.36,87.0
9
+ Argentina,83.73,86.18
10
+ Belgium,83.45,85.55
11
+ Uruguay,80.55,84.64
12
+ Croatia,80.27,84.36
13
+ USA,76.27,83.73
14
+ Denmark,79.91,83.18
15
+ Mexico,78.45,82.91
16
+ Senegal,79.73,82.82
17
+ Poland,79.18,82.82
18
+ Qatar,78.93035971223021,82.62913669064748
19
+ Switzerland,79.09,82.55
20
+ Serbia,77.45,81.64
21
+ Morocco,79.55,81.45
22
+ Japan,75.55,80.45
23
+ Ghana,76.27,80.09
24
+ Wales,75.36,80.0
25
+ Ecuador,73.09,79.91
26
+ Korea Republic,75.27,79.0
27
+ Cameroon,75.09,78.91
28
+ Canada,73.64,78.45
29
+ Australia,72.55,76.82
30
+ Tunisia,73.0,76.0
31
+ Saudi Arabia,71.45,75.27
32
+ Costa Rica,72.0,74.55
33
+ IR Iran,70.82,72.36
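
Note on this file: `squad_stats.csv`, `last_team_scores.csv`, and the fixture list all key on the same team spellings (`IR Iran`, `Korea Republic`, `USA`), so a lookup across files depends on those strings matching exactly. The following is a minimal sketch of a consistency check, assuming two-character names in the fixture list are bracket placeholders rather than teams; `unknown_teams` is a hypothetical helper.

```python
import csv
import io

# Rows copied from the fixture list and from data/squad_stats.csv above.
FIXTURES = """country1,country2,group
England,IR Iran,b
Uruguay,Korea Republic,h
1a,2b,s1
"""

SQUAD_STATS = """nationality_name,overall,potential
England,85.18,89.0
Uruguay,80.55,84.64
Korea Republic,75.27,79.0
IR Iran,70.82,72.36
"""


def unknown_teams(fixtures_text, stats_text, stats_key="nationality_name"):
    """Return fixture team names that have no row in the stats table."""
    known = {row[stats_key] for row in csv.DictReader(io.StringIO(stats_text))}
    missing = set()
    for row in csv.DictReader(io.StringIO(fixtures_text)):
        for name in (row["country1"], row["country2"]):
            # Assumption: two-character names are knockout placeholders.
            if len(name) > 2 and name not in known:
                missing.add(name)
    return missing


missing = unknown_teams(FIXTURES, SQUAD_STATS)
# A differently spelled table, used only to show what a mismatch looks like.
missing_if_renamed = unknown_teams(FIXTURES, SQUAD_STATS.replace("IR Iran", "Iran"))
```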
data/training.csv ADDED
The diff for this file is too large to render. See raw diff
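
Note on this file: according to the comments in `dataprep.py`, the training table is exported after missing scores are filled with each team's own mean and any remaining gaps are set to 50. The following is a hedged sketch of that two-step fill on a toy frame, assuming pandas is available; `fill_team_mean` is a hypothetical helper and the sample values are invented for illustration.

```python
import pandas as pd


def fill_team_mean(df, team_col, score_col, fallback=50):
    """Fill NaNs with the team's mean score, then with a flat fallback."""
    per_team = df.groupby(team_col)[score_col].transform(
        lambda s: s.fillna(s.mean())
    )
    # Teams with no scores at all have a NaN mean and fall through to `fallback`.
    return per_team.round().fillna(fallback)


# Invented sample: one team with a gap, one team with no scores at all.
matches = pd.DataFrame(
    {
        "home_team": ["Brazil", "Brazil", "Brazil", "Qatar", "Qatar"],
        "home_team_goalkeeper_score": [88.0, None, 90.0, None, None],
    }
)
filled = fill_team_mean(matches, "home_team", "home_team_goalkeeper_score")
```

The fallback applies only to teams that never have a score, which would be consistent with the goalkeeper score of 50.0 recorded for Qatar in `data/last_team_scores.csv`.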
 
dataprep.py ADDED
@@ -0,0 +1,570 @@
1
+ import streamlit as st
2
+
3
+ import numpy as np
4
+ import pandas as pd
5
+ import matplotlib.pyplot as plt
6
+ import seaborn as sns
7
+ import os
8
+ import warnings
9
+
10
+ warnings.filterwarnings('ignore')
11
+
12
+
13
+ # In[90]:
14
+
15
+
16
+ # In[91]:
17
+
18
+ def main():
19
+ st.title("FIFA Data visualization")
20
+
21
+
22
+ df = pd.read_csv('./data/international_matches.csv', parse_dates=['date'])
23
+ # df.tail()
24
+ #
25
+
26
+ # In[92]:
27
+
28
+
29
+ # df.columns
30
+
31
+
32
+ # In[93]:
33
+
34
+
35
+ # df.isnull().sum()
36
+
37
+
38
+ # # PRE-ANALYSIS
39
+ # The dataset has a lot of blank fields that need to be fixed.
40
+ # However, before modifying any field, I want to analyze the teams' ratings as of the latest FIFA date (June 2022). This matters because these ratings are the basis of the inference dataset that is fed to the machine learning model that predicts the World Cup matches.
41
+
42
+ # ### Top 10 FIFA Ranking
43
+ # Top 10 national teams as of the June 2022 FIFA ranking.
44
+ # **ref:** https://www.fifa.com/fifa-world-ranking/men?dateId=id13603
45
+
46
+
47
+ # In[94]:
48
+
49
+
50
+ fifa_rank = df[['date', 'home_team', 'away_team', 'home_team_fifa_rank', 'away_team_fifa_rank']]
51
+ home = fifa_rank[['date', 'home_team', 'home_team_fifa_rank']].rename(
52
+ columns={"home_team": "team", "home_team_fifa_rank": "rank"})
53
+ away = fifa_rank[['date', 'away_team', 'away_team_fifa_rank']].rename(
54
+ columns={"away_team": "team", "away_team_fifa_rank": "rank"})
55
+ fifa_rank = pd.concat([home, away])
56
+ # Select each country's latest match
57
+ fifa_rank = fifa_rank.sort_values(['team', 'date'], ascending=[True, False])
58
+ last_rank = fifa_rank
59
+ fifa_rank_top10 = fifa_rank.groupby('team').first().sort_values('rank', ascending=True)[0:10].reset_index()
60
+
61
+
62
+ # fifa_rank_top10
63
+
64
+
65
+ # ### Top 10 teams with the highest winning percentage at home and away
66
+
67
+ # In[95]:
68
+
69
+
70
+ def home_percentage(team):
71
+ score = len(df[(df['home_team'] == team) & (df['home_team_result'] == "Win")]) / len(
72
+ df[df['home_team'] == team]) * 100
73
+ return round(score)
74
+
75
+
76
+ def away_percentage(team):
77
+ score = len(df[(df['away_team'] == team) & (df['home_team_result'] == "Lose")]) / len(
78
+ df[df['away_team'] == team]) * 100
79
+ return round(score)
80
+
81
+
82
+ # In[96]:
83
+
84
+
85
+ fifa_rank_top10['Home_win_Per'] = np.vectorize(home_percentage)(fifa_rank_top10['team'])
86
+ fifa_rank_top10['Away_win_Per'] = np.vectorize(away_percentage)(fifa_rank_top10['team'])
87
+ fifa_rank_top10['Average_win_Per'] = round((fifa_rank_top10['Home_win_Per'] + fifa_rank_top10['Away_win_Per']) / 2)
88
+ fifa_rank_win = fifa_rank_top10.sort_values('Average_win_Per', ascending=False)
89
+ # fifa_rank_win
90
+
91
+
92
+ # ### Top 10 attacking teams in the last FIFA date
93
+
94
+ # In[97]:
95
+
96
+
97
+ fifa_offense = df[['date', 'home_team', 'away_team', 'home_team_mean_offense_score', 'away_team_mean_offense_score']]
98
+ home = fifa_offense[['date', 'home_team', 'home_team_mean_offense_score']].rename(
99
+ columns={"home_team": "team", "home_team_mean_offense_score": "offense_score"})
100
+ away = fifa_offense[['date', 'away_team', 'away_team_mean_offense_score']].rename(
101
+ columns={"away_team": "team", "away_team_mean_offense_score": "offense_score"})
102
+ fifa_offense = pd.concat([home, away])
103
+ fifa_offense = fifa_offense.sort_values(['date', 'team'], ascending=[False, True])
104
+ last_offense = fifa_offense
105
+ fifa_offense_top10 = fifa_offense.groupby('team').first().sort_values('offense_score', ascending=False)[
106
+ 0:10].reset_index()
107
+ # fifa_offense_top10
108
+
109
+ import plotly.graph_objs as go
110
+ import plotly.figure_factory as ff
111
+
112
+ # In[99]:
113
+
114
+ # Display the data for the bar chart
115
+ st.write("Top 10 Attacking Teams")
116
+ st.write(fifa_offense_top10)
117
+
118
+ # Create a horizontal bar chart
119
+ fig_bar = go.Figure(data=[go.Bar(y=fifa_offense_top10['team'], x=fifa_offense_top10['offense_score'], orientation='h')])
120
+ # Update layout to include title, x-label, and y-label
121
+ fig_bar.update_layout(title='Top 10 Attacking Teams',
122
+ xaxis_title='Offense Score',
123
+ yaxis_title='Team')
124
+ st.plotly_chart(fig_bar)
125
+
126
+ # Display the data for the bar chart
127
+ # st.write("Top 10 Offense Teams")
128
+ # st.write(fifa_offense_top10)
129
+
130
+ # sns.barplot(data=fifa_offense_top10, x='offense_score', y='team', color="#7F1431")
131
+ # plt.xlabel('Offense Score', size = 20)
132
+ # plt.ylabel('Team', size = 20)
133
+ # plt.title("Top 10 Attacking teams");
134
+
135
+
136
+ # ### Top 10 Midfield teams in the last FIFA date
137
+
138
+ # In[100]:
139
+
140
+
141
+ fifa_midfield = df[['date', 'home_team', 'away_team', 'home_team_mean_midfield_score', 'away_team_mean_midfield_score']]
142
+ home = fifa_midfield[['date', 'home_team', 'home_team_mean_midfield_score']].rename(
143
+ columns={"home_team": "team", "home_team_mean_midfield_score": "midfield_score"})
144
+ away = fifa_midfield[['date', 'away_team', 'away_team_mean_midfield_score']].rename(
145
+ columns={"away_team": "team", "away_team_mean_midfield_score": "midfield_score"})
146
+ fifa_midfield = pd.concat([home, away])
147
+ fifa_midfield = fifa_midfield.sort_values(['date', 'team'], ascending=[False, True])
148
+ last_midfield = fifa_midfield
149
+ fifa_midfield_top10 = fifa_midfield.groupby('team').first().sort_values('midfield_score', ascending=False)[
150
+ 0:10].reset_index()
151
+ # fifa_midfield_top10
152
+
153
+
154
+ # In[101]:
155
+
156
+ # Display the data for the bar chart
157
+ st.write("Top 10 Midfield Teams")
158
+ st.write(fifa_midfield_top10)
159
+
160
+ # Create a horizontal bar chart
161
+ fig_bar = go.Figure(
162
+ data=[go.Bar(y=fifa_midfield_top10['team'], x=fifa_midfield_top10['midfield_score'], orientation='h')])
163
+ # Update layout to include title, x-label, and y-label
164
+ fig_bar.update_layout(title='Top 10 Midfield Teams', # Set the title
165
+ xaxis_title='Midfield Score', # Set the x-axis label
166
+ yaxis_title='Team') # Set the y-axis label
167
+
168
+ # Display the bar chart
169
+ st.plotly_chart(fig_bar)
170
+
171
+ # sns.barplot(data=fifa_midfield_top10, x='midfield_score', y='team', color="#7F1431")
172
+ # plt.xlabel('Midfield Score', size = 20)
173
+ # plt.ylabel('Team', size = 20)
174
+ # plt.title("Top 10 Midfield teams");
175
+
176
+
177
+ # ### Top 10 defending teams in the last FIFA date
178
+
179
+ # In[102]:
180
+
181
+
182
+ fifa_defense = df[['date', 'home_team', 'away_team', 'home_team_mean_defense_score', 'away_team_mean_defense_score']]
183
+ home = fifa_defense[['date', 'home_team', 'home_team_mean_defense_score']].rename(
184
+ columns={"home_team": "team", "home_team_mean_defense_score": "defense_score"})
185
+ away = fifa_defense[['date', 'away_team', 'away_team_mean_defense_score']].rename(
186
+ columns={"away_team": "team", "away_team_mean_defense_score": "defense_score"})
187
+ fifa_defense = pd.concat([home, away])
188
+ fifa_defense = fifa_defense.sort_values(['date', 'team'], ascending=[False, True])
189
+ last_defense = fifa_defense
190
+ fifa_defense_top10 = fifa_defense.groupby('team').first().sort_values('defense_score', ascending=False)[
191
+ 0:10].reset_index()
192
+ # fifa_defense_top10
193
+
194
+
195
+ # In[103]:
196
+
197
+ # Display the data for the bar chart
198
+ st.write("Top 10 Defensive Teams")
199
+ st.write(fifa_defense_top10)
200
+
201
+ # Create the horizontal bar chart
202
+ fig_bar = go.Figure(data=[go.Bar(y=fifa_defense_top10['team'], x=fifa_defense_top10['defense_score'], orientation='h')])
203
+
204
+ # Update layout to include title, x-label, and y-label
205
+ fig_bar.update_layout(title='Top 10 Defensive Teams', # Set the title
206
+ xaxis_title='Defense Score', # Set the x-axis label
207
+ yaxis_title='Team') # Set the y-axis label
208
+
209
+ # Display the bar chart
210
+ st.plotly_chart(fig_bar)
211
+
212
+ sns.barplot(data=fifa_defense_top10, x='defense_score', y='team', color="#7F1431")
213
+ plt.xlabel('Defense Score', size=20)
214
+ plt.ylabel('Team', size=20)
215
+ plt.title("Top 10 Defense Teams")
216
+
217
+ # ### Do Home teams have any advantage?
218
+
219
+ # In[104]:
220
+
221
+
222
+ # Select all matches played at non-neutral locations
223
+ home_team_advantage = df[df['neutral_location'] == False]['home_team_result'].value_counts(normalize=True)
224
+
225
+ # # Plot
226
+ # fig, axes = plt.subplots(1, 1, figsize=(8,8))
227
+ # ax =plt.pie(home_team_advantage ,labels = ['Win', 'Lose', 'Draw'], autopct='%.0f%%')
228
+ # plt.title('Home team match result', fontsize = 15)
229
+ # plt.show()
230
+
231
+
232
+ # As the result distribution shows, the home team has an advantage over the away team. Likely factors include the fans, the weather, and the players' confidence. For this reason, a team playing at home in the World Cup should have an advantage.
233
+
234
+ # # DATA PREPARATION AND FEATURE ENGINEERING
235
+ # In this section, I will fill in the empty fields in the dataset and filter out matches that involve no team qualified for the World Cup. Then, I will use the correlation matrix to choose the features that define the training dataset of the machine learning model. Finally, I will use each team's ratings from its last match to build the "Last Team Scores" dataset (i.e., the dataset that I will use to predict the World Cup matches).
236
+
237
+ # ### Analyze and fill na's
238
+
239
+ # In[105]:
240
+
241
+ #
242
+ # df.isnull().sum()
243
+
244
+
245
+ # In[106]:
246
+
247
+
248
+ # Fill NaNs in the goalkeeper score with each team's mean
249
+ df[df['home_team'] == "Brazil"]['home_team_goalkeeper_score'].describe()
250
+
251
+ # In[107]:
252
+
253
+
254
+ df['home_team_goalkeeper_score'] = round(
255
+ df.groupby("home_team")["home_team_goalkeeper_score"].transform(lambda x: x.fillna(x.mean())))
256
+ df['away_team_goalkeeper_score'] = round(
257
+ df.groupby("away_team")["away_team_goalkeeper_score"].transform(lambda x: x.fillna(x.mean())))
258
+
259
+ # In[108]:
260
+
261
+
262
+ # Fill NaNs in the defense score with each team's mean
263
+ df[df['away_team'] == "Uruguay"]['home_team_mean_defense_score'].describe()
264
+
265
+ # In[65]:
266
+
267
+
268
+ df['home_team_mean_defense_score'] = round(
269
+ df.groupby('home_team')['home_team_mean_defense_score'].transform(lambda x: x.fillna(x.mean())))
270
+ df['away_team_mean_defense_score'] = round(
271
+ df.groupby('away_team')['away_team_mean_defense_score'].transform(lambda x: x.fillna(x.mean())))
272
+
273
+ # In[109]:
274
+
275
+
276
+ # Fill NaNs in the offense score with each team's mean
277
+ df[df['away_team'] == "Uruguay"]['home_team_mean_offense_score'].describe()
278
+
279
+ # In[67]:
280
+
281
+
282
+ df['home_team_mean_offense_score'] = round(
283
+ df.groupby('home_team')['home_team_mean_offense_score'].transform(lambda x: x.fillna(x.mean())))
284
+ df['away_team_mean_offense_score'] = round(
285
+ df.groupby('away_team')['away_team_mean_offense_score'].transform(lambda x: x.fillna(x.mean())))
286
+
287
+ # In[110]:
288
+
289
+
290
+ # Fill NaNs in the midfield score with each team's mean
291
+ df[df['away_team'] == "Uruguay"]['home_team_mean_midfield_score'].describe()
292
+
293
+ # In[111]:
294
+
295
+
296
+ df['home_team_mean_midfield_score'] = round(
297
+ df.groupby('home_team')['home_team_mean_midfield_score'].transform(lambda x: x.fillna(x.mean())))
298
+ df['away_team_mean_midfield_score'] = round(
299
+ df.groupby('away_team')['away_team_mean_midfield_score'].transform(lambda x: x.fillna(x.mean())))
300
+
301
+ # In[112]:
302
+
303
+
304
+ df.isnull().sum()
305
+
306
+ # In[113]:
307
+
308
+
309
+ # The remaining NaNs belong to teams that are not available in the FIFA game itself, so assign them a flat score of 50.
310
+ df.fillna(50, inplace=True)
311
+
312
+ # ### Filter the teams participating in QATAR - World cup 2022
313
+
314
+ # In[115]:
315
+
316
+
317
+ list_2022 = ['Qatar', 'Germany', 'Denmark', 'Brazil', 'France', 'Belgium', 'Croatia', 'Spain', 'Serbia', 'England',
318
+ 'Switzerland', 'Netherlands', 'Argentina', 'IR Iran', 'Korea Republic', 'Japan', 'Saudi Arabia', 'Ecuador',
319
+ 'Uruguay', 'Canada', 'Ghana', 'Senegal', 'Portugal', 'Poland', 'Tunisia', 'Morocco', 'Cameroon', 'USA',
320
+ 'Mexico', 'Wales', 'Australia', 'Costa Rica']
321
+ final_df = df[df["home_team"].isin(list_2022) | df["away_team"].isin(list_2022)]
322
+
323
+ # **Top 10 teams in QATAR 2022**
324
+
325
+ # In[116]:
326
+
327
+
328
+ rank = final_df[['date', 'home_team', 'away_team', 'home_team_fifa_rank', 'away_team_fifa_rank']]
329
+ home = rank[['date', 'home_team', 'home_team_fifa_rank']].rename(
330
+ columns={"home_team": "team", "home_team_fifa_rank": "rank"})
331
+ away = rank[['date', 'away_team', 'away_team_fifa_rank']].rename(
332
+ columns={"away_team": "team", "away_team_fifa_rank": "rank"})
333
+ rank = pd.concat([home, away])
334
+
335
+ # Select each country's latest match
336
+ rank = rank.sort_values(['team', 'date'], ascending=[True, False])
337
+ rank_top10 = rank.groupby('team').first().sort_values('rank', ascending=True).reset_index()
338
+ rank_top10 = rank_top10[rank_top10["team"].isin(list_2022)][0:10]
339
+
340
+ st.write("Top 10 Countries by Rank - Latest Match")
341
+ st.write(rank_top10)
342
+
343
+ # # Create a scatter plot
344
+ # fig_scatter = go.Figure(data=go.Scatter(x=rank_top10['team'], y=rank_top10['rank'], mode='markers', marker=dict(color='lightskyblue', size=12)))
345
+ #
346
+ # # Update layout to include title and labels
347
+ # fig_scatter.update_layout(title='Top 10 Countries by Rank - Latest Match',
348
+ # xaxis_title='Country',
349
+ # yaxis_title='Rank')
350
+ #
351
+ # # Display the scatter plot
352
+ # st.plotly_chart(fig_scatter)
353
+
354
+ # **Top 10 teams with the highest winning percentage in QATAR 2022**
355
+
356
+ # In[117]:
357
+
358
+
359
+ rank_top10['Home_win_Per'] = np.vectorize(home_percentage)(rank_top10['team'])
360
+ rank_top10['Away_win_Per'] = np.vectorize(away_percentage)(rank_top10['team'])
361
+ rank_top10['Average_win_Per'] = round((rank_top10['Home_win_Per'] + rank_top10['Away_win_Per']) / 2)
362
+ rank_top10_Win = rank_top10.sort_values('Average_win_Per', ascending=False)
363
+
364
+ # st.write("Top 10 Countries by Rank - Latest Match")
365
+ # rank_top10_Win
366
+
367
+
368
+ # In[118]:
369
+
370
+ # Display the data for the bar chart
371
+ st.write("Top 10 Teams by Average Win Percentage")
372
+ st.write(rank_top10_Win)
373
+
374
+ # Create a horizontal bar chart
376
+ fig_bar = go.Figure(data=[go.Bar(y=rank_top10_Win['team'], x=rank_top10_Win['Average_win_Per'], orientation='h')])
377
+
378
+ # Update layout to include title and labels
379
+ fig_bar.update_layout(title='Top 10 Countries by Average Win Percentage',
380
+ xaxis_title='Average Win Percentage',
381
+ yaxis_title='Country')
382
+
383
+ # Display the horizontal bar chart
384
+ st.plotly_chart(fig_bar)
385
+
386
+ # sns.barplot(data=rank_top10_Win,x='Average_win_Per',y='team',color="#7F1431")
387
+ # plt.xticks()
388
+ # plt.xlabel('Win Average', size = 20)
389
+ # plt.ylabel('Team', size = 20)
390
+ # plt.title('Top 10 QATAR 2022 teams with the highest winning percentage')
391
+
392
+ #
393
+ # # ### Correlation Matrix
394
+ #
395
+ # # In[124]:
396
+ #
397
+ #
398
+ # final_df['home_team_result'].values
399
+ # # for index, value in final_df['home_team_result'].items():
400
+ # # print(f"Row {index}: {value}")
401
+ #
402
+ #
403
+ # # In[125]:
404
+ #
405
+ #
406
+ # team_result_df = final_df
407
+ # # for index, value in team_result_df['home_team_result'].items():
408
+ # # print(f"Row {index}: {value}")
409
+ #
410
+ #
411
+ # # In[151]:
412
+ #
413
+ #
414
+ # # Mapping numeric values for home_team_result to find the correlations
415
+ # final_df['home_team_result'] = final_df['home_team_result'].map({'Win':1, 'Draw':2, 'Lose':0})
416
+ #
417
+ #
418
+ # # In[145]:
419
+ #
420
+ #
421
+ #
422
+ #
423
+ #
424
+ # # In[150]:
425
+ #
426
+ #
427
+ # final_df['home_team_result'].head(1)
428
+ #
429
+ #
430
+ # # In[152]:
431
+ #
432
+ #
433
+ # final_df['home_team_result'] = pd.to_numeric(final_df['home_team_result'], errors='coerce')
434
+ #
435
+ #
436
+ # # In[155]:
437
+ #
438
+ #
439
+ # # df.head()
440
+ #
441
+ #
442
+ # # In[156]:
443
+ #
444
+ #
445
+ # # final_df.head()
446
+ #
447
+ #
448
+ # # In[157]:
449
+ #
450
+ #
451
+ # numerical_df = final_df.select_dtypes(include=['number'])
452
+ #
453
+ #
454
+ # # In[158]:
455
+ #
456
+ #
457
+ # numerical_df.corr()['home_team_result'].sort_values(ascending=False)
458
+ #
459
+ #
460
+ # # In[153]:
461
+ #
462
+ #
463
+ # # final_df.corr()['home_team_result'].sort_values(ascending=False)
464
+ #
465
+ #
466
+ # # Dropping unnecessary columns.
467
+ #
468
+ # # In[ ]:
469
+ #
470
+ #
471
+ # # Dropping unnecessary columns
472
+ # final_df = final_df.drop(['date', 'home_team_continent', 'away_team_continent', 'home_team_total_fifa_points', 'away_team_total_fifa_points', 'home_team_score', 'away_team_score', 'tournament', 'city', 'country', 'neutral_location', 'shoot_out'],axis=1)
473
+ #
474
+ #
475
+ # # In[ ]:
476
+ #
477
+ #
478
+ # # final_df.columns
479
+ #
480
+ #
481
+ # # In[ ]:
482
+ #
483
+ #
484
+ # # Change column names
485
+ # final_df.rename(columns={"home_team":"Team1", "away_team":"Team2", "home_team_fifa_rank":"Team1_FIFA_RANK",
486
+ # "away_team_fifa_rank":"Team2_FIFA_RANK", "home_team_result":"Team1_Result", "home_team_goalkeeper_score":"Team1_Goalkeeper_Score",
487
+ # "away_team_goalkeeper_score":"Team2_Goalkeeper_Score", "home_team_mean_defense_score":"Team1_Defense",
488
+ # "home_team_mean_offense_score":"Team1_Offense", "home_team_mean_midfield_score":"Team1_Midfield",
489
+ # "away_team_mean_defense_score":"Team2_Defense", "away_team_mean_offense_score":"Team2_Offense",
490
+ # "away_team_mean_midfield_score":"Team2_Midfield"}, inplace=True)
491
+ #
492
+ #
493
+ # # In[ ]:
494
+ #
495
+ #
496
+ # plt.figure(figsize=(10, 4), dpi=200)
497
+ # sns.heatmap(final_df.corr(), annot=True)
498
+ #
499
+ #
500
+ # # In[ ]:
501
+ #
502
+ #
503
+ # # final_df.info()
504
+ #
505
+ #
506
+ # # In[ ]:
507
+ #
508
+ #
509
+ # # final_df
510
+ #
511
+ #
512
+ # # Exporting the training dataset.
513
+ #
514
+ # # In[ ]:
515
+ #
516
+ #
517
+ # # final_df.to_csv("./data/training.csv", index = False)
518
+ #
519
+ #
520
+ # # ### Creating "Last Team Scores" dataset
521
+ # # This dataset contains each team's ratings as of the latest FIFA date and will be used to predict the World Cup matches.
522
+ #
523
+ # # In[ ]:
524
+ #
525
+ #
526
+ # last_goalkeeper = df[['date', 'home_team', 'away_team', 'home_team_goalkeeper_score', 'away_team_goalkeeper_score']]
527
+ # home = last_goalkeeper[['date', 'home_team', 'home_team_goalkeeper_score']].rename(columns={"home_team":"team", "home_team_goalkeeper_score":"goalkeeper_score"})
528
+ # away = last_goalkeeper[['date', 'away_team', 'away_team_goalkeeper_score']].rename(columns={"away_team":"team", "away_team_goalkeeper_score":"goalkeeper_score"})
529
+ # last_goalkeeper = pd.concat([home,away])
530
+ #
531
+ # last_goalkeeper = last_goalkeeper.sort_values(['date', 'team'],ascending=[False, True])
532
+ #
533
+ # list_2022 = ['Qatar', 'Germany', 'Denmark', 'Brazil', 'France', 'Belgium', 'Croatia', 'Spain', 'Serbia', 'England', 'Switzerland', 'Netherlands', 'Argentina', 'IR Iran', 'Korea Republic', 'Japan', 'Saudi Arabia', 'Ecuador', 'Uruguay', 'Canada', 'Ghana', 'Senegal', 'Portugal', 'Poland', 'Tunisia', 'Morocco', 'Cameroon', 'USA', 'Mexico', 'Wales', 'Australia', 'Costa Rica']
534
+ #
535
+ # rank_qatar = last_rank[(last_rank["team"].apply(lambda x: x in list_2022))]
536
+ # rank_qatar = rank_qatar.groupby('team').first().reset_index()
537
+ # goal_qatar = last_goalkeeper[(last_goalkeeper["team"].apply(lambda x: x in list_2022))]
538
+ # goal_qatar = goal_qatar.groupby('team').first().reset_index()
539
+ # goal_qatar = goal_qatar.drop(['date'], axis = 1)
540
+ # off_qatar = last_offense[(last_offense["team"].apply(lambda x: x in list_2022))]
541
+ # off_qatar = off_qatar.groupby('team').first().reset_index()
542
+ # off_qatar = off_qatar.drop(['date'], axis = 1)
543
+ # mid_qatar = last_midfield[(last_midfield["team"].apply(lambda x: x in list_2022))]
544
+ # mid_qatar = mid_qatar.groupby('team').first().reset_index()
545
+ # mid_qatar = mid_qatar.drop(['date'], axis = 1)
546
+ # def_qatar = last_defense[(last_defense["team"].apply(lambda x: x in list_2022))]
547
+ # def_qatar = def_qatar.groupby('team').first().reset_index()
548
+ # def_qatar = def_qatar.drop(['date'], axis = 1)
549
+ #
550
+ # qatar = pd.merge(rank_qatar, goal_qatar, on = 'team')
551
+ # qatar = pd.merge(qatar, def_qatar, on ='team')
552
+ # qatar = pd.merge(qatar, off_qatar, on ='team')
553
+ # qatar = pd.merge(qatar, mid_qatar, on ='team')
554
+ #
555
+ # qatar['goalkeeper_score'] = round(qatar["goalkeeper_score"].transform(lambda x: x.fillna(x.mean())))
556
+ # qatar['offense_score'] = round(qatar["offense_score"].transform(lambda x: x.fillna(x.mean())))
557
+ # qatar['midfield_score'] = round(qatar["midfield_score"].transform(lambda x: x.fillna(x.mean())))
558
+ # qatar['defense_score'] = round(qatar["defense_score"].transform(lambda x: x.fillna(x.mean())))
559
+ # # qatar.head(5)
560
+ #
561
+ #
562
+ # # Exporting the "Last Team Scores" dataset.
563
+ #
564
+ # # In[ ]:
565
+ #
566
+
567
+ # qatar.to_csv("/content/drive/MyDrive/data/last_team_scores.csv", index = False)
568
+
569
+ if __name__ == "__main__":
570
+ main()
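
Note on `dataprep.py`: the script repeats the same reshape for rank, offense, midfield, and defense: stack the home and away columns into one long table, sort so the newest match comes first, and take the first row per team. The following is a hedged sketch of that step as one function, assuming pandas is available; `latest_team_metric` is a hypothetical name, the sample rows are invented, and this is an illustration rather than the author's refactoring.

```python
import pandas as pd


def latest_team_metric(df, home_col, away_col, name):
    """Stack home/away columns and keep each team's most recent value."""
    home = df[["date", "home_team", home_col]].rename(
        columns={"home_team": "team", home_col: name}
    )
    away = df[["date", "away_team", away_col]].rename(
        columns={"away_team": "team", away_col: name}
    )
    long = pd.concat([home, away], ignore_index=True)
    # Newest match first within each team, so first() picks the latest entry.
    long = long.sort_values(["team", "date"], ascending=[True, False])
    return long.groupby("team", as_index=False).first()


# Invented sample rows using the column names from international_matches.csv.
matches = pd.DataFrame(
    {
        "date": pd.to_datetime(["2022-06-01", "2022-06-06", "2022-03-01"]),
        "home_team": ["Brazil", "Korea Republic", "Japan"],
        "away_team": ["Japan", "Brazil", "Brazil"],
        "home_team_fifa_rank": [1, 29, 24],
        "away_team_fifa_rank": [23, 1, 2],
    }
)
latest = latest_team_metric(
    matches, "home_team_fifa_rank", "away_team_fifa_rank", "rank"
)
```

One design point to be aware of: `GroupBy.first()` takes the first non-null entry of each column separately, so for a team whose newest match has a missing score, the returned `date` and the returned score can come from different rows.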