Priyanka-Kumavat committed on
Commit
2f4d8e7
1 Parent(s): 0efd0cb

Upload 5 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ input_raw_data.xlsx filter=lfs diff=lfs merge=lfs -text
AajTak_Model.ipynb ADDED
@@ -0,0 +1,2098 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "id": "5849e5b1",
7
+ "metadata": {},
8
+ "outputs": [],
9
+ "source": [
10
+ "# import required packages\n",
11
+ "\n",
12
+ "import pandas as pd\n",
13
+ "import numpy as np\n",
14
+ "import matplotlib.pyplot as plt\n",
15
+ "import seaborn as sns\n",
16
+ "\n",
17
+ "from sklearn.model_selection import RandomizedSearchCV, GridSearchCV, train_test_split\n",
18
+ "from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor\n",
19
+ "from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error\n",
20
+ "from sklearn.preprocessing import LabelEncoder\n",
21
+ "\n",
22
+ "import warnings\n",
23
+ "warnings.filterwarnings('ignore')"
24
+ ]
25
+ },
26
+ {
27
+ "cell_type": "markdown",
28
+ "id": "6b77e2c3",
29
+ "metadata": {},
30
+ "source": [
31
+ "## Preprocessing"
32
+ ]
33
+ },
34
+ {
35
+ "cell_type": "code",
36
+ "execution_count": 2,
37
+ "id": "3725e933",
38
+ "metadata": {},
39
+ "outputs": [
40
+ {
41
+ "data": {
42
+ "text/html": [
43
+ "<div>\n",
44
+ "<style scoped>\n",
45
+ " .dataframe tbody tr th:only-of-type {\n",
46
+ " vertical-align: middle;\n",
47
+ " }\n",
48
+ "\n",
49
+ " .dataframe tbody tr th {\n",
50
+ " vertical-align: top;\n",
51
+ " }\n",
52
+ "\n",
53
+ " .dataframe thead th {\n",
54
+ " text-align: right;\n",
55
+ " }\n",
56
+ "</style>\n",
57
+ "<table border=\"1\" class=\"dataframe\">\n",
58
+ " <thead>\n",
59
+ " <tr style=\"text-align: right;\">\n",
60
+ " <th></th>\n",
61
+ " <th>Unnamed: 0</th>\n",
62
+ " <th>Channel</th>\n",
63
+ " <th>Week Day</th>\n",
64
+ " <th>TimeBand</th>\n",
65
+ " <th>Share</th>\n",
66
+ " <th>AMA</th>\n",
67
+ " <th>rate</th>\n",
68
+ " <th>daily reach</th>\n",
69
+ " <th>cume reach</th>\n",
70
+ " <th>ATS</th>\n",
71
+ " <th>Unrolled</th>\n",
72
+ " </tr>\n",
73
+ " </thead>\n",
74
+ " <tbody>\n",
75
+ " <tr>\n",
76
+ " <th>0</th>\n",
77
+ " <td>7'23</td>\n",
78
+ " <td>Aaj Tak</td>\n",
79
+ " <td>Saturday</td>\n",
80
+ " <td>02:00:00 - 02:30:00</td>\n",
81
+ " <td>0.081305</td>\n",
82
+ " <td>0.123363</td>\n",
83
+ " <td>0.000433</td>\n",
84
+ " <td>3.70</td>\n",
85
+ " <td>3.700893</td>\n",
86
+ " <td>00:01:00</td>\n",
87
+ " <td>0.000000</td>\n",
88
+ " </tr>\n",
89
+ " <tr>\n",
90
+ " <th>1</th>\n",
91
+ " <td>7'23</td>\n",
92
+ " <td>Aaj Tak</td>\n",
93
+ " <td>Saturday</td>\n",
94
+ " <td>02:30:00 - 03:00:00</td>\n",
95
+ " <td>0.469995</td>\n",
96
+ " <td>0.394070</td>\n",
97
+ " <td>0.001383</td>\n",
98
+ " <td>11.82</td>\n",
99
+ " <td>11.822103</td>\n",
100
+ " <td>00:01:00</td>\n",
101
+ " <td>0.000000</td>\n",
102
+ " </tr>\n",
103
+ " <tr>\n",
104
+ " <th>2</th>\n",
105
+ " <td>7'23</td>\n",
106
+ " <td>Aaj Tak</td>\n",
107
+ " <td>Saturday</td>\n",
108
+ " <td>03:00:00 - 03:30:00</td>\n",
109
+ " <td>1.723084</td>\n",
110
+ " <td>0.361537</td>\n",
111
+ " <td>0.001269</td>\n",
112
+ " <td>10.85</td>\n",
113
+ " <td>10.846120</td>\n",
114
+ " <td>00:01:00</td>\n",
115
+ " <td>0.000000</td>\n",
116
+ " </tr>\n",
117
+ " <tr>\n",
118
+ " <th>3</th>\n",
119
+ " <td>7'23</td>\n",
120
+ " <td>Aaj Tak</td>\n",
121
+ " <td>Saturday</td>\n",
122
+ " <td>03:30:00 - 04:00:00</td>\n",
123
+ " <td>2.019206</td>\n",
124
+ " <td>0.251790</td>\n",
125
+ " <td>0.000884</td>\n",
126
+ " <td>7.55</td>\n",
127
+ " <td>7.553692</td>\n",
128
+ " <td>00:01:00</td>\n",
129
+ " <td>0.000000</td>\n",
130
+ " </tr>\n",
131
+ " <tr>\n",
132
+ " <th>4</th>\n",
133
+ " <td>7'23</td>\n",
134
+ " <td>Aaj Tak</td>\n",
135
+ " <td>Saturday</td>\n",
136
+ " <td>04:00:00 - 04:30:00</td>\n",
137
+ " <td>1.163916</td>\n",
138
+ " <td>0.333603</td>\n",
139
+ " <td>0.001171</td>\n",
140
+ " <td>10.01</td>\n",
141
+ " <td>10.008100</td>\n",
142
+ " <td>00:01:00</td>\n",
143
+ " <td>0.000000</td>\n",
144
+ " </tr>\n",
145
+ " <tr>\n",
146
+ " <th>...</th>\n",
147
+ " <td>...</td>\n",
148
+ " <td>...</td>\n",
149
+ " <td>...</td>\n",
150
+ " <td>...</td>\n",
151
+ " <td>...</td>\n",
152
+ " <td>...</td>\n",
153
+ " <td>...</td>\n",
154
+ " <td>...</td>\n",
155
+ " <td>...</td>\n",
156
+ " <td>...</td>\n",
157
+ " <td>...</td>\n",
158
+ " </tr>\n",
159
+ " <tr>\n",
160
+ " <th>12091</th>\n",
161
+ " <td>15'23</td>\n",
162
+ " <td>Aaj Tak</td>\n",
163
+ " <td>Friday</td>\n",
164
+ " <td>23:30:00 - 24:00:00</td>\n",
165
+ " <td>0.315975</td>\n",
166
+ " <td>6.315608</td>\n",
167
+ " <td>0.028382</td>\n",
168
+ " <td>52.33</td>\n",
169
+ " <td>52.334241</td>\n",
170
+ " <td>00:03:37</td>\n",
171
+ " <td>1.870176</td>\n",
172
+ " </tr>\n",
173
+ " <tr>\n",
174
+ " <th>12092</th>\n",
175
+ " <td>15'23</td>\n",
176
+ " <td>Aaj Tak</td>\n",
177
+ " <td>Friday</td>\n",
178
+ " <td>24:00:00 - 24:30:00</td>\n",
179
+ " <td>0.690376</td>\n",
180
+ " <td>8.010992</td>\n",
181
+ " <td>0.036001</td>\n",
182
+ " <td>33.65</td>\n",
183
+ " <td>33.651447</td>\n",
184
+ " <td>00:07:09</td>\n",
185
+ " <td>6.204409</td>\n",
186
+ " </tr>\n",
187
+ " <tr>\n",
188
+ " <th>12093</th>\n",
189
+ " <td>15'23</td>\n",
190
+ " <td>Aaj Tak</td>\n",
191
+ " <td>Friday</td>\n",
192
+ " <td>24:30:00 - 25:00:00</td>\n",
193
+ " <td>1.313761</td>\n",
194
+ " <td>8.575085</td>\n",
195
+ " <td>0.038536</td>\n",
196
+ " <td>26.97</td>\n",
197
+ " <td>26.974041</td>\n",
198
+ " <td>00:09:32</td>\n",
199
+ " <td>6.526442</td>\n",
200
+ " </tr>\n",
201
+ " <tr>\n",
202
+ " <th>12094</th>\n",
203
+ " <td>15'23</td>\n",
204
+ " <td>Aaj Tak</td>\n",
205
+ " <td>Friday</td>\n",
206
+ " <td>25:00:00 - 25:30:00</td>\n",
207
+ " <td>1.141046</td>\n",
208
+ " <td>4.483507</td>\n",
209
+ " <td>0.020149</td>\n",
210
+ " <td>37.21</td>\n",
211
+ " <td>37.214790</td>\n",
212
+ " <td>00:03:37</td>\n",
213
+ " <td>5.011646</td>\n",
214
+ " </tr>\n",
215
+ " <tr>\n",
216
+ " <th>12095</th>\n",
217
+ " <td>15'23</td>\n",
218
+ " <td>Aaj Tak</td>\n",
219
+ " <td>Friday</td>\n",
220
+ " <td>25:30:00 - 26:00:00</td>\n",
221
+ " <td>0.000000</td>\n",
222
+ " <td>0.000000</td>\n",
223
+ " <td>0.000000</td>\n",
224
+ " <td>0.00</td>\n",
225
+ " <td>0.000000</td>\n",
226
+ " <td>0</td>\n",
227
+ " <td>0.000000</td>\n",
228
+ " </tr>\n",
229
+ " </tbody>\n",
230
+ "</table>\n",
231
+ "<p>12096 rows × 11 columns</p>\n",
232
+ "</div>"
233
+ ],
234
+ "text/plain": [
235
+ " Unnamed: 0 Channel Week Day TimeBand Share AMA \\\n",
236
+ "0 7'23 Aaj Tak Saturday 02:00:00 - 02:30:00 0.081305 0.123363 \n",
237
+ "1 7'23 Aaj Tak Saturday 02:30:00 - 03:00:00 0.469995 0.394070 \n",
238
+ "2 7'23 Aaj Tak Saturday 03:00:00 - 03:30:00 1.723084 0.361537 \n",
239
+ "3 7'23 Aaj Tak Saturday 03:30:00 - 04:00:00 2.019206 0.251790 \n",
240
+ "4 7'23 Aaj Tak Saturday 04:00:00 - 04:30:00 1.163916 0.333603 \n",
241
+ "... ... ... ... ... ... ... \n",
242
+ "12091 15'23 Aaj Tak Friday 23:30:00 - 24:00:00 0.315975 6.315608 \n",
243
+ "12092 15'23 Aaj Tak Friday 24:00:00 - 24:30:00 0.690376 8.010992 \n",
244
+ "12093 15'23 Aaj Tak Friday 24:30:00 - 25:00:00 1.313761 8.575085 \n",
245
+ "12094 15'23 Aaj Tak Friday 25:00:00 - 25:30:00 1.141046 4.483507 \n",
246
+ "12095 15'23 Aaj Tak Friday 25:30:00 - 26:00:00 0.000000 0.000000 \n",
247
+ "\n",
248
+ " rate daily reach cume reach ATS Unrolled \n",
249
+ "0 0.000433 3.70 3.700893 00:01:00 0.000000 \n",
250
+ "1 0.001383 11.82 11.822103 00:01:00 0.000000 \n",
251
+ "2 0.001269 10.85 10.846120 00:01:00 0.000000 \n",
252
+ "3 0.000884 7.55 7.553692 00:01:00 0.000000 \n",
253
+ "4 0.001171 10.01 10.008100 00:01:00 0.000000 \n",
254
+ "... ... ... ... ... ... \n",
255
+ "12091 0.028382 52.33 52.334241 00:03:37 1.870176 \n",
256
+ "12092 0.036001 33.65 33.651447 00:07:09 6.204409 \n",
257
+ "12093 0.038536 26.97 26.974041 00:09:32 6.526442 \n",
258
+ "12094 0.020149 37.21 37.214790 00:03:37 5.011646 \n",
259
+ "12095 0.000000 0.00 0.000000 0 0.000000 \n",
260
+ "\n",
261
+ "[12096 rows x 11 columns]"
262
+ ]
263
+ },
264
+ "execution_count": 2,
265
+ "metadata": {},
266
+ "output_type": "execute_result"
267
+ }
268
+ ],
269
+ "source": [
270
+ "# read the dataset\n",
271
+ "\n",
272
+ "df = pd.read_excel(\"input_raw_data.xlsx\")\n",
273
+ "df"
274
+ ]
275
+ },
276
+ {
277
+ "cell_type": "code",
278
+ "execution_count": 3,
279
+ "id": "cc260fc7",
280
+ "metadata": {},
281
+ "outputs": [],
282
+ "source": [
283
+ "df.rename(columns={'Unnamed: 0':'Week number'}, inplace=True)"
284
+ ]
285
+ },
286
+ {
287
+ "cell_type": "code",
288
+ "execution_count": 4,
289
+ "id": "bfee3282",
290
+ "metadata": {},
291
+ "outputs": [
292
+ {
293
+ "data": {
294
+ "text/html": [
295
+ "<div>\n",
296
+ "<style scoped>\n",
297
+ " .dataframe tbody tr th:only-of-type {\n",
298
+ " vertical-align: middle;\n",
299
+ " }\n",
300
+ "\n",
301
+ " .dataframe tbody tr th {\n",
302
+ " vertical-align: top;\n",
303
+ " }\n",
304
+ "\n",
305
+ " .dataframe thead th {\n",
306
+ " text-align: right;\n",
307
+ " }\n",
308
+ "</style>\n",
309
+ "<table border=\"1\" class=\"dataframe\">\n",
310
+ " <thead>\n",
311
+ " <tr style=\"text-align: right;\">\n",
312
+ " <th></th>\n",
313
+ " <th>Week number</th>\n",
314
+ " <th>Channel</th>\n",
315
+ " <th>Week Day</th>\n",
316
+ " <th>TimeBand</th>\n",
317
+ " <th>Share</th>\n",
318
+ " <th>AMA</th>\n",
319
+ " <th>rate</th>\n",
320
+ " <th>daily reach</th>\n",
321
+ " <th>cume reach</th>\n",
322
+ " <th>ATS</th>\n",
323
+ " <th>Unrolled</th>\n",
324
+ " </tr>\n",
325
+ " </thead>\n",
326
+ " <tbody>\n",
327
+ " <tr>\n",
328
+ " <th>0</th>\n",
329
+ " <td>7'23</td>\n",
330
+ " <td>Aaj Tak</td>\n",
331
+ " <td>Saturday</td>\n",
332
+ " <td>02:00:00 - 02:30:00</td>\n",
333
+ " <td>0.081305</td>\n",
334
+ " <td>0.123363</td>\n",
335
+ " <td>0.000433</td>\n",
336
+ " <td>3.70</td>\n",
337
+ " <td>3.700893</td>\n",
338
+ " <td>00:01:00</td>\n",
339
+ " <td>0.0</td>\n",
340
+ " </tr>\n",
341
+ " <tr>\n",
342
+ " <th>1</th>\n",
343
+ " <td>7'23</td>\n",
344
+ " <td>Aaj Tak</td>\n",
345
+ " <td>Saturday</td>\n",
346
+ " <td>02:30:00 - 03:00:00</td>\n",
347
+ " <td>0.469995</td>\n",
348
+ " <td>0.394070</td>\n",
349
+ " <td>0.001383</td>\n",
350
+ " <td>11.82</td>\n",
351
+ " <td>11.822103</td>\n",
352
+ " <td>00:01:00</td>\n",
353
+ " <td>0.0</td>\n",
354
+ " </tr>\n",
355
+ " <tr>\n",
356
+ " <th>2</th>\n",
357
+ " <td>7'23</td>\n",
358
+ " <td>Aaj Tak</td>\n",
359
+ " <td>Saturday</td>\n",
360
+ " <td>03:00:00 - 03:30:00</td>\n",
361
+ " <td>1.723084</td>\n",
362
+ " <td>0.361537</td>\n",
363
+ " <td>0.001269</td>\n",
364
+ " <td>10.85</td>\n",
365
+ " <td>10.846120</td>\n",
366
+ " <td>00:01:00</td>\n",
367
+ " <td>0.0</td>\n",
368
+ " </tr>\n",
369
+ " <tr>\n",
370
+ " <th>3</th>\n",
371
+ " <td>7'23</td>\n",
372
+ " <td>Aaj Tak</td>\n",
373
+ " <td>Saturday</td>\n",
374
+ " <td>03:30:00 - 04:00:00</td>\n",
375
+ " <td>2.019206</td>\n",
376
+ " <td>0.251790</td>\n",
377
+ " <td>0.000884</td>\n",
378
+ " <td>7.55</td>\n",
379
+ " <td>7.553692</td>\n",
380
+ " <td>00:01:00</td>\n",
381
+ " <td>0.0</td>\n",
382
+ " </tr>\n",
383
+ " <tr>\n",
384
+ " <th>4</th>\n",
385
+ " <td>7'23</td>\n",
386
+ " <td>Aaj Tak</td>\n",
387
+ " <td>Saturday</td>\n",
388
+ " <td>04:00:00 - 04:30:00</td>\n",
389
+ " <td>1.163916</td>\n",
390
+ " <td>0.333603</td>\n",
391
+ " <td>0.001171</td>\n",
392
+ " <td>10.01</td>\n",
393
+ " <td>10.008100</td>\n",
394
+ " <td>00:01:00</td>\n",
395
+ " <td>0.0</td>\n",
396
+ " </tr>\n",
397
+ " </tbody>\n",
398
+ "</table>\n",
399
+ "</div>"
400
+ ],
401
+ "text/plain": [
402
+ " Week number Channel Week Day TimeBand Share AMA \\\n",
403
+ "0 7'23 Aaj Tak Saturday 02:00:00 - 02:30:00 0.081305 0.123363 \n",
404
+ "1 7'23 Aaj Tak Saturday 02:30:00 - 03:00:00 0.469995 0.394070 \n",
405
+ "2 7'23 Aaj Tak Saturday 03:00:00 - 03:30:00 1.723084 0.361537 \n",
406
+ "3 7'23 Aaj Tak Saturday 03:30:00 - 04:00:00 2.019206 0.251790 \n",
407
+ "4 7'23 Aaj Tak Saturday 04:00:00 - 04:30:00 1.163916 0.333603 \n",
408
+ "\n",
409
+ " rate daily reach cume reach ATS Unrolled \n",
410
+ "0 0.000433 3.70 3.700893 00:01:00 0.0 \n",
411
+ "1 0.001383 11.82 11.822103 00:01:00 0.0 \n",
412
+ "2 0.001269 10.85 10.846120 00:01:00 0.0 \n",
413
+ "3 0.000884 7.55 7.553692 00:01:00 0.0 \n",
414
+ "4 0.001171 10.01 10.008100 00:01:00 0.0 "
415
+ ]
416
+ },
417
+ "execution_count": 4,
418
+ "metadata": {},
419
+ "output_type": "execute_result"
420
+ }
421
+ ],
422
+ "source": [
423
+ "df.head()"
424
+ ]
425
+ },
426
+ {
427
+ "cell_type": "code",
428
+ "execution_count": 5,
429
+ "id": "e53ee7c9",
430
+ "metadata": {},
431
+ "outputs": [
432
+ {
433
+ "name": "stdout",
434
+ "output_type": "stream",
435
+ "text": [
436
+ "<class 'pandas.core.frame.DataFrame'>\n",
437
+ "RangeIndex: 12096 entries, 0 to 12095\n",
438
+ "Data columns (total 11 columns):\n",
439
+ " # Column Non-Null Count Dtype \n",
440
+ "--- ------ -------------- ----- \n",
441
+ " 0 Week number 12096 non-null object \n",
442
+ " 1 Channel 12096 non-null object \n",
443
+ " 2 Week Day 12096 non-null object \n",
444
+ " 3 TimeBand 12096 non-null object \n",
445
+ " 4 Share 12096 non-null float64\n",
446
+ " 5 AMA 12096 non-null float64\n",
447
+ " 6 rate 12096 non-null float64\n",
448
+ " 7 daily reach 12096 non-null float64\n",
449
+ " 8 cume reach 12096 non-null float64\n",
450
+ " 9 ATS 12096 non-null object \n",
451
+ " 10 Unrolled 12096 non-null float64\n",
452
+ "dtypes: float64(6), object(5)\n",
453
+ "memory usage: 1.0+ MB\n"
454
+ ]
455
+ }
456
+ ],
457
+ "source": [
458
+ "df.info()"
459
+ ]
460
+ },
461
+ {
462
+ "cell_type": "code",
463
+ "execution_count": 6,
464
+ "id": "31fd40e9",
465
+ "metadata": {},
466
+ "outputs": [
467
+ {
468
+ "data": {
469
+ "text/html": [
470
+ "<div>\n",
471
+ "<style scoped>\n",
472
+ " .dataframe tbody tr th:only-of-type {\n",
473
+ " vertical-align: middle;\n",
474
+ " }\n",
475
+ "\n",
476
+ " .dataframe tbody tr th {\n",
477
+ " vertical-align: top;\n",
478
+ " }\n",
479
+ "\n",
480
+ " .dataframe thead th {\n",
481
+ " text-align: right;\n",
482
+ " }\n",
483
+ "</style>\n",
484
+ "<table border=\"1\" class=\"dataframe\">\n",
485
+ " <thead>\n",
486
+ " <tr style=\"text-align: right;\">\n",
487
+ " <th></th>\n",
488
+ " <th>Share</th>\n",
489
+ " <th>AMA</th>\n",
490
+ " <th>rate</th>\n",
491
+ " <th>daily reach</th>\n",
492
+ " <th>cume reach</th>\n",
493
+ " <th>Unrolled</th>\n",
494
+ " </tr>\n",
495
+ " </thead>\n",
496
+ " <tbody>\n",
497
+ " <tr>\n",
498
+ " <th>count</th>\n",
499
+ " <td>12096.000000</td>\n",
500
+ " <td>12096.000000</td>\n",
501
+ " <td>12096.000000</td>\n",
502
+ " <td>12096.000000</td>\n",
503
+ " <td>12096.000000</td>\n",
504
+ " <td>12096.000000</td>\n",
505
+ " </tr>\n",
506
+ " <tr>\n",
507
+ " <th>mean</th>\n",
508
+ " <td>0.904877</td>\n",
509
+ " <td>3.638381</td>\n",
510
+ " <td>0.031671</td>\n",
511
+ " <td>30.726294</td>\n",
512
+ " <td>30.726317</td>\n",
513
+ " <td>3.487959</td>\n",
514
+ " </tr>\n",
515
+ " <tr>\n",
516
+ " <th>std</th>\n",
517
+ " <td>3.773260</td>\n",
518
+ " <td>4.987969</td>\n",
519
+ " <td>0.074512</td>\n",
520
+ " <td>33.505783</td>\n",
521
+ " <td>33.505793</td>\n",
522
+ " <td>5.746293</td>\n",
523
+ " </tr>\n",
524
+ " <tr>\n",
525
+ " <th>min</th>\n",
526
+ " <td>0.000000</td>\n",
527
+ " <td>0.000000</td>\n",
528
+ " <td>0.000000</td>\n",
529
+ " <td>0.000000</td>\n",
530
+ " <td>0.000000</td>\n",
531
+ " <td>0.000000</td>\n",
532
+ " </tr>\n",
533
+ " <tr>\n",
534
+ " <th>25%</th>\n",
535
+ " <td>0.089353</td>\n",
536
+ " <td>0.122776</td>\n",
537
+ " <td>0.003831</td>\n",
538
+ " <td>3.000000</td>\n",
539
+ " <td>3.002531</td>\n",
540
+ " <td>0.000000</td>\n",
541
+ " </tr>\n",
542
+ " <tr>\n",
543
+ " <th>50%</th>\n",
544
+ " <td>0.199747</td>\n",
545
+ " <td>2.192741</td>\n",
546
+ " <td>0.015068</td>\n",
547
+ " <td>22.730000</td>\n",
548
+ " <td>22.732177</td>\n",
549
+ " <td>0.974788</td>\n",
550
+ " </tr>\n",
551
+ " <tr>\n",
552
+ " <th>75%</th>\n",
553
+ " <td>0.482635</td>\n",
554
+ " <td>5.174398</td>\n",
555
+ " <td>0.029070</td>\n",
556
+ " <td>46.930000</td>\n",
557
+ " <td>46.932208</td>\n",
558
+ " <td>4.620285</td>\n",
559
+ " </tr>\n",
560
+ " <tr>\n",
561
+ " <th>max</th>\n",
562
+ " <td>100.000000</td>\n",
563
+ " <td>42.072407</td>\n",
564
+ " <td>1.356598</td>\n",
565
+ " <td>229.330000</td>\n",
566
+ " <td>229.334577</td>\n",
567
+ " <td>60.765814</td>\n",
568
+ " </tr>\n",
569
+ " </tbody>\n",
570
+ "</table>\n",
571
+ "</div>"
572
+ ],
573
+ "text/plain": [
574
+ " Share AMA rate daily reach cume reach \\\n",
575
+ "count 12096.000000 12096.000000 12096.000000 12096.000000 12096.000000 \n",
576
+ "mean 0.904877 3.638381 0.031671 30.726294 30.726317 \n",
577
+ "std 3.773260 4.987969 0.074512 33.505783 33.505793 \n",
578
+ "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
579
+ "25% 0.089353 0.122776 0.003831 3.000000 3.002531 \n",
580
+ "50% 0.199747 2.192741 0.015068 22.730000 22.732177 \n",
581
+ "75% 0.482635 5.174398 0.029070 46.930000 46.932208 \n",
582
+ "max 100.000000 42.072407 1.356598 229.330000 229.334577 \n",
583
+ "\n",
584
+ " Unrolled \n",
585
+ "count 12096.000000 \n",
586
+ "mean 3.487959 \n",
587
+ "std 5.746293 \n",
588
+ "min 0.000000 \n",
589
+ "25% 0.000000 \n",
590
+ "50% 0.974788 \n",
591
+ "75% 4.620285 \n",
592
+ "max 60.765814 "
593
+ ]
594
+ },
595
+ "execution_count": 6,
596
+ "metadata": {},
597
+ "output_type": "execute_result"
598
+ }
599
+ ],
600
+ "source": [
601
+ "df.describe()"
602
+ ]
603
+ },
604
+ {
605
+ "cell_type": "code",
606
+ "execution_count": 7,
607
+ "id": "741765e3",
608
+ "metadata": {},
609
+ "outputs": [
610
+ {
611
+ "data": {
612
+ "text/plain": [
613
+ "Week number\n",
614
+ "7'23 1344\n",
615
+ "8'23 1344\n",
616
+ "9'23 1344\n",
617
+ "10'23 1344\n",
618
+ "11'23 1344\n",
619
+ "12'23 1344\n",
620
+ "13'23 1344\n",
621
+ "14'23 1344\n",
622
+ "15'23 1344\n",
623
+ "Name: count, dtype: int64"
624
+ ]
625
+ },
626
+ "execution_count": 7,
627
+ "metadata": {},
628
+ "output_type": "execute_result"
629
+ }
630
+ ],
631
+ "source": [
632
+ "# Count values of Week number\n",
633
+ "df['Week number'].value_counts() # weeks 7'23 through 15'23, 1344 rows each"
634
+ ]
635
+ },
636
+ {
637
+ "cell_type": "code",
638
+ "execution_count": 8,
639
+ "id": "894d2430",
640
+ "metadata": {},
641
+ "outputs": [
642
+ {
643
+ "data": {
644
+ "text/plain": [
645
+ "Channel\n",
646
+ "Aaj Tak 12096\n",
647
+ "Name: count, dtype: int64"
648
+ ]
649
+ },
650
+ "execution_count": 8,
651
+ "metadata": {},
652
+ "output_type": "execute_result"
653
+ }
654
+ ],
655
+ "source": [
656
+ "# Count values of Channel\n",
657
+ "df['Channel'].value_counts()"
658
+ ]
659
+ },
660
+ {
661
+ "cell_type": "code",
662
+ "execution_count": 9,
663
+ "id": "abbc65aa",
664
+ "metadata": {},
665
+ "outputs": [
666
+ {
667
+ "data": {
668
+ "text/plain": [
669
+ "Week Day\n",
670
+ "Saturday 1728\n",
671
+ "Sunday 1728\n",
672
+ "Monday 1728\n",
673
+ "Tuesday 1728\n",
674
+ "Wednesday 1728\n",
675
+ "Thursday 1728\n",
676
+ "Friday 1728\n",
677
+ "Name: count, dtype: int64"
678
+ ]
679
+ },
680
+ "execution_count": 9,
681
+ "metadata": {},
682
+ "output_type": "execute_result"
683
+ }
684
+ ],
685
+ "source": [
686
+ "# Count values of Week Day\n",
687
+ "df['Week Day'].value_counts() # all seven days, 1728 rows each"
688
+ ]
689
+ },
690
+ {
691
+ "cell_type": "code",
692
+ "execution_count": 10,
693
+ "id": "24a0ea3a",
694
+ "metadata": {},
695
+ "outputs": [
696
+ {
697
+ "data": {
698
+ "text/plain": [
699
+ "TimeBand\n",
700
+ "02:00:00 - 02:30:00 252\n",
701
+ "02:30:00 - 03:00:00 252\n",
702
+ "15:00:00 - 15:30:00 252\n",
703
+ "15:30:00 - 16:00:00 252\n",
704
+ "16:00:00 - 16:30:00 252\n",
705
+ "16:30:00 - 17:00:00 252\n",
706
+ "17:00:00 - 17:30:00 252\n",
707
+ "17:30:00 - 18:00:00 252\n",
708
+ "18:00:00 - 18:30:00 252\n",
709
+ "18:30:00 - 19:00:00 252\n",
710
+ "19:00:00 - 19:30:00 252\n",
711
+ "19:30:00 - 20:00:00 252\n",
712
+ "20:00:00 - 20:30:00 252\n",
713
+ "20:30:00 - 21:00:00 252\n",
714
+ "21:00:00 - 21:30:00 252\n",
715
+ "21:30:00 - 22:00:00 252\n",
716
+ "22:00:00 - 22:30:00 252\n",
717
+ "22:30:00 - 23:00:00 252\n",
718
+ "23:00:00 - 23:30:00 252\n",
719
+ "23:30:00 - 24:00:00 252\n",
720
+ "24:00:00 - 24:30:00 252\n",
721
+ "24:30:00 - 25:00:00 252\n",
722
+ "25:00:00 - 25:30:00 252\n",
723
+ "14:30:00 - 15:00:00 252\n",
724
+ "14:00:00 - 14:30:00 252\n",
725
+ "13:30:00 - 14:00:00 252\n",
726
+ "07:30:00 - 08:00:00 252\n",
727
+ "03:00:00 - 03:30:00 252\n",
728
+ "03:30:00 - 04:00:00 252\n",
729
+ "04:00:00 - 04:30:00 252\n",
730
+ "04:30:00 - 05:00:00 252\n",
731
+ "05:00:00 - 05:30:00 252\n",
732
+ "05:30:00 - 06:00:00 252\n",
733
+ "06:00:00 - 06:30:00 252\n",
734
+ "06:30:00 - 07:00:00 252\n",
735
+ "07:00:00 - 07:30:00 252\n",
736
+ "08:00:00 - 08:30:00 252\n",
737
+ "13:00:00 - 13:30:00 252\n",
738
+ "08:30:00 - 09:00:00 252\n",
739
+ "09:00:00 - 09:30:00 252\n",
740
+ "09:30:00 - 10:00:00 252\n",
741
+ "10:00:00 - 10:30:00 252\n",
742
+ "10:30:00 - 11:00:00 252\n",
743
+ "11:00:00 - 11:30:00 252\n",
744
+ "11:30:00 - 12:00:00 252\n",
745
+ "12:00:00 - 12:30:00 252\n",
746
+ "12:30:00 - 13:00:00 252\n",
747
+ "25:30:00 - 26:00:00 252\n",
748
+ "Name: count, dtype: int64"
749
+ ]
750
+ },
751
+ "execution_count": 10,
752
+ "metadata": {},
753
+ "output_type": "execute_result"
754
+ }
755
+ ],
756
+ "source": [
757
+ "# count values of TimeBand\n",
758
+ "df['TimeBand'].value_counts()"
759
+ ]
760
+ },
761
+ {
762
+ "cell_type": "markdown",
763
+ "id": "be8183bd",
764
+ "metadata": {},
765
+ "source": [
766
+ "## Label Encoding"
767
+ ]
768
+ },
769
+ {
770
+ "cell_type": "code",
771
+ "execution_count": 11,
772
+ "id": "877e32b9",
773
+ "metadata": {},
774
+ "outputs": [
775
+ {
776
+ "data": {
777
+ "text/plain": [
778
+ "Index(['Week number', 'Channel', 'Week Day', 'TimeBand', 'Share', 'AMA',\n",
779
+ " 'rate', 'daily reach', 'cume reach', 'ATS', 'Unrolled'],\n",
780
+ " dtype='object')"
781
+ ]
782
+ },
783
+ "execution_count": 11,
784
+ "metadata": {},
785
+ "output_type": "execute_result"
786
+ }
787
+ ],
788
+ "source": [
789
+ "df.columns"
790
+ ]
791
+ },
792
+ {
793
+ "cell_type": "code",
794
+ "execution_count": 12,
795
+ "id": "9f922296",
796
+ "metadata": {},
797
+ "outputs": [
798
+ {
799
+ "name": "stdout",
800
+ "output_type": "stream",
801
+ "text": [
802
+ "<class 'pandas.core.frame.DataFrame'>\n",
803
+ "RangeIndex: 12096 entries, 0 to 12095\n",
804
+ "Data columns (total 11 columns):\n",
805
+ " # Column Non-Null Count Dtype \n",
806
+ "--- ------ -------------- ----- \n",
807
+ " 0 Week number 12096 non-null object \n",
808
+ " 1 Channel 12096 non-null object \n",
809
+ " 2 Week Day 12096 non-null object \n",
810
+ " 3 TimeBand 12096 non-null object \n",
811
+ " 4 Share 12096 non-null float64\n",
812
+ " 5 AMA 12096 non-null float64\n",
813
+ " 6 rate 12096 non-null float64\n",
814
+ " 7 daily reach 12096 non-null float64\n",
815
+ " 8 cume reach 12096 non-null float64\n",
816
+ " 9 ATS 12096 non-null object \n",
817
+ " 10 Unrolled 12096 non-null float64\n",
818
+ "dtypes: float64(6), object(5)\n",
819
+ "memory usage: 1.0+ MB\n"
820
+ ]
821
+ }
822
+ ],
823
+ "source": [
824
+ "df.info()"
825
+ ]
826
+ },
827
+ {
828
+ "cell_type": "code",
829
+ "execution_count": 13,
830
+ "id": "109ffb8d",
831
+ "metadata": {},
832
+ "outputs": [],
833
+ "source": [
834
+ "# Columns that need label encoding:\n",
835
+ "# (Channel does not need encoding for now, since the data covers only Aaj Tak)\n",
836
+ "# 1: Week Day\n",
837
+ "# 2: TimeBand"
838
+ ]
839
+ },
840
+ {
841
+ "cell_type": "code",
842
+ "execution_count": 14,
843
+ "id": "e4fd0b0b",
844
+ "metadata": {},
845
+ "outputs": [],
846
+ "source": [
847
+ "# 1: Week Day\n",
848
+ "\n",
849
+ "weekDay_le = LabelEncoder()\n",
850
+ "df['Week_Day_Encoded'] = weekDay_le.fit_transform(df['Week Day'])"
851
+ ]
852
+ },
853
+ {
854
+ "cell_type": "code",
855
+ "execution_count": 15,
856
+ "id": "9b10dc13",
857
+ "metadata": {},
858
+ "outputs": [],
859
+ "source": [
860
+ "# L1 = list(weekDay_le.inverse_transform(df['Week_Day_Encoded']))\n",
861
+ "# d1 = dict(zip(weekDay_le.classes_, weekDay_le.transform(weekDay_le.classes_)))\n",
862
+ "# print(d1)\n",
863
+ "\n",
864
+ "# # Output: {'Friday': 0, 'Monday': 1, 'Saturday': 2, 'Sunday': 3, 'Thursday': 4, 'Tuesday': 5, 'Wednesday': 6}"
865
+ ]
866
+ },
867
+ {
868
+ "cell_type": "code",
869
+ "execution_count": 16,
870
+ "id": "bc705800",
871
+ "metadata": {},
872
+ "outputs": [],
873
+ "source": [
874
+ "# 2: TimeBand\n",
875
+ "\n",
876
+ "timeBand_le = LabelEncoder()\n",
877
+ "df['Time_Band_Encoded'] = timeBand_le.fit_transform(df['TimeBand'])"
878
+ ]
879
+ },
880
+ {
881
+ "cell_type": "code",
882
+ "execution_count": 17,
883
+ "id": "16ac2be3",
884
+ "metadata": {},
885
+ "outputs": [],
886
+ "source": [
887
+ "# L2 = list(timeBand_le.inverse_transform(df['Time_Band_Encoded']))\n",
888
+ "# d2 = dict(zip(timeBand_le.classes_, timeBand_le.transform(timeBand_le.classes_)))\n",
889
+ "# print(d2)\n",
890
+ "\n",
891
+ "# # # Output: {'02:00:00 - 02:30:00': 0, '02:30:00 - 03:00:00': 1, '03:00:00 - 03:30:00': 2, '03:30:00 - 04:00:00': 3, \n",
892
+ "# '04:00:00 - 04:30:00': 4, '04:30:00 - 05:00:00': 5, '05:00:00 - 05:30:00': 6, '05:30:00 - 06:00:00': 7, \n",
893
+ "# '06:00:00 - 06:30:00': 8, '06:30:00 - 07:00:00': 9, '07:00:00 - 07:30:00': 10, '07:30:00 - 08:00:00': 11, \n",
894
+ "# '08:00:00 - 08:30:00': 12, '08:30:00 - 09:00:00': 13, '09:00:00 - 09:30:00': 14, '09:30:00 - 10:00:00': 15, \n",
895
+ "# '10:00:00 - 10:30:00': 16, '10:30:00 - 11:00:00': 17, '11:00:00 - 11:30:00': 18, '11:30:00 - 12:00:00': 19, \n",
896
+ "# '12:00:00 - 12:30:00': 20, '12:30:00 - 13:00:00': 21, '13:00:00 - 13:30:00': 22, '13:30:00 - 14:00:00': 23, \n",
897
+ "# '14:00:00 - 14:30:00': 24, '14:30:00 - 15:00:00': 25, '15:00:00 - 15:30:00': 26, '15:30:00 - 16:00:00': 27, \n",
898
+ "# '16:00:00 - 16:30:00': 28, '16:30:00 - 17:00:00': 29, '17:00:00 - 17:30:00': 30, '17:30:00 - 18:00:00': 31, \n",
899
+ "# '18:00:00 - 18:30:00': 32, '18:30:00 - 19:00:00': 33, '19:00:00 - 19:30:00': 34, '19:30:00 - 20:00:00': 35, \n",
900
+ "# '20:00:00 - 20:30:00': 36, '20:30:00 - 21:00:00': 37, '21:00:00 - 21:30:00': 38, '21:30:00 - 22:00:00': 39, \n",
901
+ "# '22:00:00 - 22:30:00': 40, '22:30:00 - 23:00:00': 41, '23:00:00 - 23:30:00': 42, '23:30:00 - 24:00:00': 43, \n",
902
+ "# '24:00:00 - 24:30:00': 44, '24:30:00 - 25:00:00': 45, '25:00:00 - 25:30:00': 46, '25:30:00 - 26:00:00': 47}"
903
+ ]
904
+ },
905
+ {
906
+ "cell_type": "code",
907
+ "execution_count": 18,
908
+ "id": "e65f3a9b",
909
+ "metadata": {},
910
+ "outputs": [
911
+ {
912
+ "data": {
913
+ "text/html": [
914
+ "<div>\n",
915
+ "<style scoped>\n",
916
+ " .dataframe tbody tr th:only-of-type {\n",
917
+ " vertical-align: middle;\n",
918
+ " }\n",
919
+ "\n",
920
+ " .dataframe tbody tr th {\n",
921
+ " vertical-align: top;\n",
922
+ " }\n",
923
+ "\n",
924
+ " .dataframe thead th {\n",
925
+ " text-align: right;\n",
926
+ " }\n",
927
+ "</style>\n",
928
+ "<table border=\"1\" class=\"dataframe\">\n",
929
+ " <thead>\n",
930
+ " <tr style=\"text-align: right;\">\n",
931
+ " <th></th>\n",
932
+ " <th>Week number</th>\n",
933
+ " <th>Channel</th>\n",
934
+ " <th>Week Day</th>\n",
935
+ " <th>TimeBand</th>\n",
936
+ " <th>Share</th>\n",
937
+ " <th>AMA</th>\n",
938
+ " <th>rate</th>\n",
939
+ " <th>daily reach</th>\n",
940
+ " <th>cume reach</th>\n",
941
+ " <th>ATS</th>\n",
942
+ " <th>Unrolled</th>\n",
943
+ " <th>Week_Day_Encoded</th>\n",
944
+ " <th>Time_Band_Encoded</th>\n",
945
+ " </tr>\n",
946
+ " </thead>\n",
947
+ " <tbody>\n",
948
+ " <tr>\n",
949
+ " <th>0</th>\n",
950
+ " <td>7'23</td>\n",
951
+ " <td>Aaj Tak</td>\n",
952
+ " <td>Saturday</td>\n",
953
+ " <td>02:00:00 - 02:30:00</td>\n",
954
+ " <td>0.081305</td>\n",
955
+ " <td>0.123363</td>\n",
956
+ " <td>0.000433</td>\n",
957
+ " <td>3.70</td>\n",
958
+ " <td>3.700893</td>\n",
959
+ " <td>00:01:00</td>\n",
960
+ " <td>0.0</td>\n",
961
+ " <td>2</td>\n",
962
+ " <td>0</td>\n",
963
+ " </tr>\n",
964
+ " <tr>\n",
965
+ " <th>1</th>\n",
966
+ " <td>7'23</td>\n",
967
+ " <td>Aaj Tak</td>\n",
968
+ " <td>Saturday</td>\n",
969
+ " <td>02:30:00 - 03:00:00</td>\n",
970
+ " <td>0.469995</td>\n",
971
+ " <td>0.394070</td>\n",
972
+ " <td>0.001383</td>\n",
973
+ " <td>11.82</td>\n",
974
+ " <td>11.822103</td>\n",
975
+ " <td>00:01:00</td>\n",
976
+ " <td>0.0</td>\n",
977
+ " <td>2</td>\n",
978
+ " <td>1</td>\n",
979
+ " </tr>\n",
980
+ " <tr>\n",
981
+ " <th>2</th>\n",
982
+ " <td>7'23</td>\n",
983
+ " <td>Aaj Tak</td>\n",
984
+ " <td>Saturday</td>\n",
985
+ " <td>03:00:00 - 03:30:00</td>\n",
986
+ " <td>1.723084</td>\n",
987
+ " <td>0.361537</td>\n",
988
+ " <td>0.001269</td>\n",
989
+ " <td>10.85</td>\n",
990
+ " <td>10.846120</td>\n",
991
+ " <td>00:01:00</td>\n",
992
+ " <td>0.0</td>\n",
993
+ " <td>2</td>\n",
994
+ " <td>2</td>\n",
995
+ " </tr>\n",
996
+ " <tr>\n",
997
+ " <th>3</th>\n",
998
+ " <td>7'23</td>\n",
999
+ " <td>Aaj Tak</td>\n",
1000
+ " <td>Saturday</td>\n",
1001
+ " <td>03:30:00 - 04:00:00</td>\n",
1002
+ " <td>2.019206</td>\n",
1003
+ " <td>0.251790</td>\n",
1004
+ " <td>0.000884</td>\n",
1005
+ " <td>7.55</td>\n",
1006
+ " <td>7.553692</td>\n",
1007
+ " <td>00:01:00</td>\n",
1008
+ " <td>0.0</td>\n",
1009
+ " <td>2</td>\n",
1010
+ " <td>3</td>\n",
1011
+ " </tr>\n",
1012
+ " <tr>\n",
1013
+ " <th>4</th>\n",
1014
+ " <td>7'23</td>\n",
1015
+ " <td>Aaj Tak</td>\n",
1016
+ " <td>Saturday</td>\n",
1017
+ " <td>04:00:00 - 04:30:00</td>\n",
1018
+ " <td>1.163916</td>\n",
1019
+ " <td>0.333603</td>\n",
1020
+ " <td>0.001171</td>\n",
1021
+ " <td>10.01</td>\n",
1022
+ " <td>10.008100</td>\n",
1023
+ " <td>00:01:00</td>\n",
1024
+ " <td>0.0</td>\n",
1025
+ " <td>2</td>\n",
1026
+ " <td>4</td>\n",
1027
+ " </tr>\n",
1028
+ " </tbody>\n",
1029
+ "</table>\n",
1030
+ "</div>"
1031
+ ],
1032
+ "text/plain": [
1033
+ " Week number Channel Week Day TimeBand Share AMA \\\n",
1034
+ "0 7'23 Aaj Tak Saturday 02:00:00 - 02:30:00 0.081305 0.123363 \n",
1035
+ "1 7'23 Aaj Tak Saturday 02:30:00 - 03:00:00 0.469995 0.394070 \n",
1036
+ "2 7'23 Aaj Tak Saturday 03:00:00 - 03:30:00 1.723084 0.361537 \n",
1037
+ "3 7'23 Aaj Tak Saturday 03:30:00 - 04:00:00 2.019206 0.251790 \n",
1038
+ "4 7'23 Aaj Tak Saturday 04:00:00 - 04:30:00 1.163916 0.333603 \n",
1039
+ "\n",
1040
+ " rate daily reach cume reach ATS Unrolled Week_Day_Encoded \\\n",
1041
+ "0 0.000433 3.70 3.700893 00:01:00 0.0 2 \n",
1042
+ "1 0.001383 11.82 11.822103 00:01:00 0.0 2 \n",
1043
+ "2 0.001269 10.85 10.846120 00:01:00 0.0 2 \n",
1044
+ "3 0.000884 7.55 7.553692 00:01:00 0.0 2 \n",
1045
+ "4 0.001171 10.01 10.008100 00:01:00 0.0 2 \n",
1046
+ "\n",
1047
+ " Time_Band_Encoded \n",
1048
+ "0 0 \n",
1049
+ "1 1 \n",
1050
+ "2 2 \n",
1051
+ "3 3 \n",
1052
+ "4 4 "
1053
+ ]
1054
+ },
1055
+ "execution_count": 18,
1056
+ "metadata": {},
1057
+ "output_type": "execute_result"
1058
+ }
1059
+ ],
1060
+ "source": [
1061
+ "df.head()"
1062
+ ]
1063
+ },
1064
+ {
1065
+ "cell_type": "code",
1066
+ "execution_count": 19,
1067
+ "id": "e604dbc6",
1068
+ "metadata": {},
1069
+ "outputs": [
1070
+ {
1071
+ "name": "stdout",
1072
+ "output_type": "stream",
1073
+ "text": [
1074
+ "<class 'pandas.core.frame.DataFrame'>\n",
1075
+ "RangeIndex: 12096 entries, 0 to 12095\n",
1076
+ "Data columns (total 13 columns):\n",
1077
+ " # Column Non-Null Count Dtype \n",
1078
+ "--- ------ -------------- ----- \n",
1079
+ " 0 Week number 12096 non-null object \n",
1080
+ " 1 Channel 12096 non-null object \n",
1081
+ " 2 Week Day 12096 non-null object \n",
1082
+ " 3 TimeBand 12096 non-null object \n",
1083
+ " 4 Share 12096 non-null float64\n",
1084
+ " 5 AMA 12096 non-null float64\n",
1085
+ " 6 rate 12096 non-null float64\n",
1086
+ " 7 daily reach 12096 non-null float64\n",
1087
+ " 8 cume reach 12096 non-null float64\n",
1088
+ " 9 ATS 12096 non-null object \n",
1089
+ " 10 Unrolled 12096 non-null float64\n",
1090
+ " 11 Week_Day_Encoded 12096 non-null int32 \n",
1091
+ " 12 Time_Band_Encoded 12096 non-null int32 \n",
1092
+ "dtypes: float64(6), int32(2), object(5)\n",
1093
+ "memory usage: 1.1+ MB\n"
1094
+ ]
1095
+ }
1096
+ ],
1097
+ "source": [
1098
+ "df.info()"
1099
+ ]
1100
+ },
1101
+ {
1102
+ "cell_type": "markdown",
1103
+ "id": "fcb0b705",
1104
+ "metadata": {},
1105
+ "source": [
1106
+ "## Model Development : RandomForestRegressor"
1107
+ ]
1108
+ },
1109
+ {
1110
+ "cell_type": "code",
1111
+ "execution_count": 20,
1112
+ "id": "f5af473f",
1113
+ "metadata": {},
1114
+ "outputs": [],
1115
+ "source": [
1116
+ "# Splitting into X and y \n",
1117
+ "\n",
1118
+ "X = df[['Share', 'AMA', 'rate','daily reach', 'cume reach','Week_Day_Encoded','Time_Band_Encoded']]\n",
1119
+ "y = df[['Unrolled']]  # double brackets keep y as a one-column DataFrame, shape (n, 1)"
1120
+ ]
1121
+ },
1122
+ {
1123
+ "cell_type": "code",
1124
+ "execution_count": 33,
1125
+ "id": "8b74a5b8",
1126
+ "metadata": {},
1127
+ "outputs": [],
1128
+ "source": [
1129
+ "# Splitting into training and testing datasets\n",
1130
+ "\n",
1131
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state = 42)"
1132
+ ]
1133
+ },
1134
+ {
1135
+ "cell_type": "code",
1136
+ "execution_count": 34,
1137
+ "id": "306b52f8",
1138
+ "metadata": {},
1139
+ "outputs": [
1140
+ {
1141
+ "data": {
1142
+ "text/plain": [
1143
+ "((9676, 7), (2420, 7), (9676, 1), (2420, 1))"
1144
+ ]
1145
+ },
1146
+ "execution_count": 34,
1147
+ "metadata": {},
1148
+ "output_type": "execute_result"
1149
+ }
1150
+ ],
1151
+ "source": [
1152
+ "X_train.shape, X_test.shape, y_train.shape, y_test.shape"
1153
+ ]
1154
+ },
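The shapes reported above follow from `test_size=0.20` applied to the 12096 rows listed by `df.info()`. As far as I know, scikit-learn rounds the test count up and gives the remainder to the training set; the arithmetic can be checked with a short sketch (variable names are illustrative):

```python
import math

n_samples = 12096   # rows reported by df.info()
test_size = 0.20    # fraction passed to train_test_split

# The test count is rounded up; the training set receives the remaining rows.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```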
1155
+ {
1156
+ "cell_type": "code",
1157
+ "execution_count": 35,
1158
+ "id": "0d6b3c6e",
1159
+ "metadata": {},
1160
+ "outputs": [
1161
+ {
1162
+ "data": {
1163
+ "text/html": [
1164
+ "<div>\n",
1165
+ "<style scoped>\n",
1166
+ " .dataframe tbody tr th:only-of-type {\n",
1167
+ " vertical-align: middle;\n",
1168
+ " }\n",
1169
+ "\n",
1170
+ " .dataframe tbody tr th {\n",
1171
+ " vertical-align: top;\n",
1172
+ " }\n",
1173
+ "\n",
1174
+ " .dataframe thead th {\n",
1175
+ " text-align: right;\n",
1176
+ " }\n",
1177
+ "</style>\n",
1178
+ "<table border=\"1\" class=\"dataframe\">\n",
1179
+ " <thead>\n",
1180
+ " <tr style=\"text-align: right;\">\n",
1181
+ " <th></th>\n",
1182
+ " <th>Share</th>\n",
1183
+ " <th>AMA</th>\n",
1184
+ " <th>rate</th>\n",
1185
+ " <th>daily reach</th>\n",
1186
+ " <th>cume reach</th>\n",
1187
+ " <th>Week_Day_Encoded</th>\n",
1188
+ " <th>Time_Band_Encoded</th>\n",
1189
+ " </tr>\n",
1190
+ " </thead>\n",
1191
+ " <tbody>\n",
1192
+ " <tr>\n",
1193
+ " <th>11232</th>\n",
1194
+ " <td>0.043364</td>\n",
1195
+ " <td>0.080953</td>\n",
1196
+ " <td>0.000357</td>\n",
1197
+ " <td>2.43</td>\n",
1198
+ " <td>2.428586</td>\n",
1199
+ " <td>5</td>\n",
1200
+ " <td>0</td>\n",
1201
+ " </tr>\n",
1202
+ " <tr>\n",
1203
+ " <th>11118</th>\n",
1204
+ " <td>0.319280</td>\n",
1205
+ " <td>7.050287</td>\n",
1206
+ " <td>0.031111</td>\n",
1207
+ " <td>45.37</td>\n",
1208
+ " <td>45.372124</td>\n",
1209
+ " <td>2</td>\n",
1210
+ " <td>30</td>\n",
1211
+ " </tr>\n",
1212
+ " <tr>\n",
1213
+ " <th>9301</th>\n",
1214
+ " <td>0.090855</td>\n",
1215
+ " <td>5.284389</td>\n",
1216
+ " <td>0.023781</td>\n",
1217
+ " <td>60.32</td>\n",
1218
+ " <td>60.317940</td>\n",
1219
+ " <td>6</td>\n",
1220
+ " <td>37</td>\n",
1221
+ " </tr>\n",
1222
+ " <tr>\n",
1223
+ " <th>3222</th>\n",
1224
+ " <td>0.402614</td>\n",
1225
+ " <td>0.207835</td>\n",
1226
+ " <td>0.000917</td>\n",
1227
+ " <td>4.82</td>\n",
1228
+ " <td>4.815343</td>\n",
1229
+ " <td>6</td>\n",
1230
+ " <td>6</td>\n",
1231
+ " </tr>\n",
1232
+ " <tr>\n",
1233
+ " <th>10322</th>\n",
1234
+ " <td>12.873856</td>\n",
1235
+ " <td>0.064336</td>\n",
1236
+ " <td>0.015220</td>\n",
1237
+ " <td>1.93</td>\n",
1238
+ " <td>1.930081</td>\n",
1239
+ " <td>4</td>\n",
1240
+ " <td>2</td>\n",
1241
+ " </tr>\n",
1242
+ " </tbody>\n",
1243
+ "</table>\n",
1244
+ "</div>"
1245
+ ],
1246
+ "text/plain": [
1247
+ " Share AMA rate daily reach cume reach \\\n",
1248
+ "11232 0.043364 0.080953 0.000357 2.43 2.428586 \n",
1249
+ "11118 0.319280 7.050287 0.031111 45.37 45.372124 \n",
1250
+ "9301 0.090855 5.284389 0.023781 60.32 60.317940 \n",
1251
+ "3222 0.402614 0.207835 0.000917 4.82 4.815343 \n",
1252
+ "10322 12.873856 0.064336 0.015220 1.93 1.930081 \n",
1253
+ "\n",
1254
+ " Week_Day_Encoded Time_Band_Encoded \n",
1255
+ "11232 5 0 \n",
1256
+ "11118 2 30 \n",
1257
+ "9301 6 37 \n",
1258
+ "3222 6 6 \n",
1259
+ "10322 4 2 "
1260
+ ]
1261
+ },
1262
+ "execution_count": 35,
1263
+ "metadata": {},
1264
+ "output_type": "execute_result"
1265
+ }
1266
+ ],
1267
+ "source": [
1268
+ "X_train.head()"
1269
+ ]
1270
+ },
1271
+ {
1272
+ "cell_type": "code",
1273
+ "execution_count": 36,
1274
+ "id": "38e2d59b",
1275
+ "metadata": {},
1276
+ "outputs": [
1277
+ {
1278
+ "data": {
1279
+ "text/html": [
1280
+ "<div>\n",
1281
+ "<style scoped>\n",
1282
+ " .dataframe tbody tr th:only-of-type {\n",
1283
+ " vertical-align: middle;\n",
1284
+ " }\n",
1285
+ "\n",
1286
+ " .dataframe tbody tr th {\n",
1287
+ " vertical-align: top;\n",
1288
+ " }\n",
1289
+ "\n",
1290
+ " .dataframe thead th {\n",
1291
+ " text-align: right;\n",
1292
+ " }\n",
1293
+ "</style>\n",
1294
+ "<table border=\"1\" class=\"dataframe\">\n",
1295
+ " <thead>\n",
1296
+ " <tr style=\"text-align: right;\">\n",
1297
+ " <th></th>\n",
1298
+ " <th>Unrolled</th>\n",
1299
+ " </tr>\n",
1300
+ " </thead>\n",
1301
+ " <tbody>\n",
1302
+ " <tr>\n",
1303
+ " <th>11232</th>\n",
1304
+ " <td>0.000000</td>\n",
1305
+ " </tr>\n",
1306
+ " <tr>\n",
1307
+ " <th>11118</th>\n",
1308
+ " <td>0.000000</td>\n",
1309
+ " </tr>\n",
1310
+ " <tr>\n",
1311
+ " <th>9301</th>\n",
1312
+ " <td>6.285889</td>\n",
1313
+ " </tr>\n",
1314
+ " <tr>\n",
1315
+ " <th>3222</th>\n",
1316
+ " <td>0.473240</td>\n",
1317
+ " </tr>\n",
1318
+ " <tr>\n",
1319
+ " <th>10322</th>\n",
1320
+ " <td>0.000000</td>\n",
1321
+ " </tr>\n",
1322
+ " </tbody>\n",
1323
+ "</table>\n",
1324
+ "</div>"
1325
+ ],
1326
+ "text/plain": [
1327
+ " Unrolled\n",
1328
+ "11232 0.000000\n",
1329
+ "11118 0.000000\n",
1330
+ "9301 6.285889\n",
1331
+ "3222 0.473240\n",
1332
+ "10322 0.000000"
1333
+ ]
1334
+ },
1335
+ "execution_count": 36,
1336
+ "metadata": {},
1337
+ "output_type": "execute_result"
1338
+ }
1339
+ ],
1340
+ "source": [
1341
+ "y_train[:5]"
1342
+ ]
1343
+ },
1344
+ {
1345
+ "cell_type": "code",
1346
+ "execution_count": 37,
1347
+ "id": "0fc7342f",
1348
+ "metadata": {},
1349
+ "outputs": [
1350
+ {
1351
+ "data": {
1352
+ "text/html": [
1353
+ "<div>\n",
1354
+ "<style scoped>\n",
1355
+ " .dataframe tbody tr th:only-of-type {\n",
1356
+ " vertical-align: middle;\n",
1357
+ " }\n",
1358
+ "\n",
1359
+ " .dataframe tbody tr th {\n",
1360
+ " vertical-align: top;\n",
1361
+ " }\n",
1362
+ "\n",
1363
+ " .dataframe thead th {\n",
1364
+ " text-align: right;\n",
1365
+ " }\n",
1366
+ "</style>\n",
1367
+ "<table border=\"1\" class=\"dataframe\">\n",
1368
+ " <thead>\n",
1369
+ " <tr style=\"text-align: right;\">\n",
1370
+ " <th></th>\n",
1371
+ " <th>Share</th>\n",
1372
+ " <th>AMA</th>\n",
1373
+ " <th>rate</th>\n",
1374
+ " <th>daily reach</th>\n",
1375
+ " <th>cume reach</th>\n",
1376
+ " <th>Week_Day_Encoded</th>\n",
1377
+ " <th>Time_Band_Encoded</th>\n",
1378
+ " </tr>\n",
1379
+ " </thead>\n",
1380
+ " <tbody>\n",
1381
+ " <tr>\n",
1382
+ " <th>468</th>\n",
1383
+ " <td>0.152596</td>\n",
1384
+ " <td>9.820626</td>\n",
1385
+ " <td>0.043337</td>\n",
1386
+ " <td>94.61</td>\n",
1387
+ " <td>94.614234</td>\n",
1388
+ " <td>1</td>\n",
1389
+ " <td>36</td>\n",
1390
+ " </tr>\n",
1391
+ " <tr>\n",
1392
+ " <th>11620</th>\n",
1393
+ " <td>0.000000</td>\n",
1394
+ " <td>0.000000</td>\n",
1395
+ " <td>0.000000</td>\n",
1396
+ " <td>0.00</td>\n",
1397
+ " <td>0.000000</td>\n",
1398
+ " <td>6</td>\n",
1399
+ " <td>4</td>\n",
1400
+ " </tr>\n",
1401
+ " <tr>\n",
1402
+ " <th>538</th>\n",
1403
+ " <td>0.969294</td>\n",
1404
+ " <td>3.181874</td>\n",
1405
+ " <td>0.014043</td>\n",
1406
+ " <td>34.30</td>\n",
1407
+ " <td>34.298911</td>\n",
1408
+ " <td>6</td>\n",
1409
+ " <td>10</td>\n",
1410
+ " </tr>\n",
1411
+ " <tr>\n",
1412
+ " <th>5265</th>\n",
1413
+ " <td>0.064741</td>\n",
1414
+ " <td>2.991051</td>\n",
1415
+ " <td>0.013427</td>\n",
1416
+ " <td>41.62</td>\n",
1417
+ " <td>41.619074</td>\n",
1418
+ " <td>6</td>\n",
1419
+ " <td>33</td>\n",
1420
+ " </tr>\n",
1421
+ " <tr>\n",
1422
+ " <th>7484</th>\n",
1423
+ " <td>0.000000</td>\n",
1424
+ " <td>0.000000</td>\n",
1425
+ " <td>0.000000</td>\n",
1426
+ " <td>0.00</td>\n",
1427
+ " <td>0.000000</td>\n",
1428
+ " <td>3</td>\n",
1429
+ " <td>44</td>\n",
1430
+ " </tr>\n",
1431
+ " </tbody>\n",
1432
+ "</table>\n",
1433
+ "</div>"
1434
+ ],
1435
+ "text/plain": [
1436
+ " Share AMA rate daily reach cume reach \\\n",
1437
+ "468 0.152596 9.820626 0.043337 94.61 94.614234 \n",
1438
+ "11620 0.000000 0.000000 0.000000 0.00 0.000000 \n",
1439
+ "538 0.969294 3.181874 0.014043 34.30 34.298911 \n",
1440
+ "5265 0.064741 2.991051 0.013427 41.62 41.619074 \n",
1441
+ "7484 0.000000 0.000000 0.000000 0.00 0.000000 \n",
1442
+ "\n",
1443
+ " Week_Day_Encoded Time_Band_Encoded \n",
1444
+ "468 1 36 \n",
1445
+ "11620 6 4 \n",
1446
+ "538 6 10 \n",
1447
+ "5265 6 33 \n",
1448
+ "7484 3 44 "
1449
+ ]
1450
+ },
1451
+ "execution_count": 37,
1452
+ "metadata": {},
1453
+ "output_type": "execute_result"
1454
+ }
1455
+ ],
1456
+ "source": [
1457
+ "X_test.head()"
1458
+ ]
1459
+ },
1460
+ {
1461
+ "cell_type": "code",
1462
+ "execution_count": 38,
1463
+ "id": "af5394e9",
1464
+ "metadata": {},
1465
+ "outputs": [
1466
+ {
1467
+ "data": {
1468
+ "text/html": [
1469
+ "<div>\n",
1470
+ "<style scoped>\n",
1471
+ " .dataframe tbody tr th:only-of-type {\n",
1472
+ " vertical-align: middle;\n",
1473
+ " }\n",
1474
+ "\n",
1475
+ " .dataframe tbody tr th {\n",
1476
+ " vertical-align: top;\n",
1477
+ " }\n",
1478
+ "\n",
1479
+ " .dataframe thead th {\n",
1480
+ " text-align: right;\n",
1481
+ " }\n",
1482
+ "</style>\n",
1483
+ "<table border=\"1\" class=\"dataframe\">\n",
1484
+ " <thead>\n",
1485
+ " <tr style=\"text-align: right;\">\n",
1486
+ " <th></th>\n",
1487
+ " <th>Unrolled</th>\n",
1488
+ " </tr>\n",
1489
+ " </thead>\n",
1490
+ " <tbody>\n",
1491
+ " <tr>\n",
1492
+ " <th>468</th>\n",
1493
+ " <td>12.150886</td>\n",
1494
+ " </tr>\n",
1495
+ " <tr>\n",
1496
+ " <th>11620</th>\n",
1497
+ " <td>0.000000</td>\n",
1498
+ " </tr>\n",
1499
+ " <tr>\n",
1500
+ " <th>538</th>\n",
1501
+ " <td>1.480424</td>\n",
1502
+ " </tr>\n",
1503
+ " <tr>\n",
1504
+ " <th>5265</th>\n",
1505
+ " <td>5.781056</td>\n",
1506
+ " </tr>\n",
1507
+ " <tr>\n",
1508
+ " <th>7484</th>\n",
1509
+ " <td>0.000000</td>\n",
1510
+ " </tr>\n",
1511
+ " </tbody>\n",
1512
+ "</table>\n",
1513
+ "</div>"
1514
+ ],
1515
+ "text/plain": [
1516
+ " Unrolled\n",
1517
+ "468 12.150886\n",
1518
+ "11620 0.000000\n",
1519
+ "538 1.480424\n",
1520
+ "5265 5.781056\n",
1521
+ "7484 0.000000"
1522
+ ]
1523
+ },
1524
+ "execution_count": 38,
1525
+ "metadata": {},
1526
+ "output_type": "execute_result"
1527
+ }
1528
+ ],
1529
+ "source": [
1530
+ "y_test[:5]"
1531
+ ]
1532
+ },
1533
+ {
1534
+ "cell_type": "code",
1535
+ "execution_count": 39,
1536
+ "id": "ad1452db",
1537
+ "metadata": {},
1538
+ "outputs": [
1539
+ {
1540
+ "data": {
1541
+ "text/html": [
1542
+ "<style>#sk-container-id-3 {color: black;background-color: white;}#sk-container-id-3 pre{padding: 0;}#sk-container-id-3 div.sk-toggleable {background-color: white;}#sk-container-id-3 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-3 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-3 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-3 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-3 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-3 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-3 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-3 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-3 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-3 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-3 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-3 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-3 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-3 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-3 div.sk-label:hover 
label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-3 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-3 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-3 div.sk-item {position: relative;z-index: 1;}#sk-container-id-3 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-3 div.sk-item::before, #sk-container-id-3 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-3 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-3 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-3 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-3 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-3 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-3 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-3 div.sk-label-container {text-align: center;}#sk-container-id-3 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-3 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-3\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>RandomForestRegressor(random_state=42)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-3\" type=\"checkbox\" checked><label for=\"sk-estimator-id-3\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">RandomForestRegressor</label><div class=\"sk-toggleable__content\"><pre>RandomForestRegressor(random_state=42)</pre></div></div></div></div></div>"
1543
+ ],
1544
+ "text/plain": [
1545
+ "RandomForestRegressor(random_state=42)"
1546
+ ]
1547
+ },
1548
+ "execution_count": 39,
1549
+ "metadata": {},
1550
+ "output_type": "execute_result"
1551
+ }
1552
+ ],
1553
+ "source": [
1554
+ "# Train Random Forest Regression model\n",
1555
+ "\n",
1556
+ "model = RandomForestRegressor(random_state = 42)\n",
1557
+ "model.fit(X_train, y_train)"
1558
+ ]
1559
+ },
1560
+ {
1561
+ "cell_type": "code",
1562
+ "execution_count": 40,
1563
+ "id": "58a025a8",
1564
+ "metadata": {},
1565
+ "outputs": [],
1566
+ "source": [
1567
+ "# Make predictions on train data\n",
1568
+ "\n",
1569
+ "y_pred_train = model.predict(X_train)"
1570
+ ]
1571
+ },
1572
+ {
1573
+ "cell_type": "code",
1574
+ "execution_count": 72,
1575
+ "id": "403259f6",
1576
+ "metadata": {},
1577
+ "outputs": [
1578
+ {
1579
+ "name": "stdout",
1580
+ "output_type": "stream",
1581
+ "text": [
1582
+ "The Accuracy of Training Dataset is : 95.65798927048185\n"
1583
+ ]
1584
+ }
1585
+ ],
1586
+ "source": [
1587
+ "acc_train = r2_score(y_train, y_pred_train)  # R^2 (coefficient of determination), not a classification accuracy\n",
1588
+ "print(\"The Accuracy of Training Dataset is : \",acc_train*100)"
1589
+ ]
1590
+ },
1591
+ {
1592
+ "cell_type": "code",
1593
+ "execution_count": 42,
1594
+ "id": "ac553b1e",
1595
+ "metadata": {},
1596
+ "outputs": [],
1597
+ "source": [
1598
+ "# Make predictions on test data\n",
1599
+ "\n",
1600
+ "y_pred_test = model.predict(X_test)"
1601
+ ]
1602
+ },
1603
+ {
1604
+ "cell_type": "code",
1605
+ "execution_count": 71,
1606
+ "id": "bc359944",
1607
+ "metadata": {},
1608
+ "outputs": [
1609
+ {
1610
+ "name": "stdout",
1611
+ "output_type": "stream",
1612
+ "text": [
1613
+ "The Accuracy of Test Dataset is : 71.01332045918515\n"
1614
+ ]
1615
+ }
1616
+ ],
1617
+ "source": [
1618
+ "acc_test = r2_score(y_test, y_pred_test)  # R^2 on held-out data; the gap to the training score suggests overfitting\n",
1619
+ "print(\"The Accuracy of Test Dataset is : \",acc_test*100)"
1620
+ ]
1621
+ },
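The values printed above as "accuracy" are `r2_score` results multiplied by 100. R² is the coefficient of determination, 1 - SS_res / SS_tot, and it can be zero or negative, so it is not a percentage of correct predictions. A minimal pure-Python sketch of the formula; the function `r2` and the toy numbers are made up for illustration:

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [0.0, 2.0, 4.0, 6.0]

perfect = r2(y_true, y_true)                 # predictions equal the targets
mean_only = r2(y_true, [3.0] * 4)            # always predict the mean
worse = r2(y_true, [6.0, 4.0, 2.0, 0.0])     # worse than predicting the mean

print(perfect, mean_only, worse)
```

Reporting an error metric in the target's units, such as mean absolute error, alongside R² may make the train/test gap easier to interpret.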
1622
+ {
1623
+ "cell_type": "code",
1624
+ "execution_count": 70,
1625
+ "id": "fa33faec",
1626
+ "metadata": {},
1627
+ "outputs": [],
1628
+ "source": [
1629
+ "# # Saving Model\n",
1630
+ "\n",
1631
+ "# import pickle\n",
1632
+ "\n",
1633
+ "# with open('aajTak_model.pkl','wb') as file1:\n",
1634
+ "#     pickle.dump(model, file1)"
1635
+ ]
1636
+ },
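The commented-out cell above would save the fitted model with `pickle`. A minimal sketch of a save-and-reload round trip, assuming a small stand-in model trained on synthetic data and a temporary file rather than the notebook's `model` and `aajTak_model.pkl`:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (not the notebook's dataset).
rng = np.random.default_rng(0)
X_demo = rng.random((50, 3))
y_demo = X_demo.sum(axis=1)

demo_model = RandomForestRegressor(n_estimators=10, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")
    with open(path, "wb") as fh:
        pickle.dump(demo_model, fh)      # serialize the fitted estimator
    with open(path, "rb") as fh:
        restored = pickle.load(fh)       # only unpickle files you trust

# The restored model should reproduce the original predictions.
same_predictions = np.allclose(demo_model.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```

A pickled estimator is generally only safe to reload with the same scikit-learn version that produced it.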
1637
+ {
1638
+ "cell_type": "markdown",
1639
+ "id": "6f30a678",
1640
+ "metadata": {},
1641
+ "source": [
1642
+ "## Hyperparameter Tuning for Random Forest Regression"
1643
+ ]
1644
+ },
1645
+ {
1646
+ "cell_type": "code",
1647
+ "execution_count": 45,
1648
+ "id": "44bd53a2",
1649
+ "metadata": {},
1650
+ "outputs": [],
1651
+ "source": [
1652
+ "# Hyperparameter Tuning\n",
1653
+ "\n",
1654
+ "hyp_model = RandomForestRegressor()\n",
1655
+ "\n",
1656
+ "hyp = {\n",
1657
+ "\"n_estimators\": np.arange(10,50,10),\n",
1658
+ "'criterion':[\"squared_error\", \"absolute_error\"],\n",
1659
+ "'max_depth':np.arange(3,50),\n",
1660
+ "# 'min_samples_split':np.arange(2,5),\n",
1661
+ "# 'min_samples_leaf':np.arange(1,5),\n",
1662
+ "'random_state':np.arange(0,100)  # caution: this searches over the seed, which is not a real hyperparameter\n",
1663
+ "}"
1664
+ ]
1665
+ },
1666
+ {
1667
+ "cell_type": "code",
1668
+ "execution_count": 46,
1669
+ "id": "b7c9e0ab",
1670
+ "metadata": {},
1671
+ "outputs": [
1672
+ {
1673
+ "data": {
1674
+ "text/html": [
1675
+ "<style>#sk-container-id-4 {color: black;background-color: white;}#sk-container-id-4 pre{padding: 0;}#sk-container-id-4 div.sk-toggleable {background-color: white;}#sk-container-id-4 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-4 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-4 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-4 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-4 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-4 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-4 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-4 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-4 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-4 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-4 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-4 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-4 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-4 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-4 div.sk-label:hover 
label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-4 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-4 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-4 div.sk-item {position: relative;z-index: 1;}#sk-container-id-4 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-4 div.sk-item::before, #sk-container-id-4 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-4 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-4 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-4 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-4 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-4 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-4 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-4 div.sk-label-container {text-align: center;}#sk-container-id-4 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-4 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-4\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>RandomizedSearchCV(cv=5, estimator=RandomForestRegressor(),\n",
1676
+ " param_distributions={&#x27;criterion&#x27;: [&#x27;squared_error&#x27;,\n",
1677
+ " &#x27;absolute_error&#x27;],\n",
1678
+ " &#x27;max_depth&#x27;: array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,\n",
1679
+ " 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,\n",
1680
+ " 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]),\n",
1681
+ " &#x27;n_estimators&#x27;: array([10, 20, 30, 40]),\n",
1682
+ " &#x27;random_state&#x27;: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,\n",
1683
+ " 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,\n",
1684
+ " 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,\n",
1685
+ " 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,\n",
1686
+ " 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,\n",
1687
+ " 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])})</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-4\" type=\"checkbox\" ><label for=\"sk-estimator-id-4\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">RandomizedSearchCV</label><div class=\"sk-toggleable__content\"><pre>RandomizedSearchCV(cv=5, estimator=RandomForestRegressor(),\n",
1688
+ " param_distributions={&#x27;criterion&#x27;: [&#x27;squared_error&#x27;,\n",
1689
+ " &#x27;absolute_error&#x27;],\n",
1690
+ " &#x27;max_depth&#x27;: array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,\n",
1691
+ " 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,\n",
1692
+ " 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]),\n",
1693
+ " &#x27;n_estimators&#x27;: array([10, 20, 30, 40]),\n",
1694
+ " &#x27;random_state&#x27;: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,\n",
1695
+ " 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,\n",
1696
+ " 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,\n",
1697
+ " 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,\n",
1698
+ " 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,\n",
1699
+ " 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])})</pre></div></div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-5\" type=\"checkbox\" ><label for=\"sk-estimator-id-5\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">estimator: RandomForestRegressor</label><div class=\"sk-toggleable__content\"><pre>RandomForestRegressor()</pre></div></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-6\" type=\"checkbox\" ><label for=\"sk-estimator-id-6\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">RandomForestRegressor</label><div class=\"sk-toggleable__content\"><pre>RandomForestRegressor()</pre></div></div></div></div></div></div></div></div></div></div>"
1700
+ ],
1701
+ "text/plain": [
1702
+ "RandomizedSearchCV(cv=5, estimator=RandomForestRegressor(),\n",
1703
+ " param_distributions={'criterion': ['squared_error',\n",
1704
+ " 'absolute_error'],\n",
1705
+ " 'max_depth': array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,\n",
1706
+ " 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,\n",
1707
+ " 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]),\n",
1708
+ " 'n_estimators': array([10, 20, 30, 40]),\n",
1709
+ " 'random_state': array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,\n",
1710
+ " 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,\n",
1711
+ " 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,\n",
1712
+ " 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,\n",
1713
+ " 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,\n",
1714
+ " 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])})"
1715
+ ]
1716
+ },
1717
+ "execution_count": 46,
1718
+ "metadata": {},
1719
+ "output_type": "execute_result"
1720
+ }
1721
+ ],
1722
+ "source": [
1723
+ "rscv = RandomizedSearchCV(hyp_model, hyp, cv=5)\n",
1724
+ "rscv.fit(X_train,y_train)"
1725
+ ]
1726
+ },
1727
+ {
1728
+ "cell_type": "code",
1729
+ "execution_count": 47,
1730
+ "id": "f0b0d172",
1731
+ "metadata": {},
1732
+ "outputs": [
1733
+ {
1734
+ "data": {
1735
+ "text/plain": [
1736
+ "{'random_state': 49,\n",
1737
+ " 'n_estimators': 20,\n",
1738
+ " 'max_depth': 39,\n",
1739
+ " 'criterion': 'absolute_error'}"
1740
+ ]
1741
+ },
1742
+ "execution_count": 47,
1743
+ "metadata": {},
1744
+ "output_type": "execute_result"
1745
+ }
1746
+ ],
1747
+ "source": [
1748
+ "rscv.best_params_"
1749
+ ]
1750
+ },
1751
+ {
1752
+ "cell_type": "code",
1753
+ "execution_count": 48,
1754
+ "id": "0252bdea",
1755
+ "metadata": {},
1756
+ "outputs": [],
1757
+ "source": [
1758
+ "best_model = rscv.best_estimator_"
1759
+ ]
1760
+ },
1761
+ {
1762
+ "cell_type": "code",
1763
+ "execution_count": 49,
1764
+ "id": "b23a1e56",
1765
+ "metadata": {},
1766
+ "outputs": [
1767
+ {
1768
+ "data": {
1769
+ "text/html": [
1770
+ "<style>#sk-container-id-5 {color: black;background-color: white;}#sk-container-id-5 pre{padding: 0;}#sk-container-id-5 div.sk-toggleable {background-color: white;}#sk-container-id-5 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-5 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-5 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-5 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-5 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-5 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-5 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-5 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-5 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-5 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-5 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-5 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-5 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-5 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-5 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-5 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-5 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-5 div.sk-item {position: relative;z-index: 1;}#sk-container-id-5 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-5 div.sk-item::before, #sk-container-id-5 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-5 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-5 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-5 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-5 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-5 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-5 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-5 div.sk-label-container {text-align: center;}#sk-container-id-5 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-5 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-5\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>RandomForestRegressor(criterion=&#x27;absolute_error&#x27;, max_depth=39, n_estimators=20,\n",
1771
+ " random_state=49)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-7\" type=\"checkbox\" checked><label for=\"sk-estimator-id-7\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">RandomForestRegressor</label><div class=\"sk-toggleable__content\"><pre>RandomForestRegressor(criterion=&#x27;absolute_error&#x27;, max_depth=39, n_estimators=20,\n",
1772
+ " random_state=49)</pre></div></div></div></div></div>"
1773
+ ],
1774
+ "text/plain": [
1775
+ "RandomForestRegressor(criterion='absolute_error', max_depth=39, n_estimators=20,\n",
1776
+ " random_state=49)"
1777
+ ]
1778
+ },
1779
+ "execution_count": 49,
1780
+ "metadata": {},
1781
+ "output_type": "execute_result"
1782
+ }
1783
+ ],
1784
+ "source": [
1785
+ "best_model.fit(X_train, y_train)"
1786
+ ]
1787
+ },
1788
+ {
1789
+ "cell_type": "code",
1790
+ "execution_count": 50,
1791
+ "id": "c2d2e731",
1792
+ "metadata": {},
1793
+ "outputs": [],
1794
+ "source": [
1795
+ "ypredtn = best_model.predict(X_train)"
1796
+ ]
1797
+ },
1798
+ {
1799
+ "cell_type": "code",
1800
+ "execution_count": 51,
1801
+ "id": "9308b1d8",
1802
+ "metadata": {},
1803
+ "outputs": [
1804
+ {
1805
+ "name": "stdout",
1806
+ "output_type": "stream",
1807
+ "text": [
1808
+ "The R2 score (%) on the training dataset after hyperparameter tuning is : 94.41670975802535\n"
1808
+ ]
1809
+ }
1810
+ ],
1811
+ "source": [
1812
+ "acctn = r2_score(y_train, ypredtn)\n",
1813
+ "print(\"The R2 score (%) on the training dataset after hyperparameter tuning is : \",acctn*100)"
1815
+ ]
1816
+ },
1817
+ {
1818
+ "cell_type": "code",
1819
+ "execution_count": 52,
1820
+ "id": "23cf5580",
1821
+ "metadata": {},
1822
+ "outputs": [],
1823
+ "source": [
1824
+ "ypredts = best_model.predict(X_test)"
1825
+ ]
1826
+ },
1827
+ {
1828
+ "cell_type": "code",
1829
+ "execution_count": 54,
1830
+ "id": "d88fdedb",
1831
+ "metadata": {},
1832
+ "outputs": [
1833
+ {
1834
+ "name": "stdout",
1835
+ "output_type": "stream",
1836
+ "text": [
1837
+ "The R2 score (%) on the testing dataset after hyperparameter tuning is : 69.97941529616791\n"
1837
+ ]
1838
+ }
1839
+ ],
1840
+ "source": [
1841
+ "accts = r2_score(y_test, ypredts)\n",
1842
+ "print(\"The R2 score (%) on the testing dataset after hyperparameter tuning is : \",accts*100)"
1844
+ ]
1845
+ },
1846
+ {
1847
+ "cell_type": "code",
1848
+ "execution_count": 73,
1849
+ "id": "e5298c37",
1850
+ "metadata": {},
1851
+ "outputs": [],
1852
+ "source": [
1853
+ "# # Saving Model\n",
1854
+ "\n",
1855
+ "# import pickle\n",
1856
+ "\n",
1857
+ "# with open('aajTak_fineTune_model.pkl','wb') as file:\n",
1858
+ "# pickle.dump(best_model,file) "
1859
+ ]
1860
+ },
1861
+ {
1862
+ "cell_type": "code",
1863
+ "execution_count": 74,
1864
+ "id": "7a5d25ac",
1865
+ "metadata": {},
1866
+ "outputs": [],
1867
+ "source": [
1868
+ "# # Saving the LabelEncoders for weekDay\n",
1869
+ "\n",
1870
+ "# with open('weekDay_le.pkl','wb') as f1:\n",
1871
+ "# pickle.dump(weekDay_le,f1)"
1872
+ ]
1873
+ },
1874
+ {
1875
+ "cell_type": "code",
1876
+ "execution_count": 75,
1877
+ "id": "6a268e27",
1878
+ "metadata": {},
1879
+ "outputs": [],
1880
+ "source": [
1881
+ "# # Saving the LabelEncoders for timeBand\n",
1882
+ "\n",
1883
+ "# with open('timeBand_le.pkl','wb') as f2:\n",
1884
+ "# pickle.dump(timeBand_le,f2)"
1885
+ ]
1886
+ },
1887
+ {
1888
+ "cell_type": "markdown",
1889
+ "id": "57557ac1",
1890
+ "metadata": {},
1891
+ "source": [
1892
+ "## UserTest Function - Prediction Script"
1893
+ ]
1894
+ },
1895
+ {
1896
+ "cell_type": "code",
1897
+ "execution_count": 1,
1898
+ "id": "8cf621c3",
1899
+ "metadata": {},
1900
+ "outputs": [],
1901
+ "source": [
1902
+ "# import required packages\n",
1903
+ "\n",
1904
+ "import pandas as pd\n",
1905
+ "import numpy as np\n",
1906
+ "import matplotlib.pyplot as plt\n",
1907
+ "import seaborn as sns\n",
1908
+ "\n",
1909
+ "from sklearn.model_selection import RandomizedSearchCV, GridSearchCV, train_test_split\n",
1910
+ "from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor\n",
1911
+ "from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error\n",
1912
+ "from sklearn.preprocessing import LabelEncoder\n",
1913
+ "\n",
1914
+ "import warnings\n",
1915
+ "warnings.filterwarnings('ignore')\n",
1916
+ "\n",
1917
+ "import pickle"
1918
+ ]
1919
+ },
1920
+ {
1921
+ "cell_type": "code",
1922
+ "execution_count": 2,
1923
+ "id": "62be1870",
1924
+ "metadata": {},
1925
+ "outputs": [],
1926
+ "source": [
1927
+ "# load the saved model using pickle\n",
1928
+ "with open('aajTak_model.pkl', 'rb') as f1:\n",
1929
+ " model1 = pickle.load(f1)"
1930
+ ]
1931
+ },
1932
+ {
1933
+ "cell_type": "code",
1934
+ "execution_count": 3,
1935
+ "id": "0b4e2a7c",
1936
+ "metadata": {},
1937
+ "outputs": [],
1938
+ "source": [
1939
+ "# # load the saved model using pickle\n",
1940
+ "# with open('aajTak_fineTune_model.pkl', 'rb') as file:\n",
1941
+ "# model = pickle.load(file)\n",
1942
+ "\n",
1943
+ "# Load the saved weekDay label encoder object using pickle\n",
1944
+ "with open('weekDay_le.pkl','rb') as file1:\n",
1945
+ " weekDay_le = pickle.load(file1)\n",
1946
+ "\n",
1947
+ "# Load the saved timeBand label encoder object using pickle\n",
1948
+ "with open('timeBand_le.pkl','rb') as file2:\n",
1949
+ " timeBand_le = pickle.load(file2)"
1950
+ ]
1951
+ },
1952
+ {
1953
+ "cell_type": "code",
1954
+ "execution_count": 4,
1955
+ "id": "e3a13c4e",
1956
+ "metadata": {},
1957
+ "outputs": [],
1958
+ "source": [
1959
+ "# define the prediction function\n",
1960
+ "# X = df[['Share', 'AMA', 'rate','daily reach', 'cume reach','Week_Day_Encoded','Time_Band_Encoded']]\n",
1961
+ "# y = df[['Unrolled']]\n",
1962
+ "\n",
1963
+ "\n",
1964
+ "def predict_unrolled_value(Share, AMA, rate, daily_reach, cume_reach, Week_Day, Time_Band):\n",
1965
+ " \n",
1966
+ " # create a DataFrame with the input variables\n",
1967
+ " \n",
1968
+ " # encode the Week_Day using the loaded LabelEncoder object\n",
1969
+ " weekDay_encoded = weekDay_le.transform([Week_Day])[0]\n",
1970
+ " \n",
1971
+ " # encode the Time_Band using the loaded LabelEncoder object\n",
1972
+ " Time_Band_encoded = timeBand_le.transform([Time_Band])[0]\n",
1973
+ " \n",
1974
+ " input_data = pd.DataFrame({'Share': [Share], \n",
1975
+ " 'AMA': [AMA], \n",
1976
+ " 'rate': [rate],\n",
1977
+ " 'daily reach': [daily_reach], \n",
1978
+ " 'cume reach': [cume_reach], \n",
1979
+ " 'Week_Day_Encoded': [weekDay_encoded], \n",
1980
+ " 'Time_Band_Encoded': [Time_Band_encoded]})\n",
1981
+ " \n",
1982
+ " # make the prediction using the loaded model and input data\n",
1983
+ " predicted_unrolled_value = model1.predict(input_data)\n",
1984
+ " \n",
1985
+ " # return the predicted unrolled value as output\n",
1986
+ " return predicted_unrolled_value[0]"
1987
+ ]
1988
+ },
1989
+ {
1990
+ "cell_type": "code",
1991
+ "execution_count": 5,
1992
+ "id": "df4390e9",
1993
+ "metadata": {},
1994
+ "outputs": [
1995
+ {
1996
+ "data": {
1997
+ "text/plain": [
1998
+ "4.123954"
1999
+ ]
2000
+ },
2001
+ "execution_count": 5,
2002
+ "metadata": {},
2003
+ "output_type": "execute_result"
2004
+ }
2005
+ ],
2006
+ "source": [
2007
+ "# Function calling\n",
2008
+ "# 0.064741\t2.991051\t0.013427\t41.62\t41.619074\t'Wednesday'\t'18:30:00 - 19:00:00' --> test input data\n",
2009
+ "# 5.781056 --> unrolled actual value\n",
2010
+ "\n",
2011
+ "predict_unrolled_value(0.064741, 2.991051, 0.013427, 41.62, 41.619074, 'Wednesday', '18:30:00 - 19:00:00')"
2012
+ ]
2013
+ },
2014
+ {
2015
+ "cell_type": "code",
2016
+ "execution_count": 6,
2017
+ "id": "5fadb125",
2018
+ "metadata": {},
2019
+ "outputs": [
2020
+ {
2021
+ "data": {
2022
+ "text/plain": [
2023
+ "9.738856000000002"
2024
+ ]
2025
+ },
2026
+ "execution_count": 6,
2027
+ "metadata": {},
2028
+ "output_type": "execute_result"
2029
+ }
2030
+ ],
2031
+ "source": [
2032
+ "# 0.152596\t9.820626\t0.043337\t94.61\t94.614234\t1\t'20:00:00 - 20:30:00' --> test input data\n",
2033
+ "# 12.150886 --> unrolled actual value\n",
2034
+ "predict_unrolled_value(0.152596, 9.820626, 0.043337, 94.61, 94.614234, 'Monday', '20:00:00 - 20:30:00')"
2035
+ ]
2036
+ },
2037
+ {
2038
+ "cell_type": "code",
2039
+ "execution_count": 7,
2040
+ "id": "3ec5b3e0",
2041
+ "metadata": {},
2042
+ "outputs": [
2043
+ {
2044
+ "data": {
2045
+ "text/plain": [
2046
+ "3.3215619"
2047
+ ]
2048
+ },
2049
+ "execution_count": 7,
2050
+ "metadata": {},
2051
+ "output_type": "execute_result"
2052
+ }
2053
+ ],
2054
+ "source": [
2055
+ "# 0.611246\t4.196084\t0.018516\t36.23\t36.231006\t'Saturday'\t'08:00:00 - 08:30:00' --> test input data (the call below passes cume reach as 36.23)\n",
2056
+ "# 3.711884 --> unrolled actual value\n",
2057
+ "predict_unrolled_value(0.611246, 4.196084, 0.018516, 36.23, 36.23, 'Saturday', '08:00:00 - 08:30:00')"
2058
+ ]
2059
+ },
2060
+ {
2061
+ "cell_type": "code",
2062
+ "execution_count": null,
2063
+ "id": "83a75023",
2064
+ "metadata": {},
2065
+ "outputs": [],
2066
+ "source": []
2067
+ },
2068
+ {
2069
+ "cell_type": "code",
2070
+ "execution_count": null,
2071
+ "id": "1799f490",
2072
+ "metadata": {},
2073
+ "outputs": [],
2074
+ "source": []
2075
+ }
2076
+ ],
2077
+ "metadata": {
2078
+ "kernelspec": {
2079
+ "display_name": "Python 3 (ipykernel)",
2080
+ "language": "python",
2081
+ "name": "python3"
2082
+ },
2083
+ "language_info": {
2084
+ "codemirror_mode": {
2085
+ "name": "ipython",
2086
+ "version": 3
2087
+ },
2088
+ "file_extension": ".py",
2089
+ "mimetype": "text/x-python",
2090
+ "name": "python",
2091
+ "nbconvert_exporter": "python",
2092
+ "pygments_lexer": "ipython3",
2093
+ "version": "3.9.10"
2094
+ }
2095
+ },
2096
+ "nbformat": 4,
2097
+ "nbformat_minor": 5
2098
+ }
aajTak_model.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f5f28cd030817fb6bbbcaed82f2170e9d81cdf415f4af8631620b60fed3d15b9
3
+ size 8680525
input_raw_data.xlsx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f011103414f3360afe65aec973b3674514bc66d20a9a053c250ee04f68a03b46
3
+ size 1057440
timeBand_le.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:74ea331adc8d7dcb4cc11585b75fa42a8f77b3fda8418c70cbe1042ef21c8c4a
3
+ size 1298
weekDay_le.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:637bc881b0bb4cb2839589ff1292f7854daf011a20c6ca79549e323dc356f5cb
3
+ size 313