Commit b130037 by FredZhang7
1 Parent(s): 83074ed
update training documentation

README.md CHANGED
@@ -57,6 +57,22 @@ The classification task is split into two stages:
 - 98.2% on training data, 98.7% accuracy on validation
 - 911,180 rows of 11 features
 
+## Training Features
+I applied cross-validation with `cv=5` to the training dataset to search for the best hyperparameters.
+Here's the dict passed to `GridSearchCV`:
+```python
+params = {
+    'objective': 'binary',
+    'metric': 'binary_logloss',
+    'boosting_type': ['gbdt', 'dart'],
+    'num_leaves': [15, 23, 31, 63],
+    'learning_rate': [0.001, 0.002, 0.01, 0.02],
+    'feature_fraction': [0.5, 0.6, 0.7, 0.9],
+    'early_stopping_rounds': [10, 20],
+    'num_boost_round': [500, 750, 800, 900, 1000, 1250, 2000]
+}
+```
+
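To get a feel for the size of this search, the grid can be expanded by hand. The sketch below is only an illustration, not the training script from this repository: `expand_grid` is a hypothetical helper written for this example, and it treats the scalar entries (`'objective'`, `'metric'`) as single-choice lists, since scikit-learn's `GridSearchCV` generally expects every value in `param_grid` to be a list of candidates.

```python
from itertools import product

# Grid copied from the README above.
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'boosting_type': ['gbdt', 'dart'],
    'num_leaves': [15, 23, 31, 63],
    'learning_rate': [0.001, 0.002, 0.01, 0.02],
    'feature_fraction': [0.5, 0.6, 0.7, 0.9],
    'early_stopping_rounds': [10, 20],
    'num_boost_round': [500, 750, 800, 900, 1000, 1250, 2000],
}


def expand_grid(grid):
    """Yield one dict per parameter combination (hypothetical helper)."""
    keys = list(grid)
    # Wrap scalars, including strings, so every value is a list of candidates.
    choices = [v if isinstance(v, (list, tuple)) else [v] for v in grid.values()]
    for combo in product(*choices):
        yield dict(zip(keys, combo))


candidates = list(expand_grid(params))
n_fits = len(candidates) * 5  # cv=5 means five fits per candidate

print(len(candidates), n_fits)  # 1792 8960
```

Under these assumptions an exhaustive search covers 1,792 candidates, or 8,960 model fits with `cv=5`, which is worth keeping in mind before widening any of the lists.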
 
 ## URL Features
 ```python