metadata
license: mit
library_name: sklearn
tags:
- tabular-classification
- sklearn
- phishing
- onnx
model_format: pickle
model_file: model.pkl
widget:
- structuredData:
    domain_age:
    - 11039
    - -1
    - 5636
    domain_registration_length:
    - 3571
    - 0
    - 208
    google_index:
    - 0
    - 0
    - 0
    nb_hyperlinks:
    - 97
    - 168
    - 52
    page_rank:
    - 5
    - 2
    - 10
    ratio_extHyperlinks:
    - 0.030927835
    - 0.220238095
    - 0.442307692
    ratio_extRedirection:
    - 0
    - 0.378378378
    - 0
    ratio_intHyperlinks:
    - 0.969072165
    - 0.779761905
    - 0.557692308
    safe_anchor:
    - 25
    - 24.32432432
    - 0
    status:
    - legitimate
    - legitimate
    - legitimate
    web_traffic:
    - 178542
    - 0
    - 2
inference: false
pipeline_tag: tabular-classification
Model description
Training Procedure
Hyperparameters
Hyperparameter | Value |
---|---|
base_estimator | deprecated |
cv | 5 |
ensemble | True |
estimator__bootstrap | True |
estimator__ccp_alpha | 0.0 |
estimator__class_weight | |
estimator__criterion | gini |
estimator__max_depth | |
estimator__max_features | sqrt |
estimator__max_leaf_nodes | |
estimator__max_samples | |
estimator__min_impurity_decrease | 0.0 |
estimator__min_samples_leaf | 1 |
estimator__min_samples_split | 2 |
estimator__min_weight_fraction_leaf | 0.0 |
estimator__n_estimators | 100 |
estimator__n_jobs | |
estimator__oob_score | False |
estimator__random_state | |
estimator__verbose | 0 |
estimator__warm_start | False |
estimator | RandomForestClassifier() |
method | isotonic |
n_jobs | |
Model Plot
This is the architecture of the model loaded by joblib.
CalibratedClassifierCV(cv=5, estimator=RandomForestClassifier(), method='isotonic')
Evaluation Results
Metric | Value |
---|---|
accuracy | 0.945652 |
f1-score | 0.945114 |
precision | 0.951996 |
recall | 0.938331 |
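As a quick consistency check on the table above, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below assumes that the reported precision and recall refer to the same class and averaging scheme as the reported F1; the source does not state this explicitly.

```python
# Values copied from the evaluation table.
precision = 0.951996
recall = 0.938331

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.6f}")  # 0.945114
```

The recomputed value agrees with the reported f1-score to six decimal places.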
How to Get Started with the Model
Below are some code snippets to load the model.
With ONNX (recommended)
Python
import onnxruntime
import pandas as pd
from huggingface_hub import hf_hub_download

REPO_ID = "pirocheto/phishing-url-detection"
FILENAME = "model.onnx"

# Initializing the ONNX Runtime session with the pre-trained model
sess = onnxruntime.InferenceSession(
    hf_hub_download(repo_id=REPO_ID, filename=FILENAME),
    providers=["CPUExecutionProvider"],
)

# Defining a list of records, each holding a URL and its features
data = [
    {
        "url": "https://www.rga.com/about/workplace",
        "nb_hyperlinks": 97,
        "ratio_intHyperlinks": 0.969072165,
        "ratio_extHyperlinks": 0.030927835,
        "ratio_extRedirection": 0,
        "safe_anchor": 25,
        "domain_registration_length": 3571,
        "domain_age": 11039,
        "web_traffic": 178542,
        "google_index": 0,
        "page_rank": 5,
    },
]

# Converting data to a float32 NumPy array (the URL becomes the index,
# so only the feature columns are passed to the model)
df = pd.DataFrame(data).set_index("url")
inputs = df.to_numpy(dtype="float32")

# Using the ONNX model to make predictions on the input data;
# the second output holds the class probabilities
probas = sess.run(None, {"X": inputs})[1]

# Displaying the results
for record, proba in zip(data, probas):
    print(f"URL: {record['url']}")
    print(f"Likelihood of being a phishing site: {proba[1] * 100:.2f}%")
    print("----")

# Output:
# URL: https://www.rga.com/about/workplace
# Likelihood of being a phishing site: 0.89%
# ----
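The example above relies on the key order of each record to determine the column order of the input array. A small helper can make that order explicit and fail early when a feature is missing. This is a sketch only: the `FEATURES` list and the `to_model_input` function are hypothetical names, and the order is assumed from the example record rather than taken from a documented model signature.

```python
import numpy as np

# Assumed feature order, copied from the key order of the example record.
# The source does not document the order the model expects.
FEATURES = [
    "nb_hyperlinks",
    "ratio_intHyperlinks",
    "ratio_extHyperlinks",
    "ratio_extRedirection",
    "safe_anchor",
    "domain_registration_length",
    "domain_age",
    "web_traffic",
    "google_index",
    "page_rank",
]


def to_model_input(records):
    """Return a float32 matrix with one row per record, columns in FEATURES order."""
    rows = []
    for record in records:
        missing = [name for name in FEATURES if name not in record]
        if missing:
            raise KeyError(f"missing features: {missing}")
        # Extra keys such as "url" are ignored.
        rows.append([record[name] for name in FEATURES])
    return np.array(rows, dtype=np.float32)


record = {
    "url": "https://www.rga.com/about/workplace",
    "page_rank": 5,
    "google_index": 0,
    "web_traffic": 178542,
    "domain_age": 11039,
    "domain_registration_length": 3571,
    "safe_anchor": 25,
    "ratio_extRedirection": 0,
    "ratio_extHyperlinks": 0.030927835,
    "ratio_intHyperlinks": 0.969072165,
    "nb_hyperlinks": 97,
}

inputs = to_model_input([record])
print(inputs.shape, inputs.dtype)  # (1, 10) float32
```

The resulting array could be passed to `sess.run` in place of `df.to_numpy(dtype="float32")`, and it stays correct even when the record keys arrive in a different order, as they do in this example.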