metadata
license: mit
library_name: sklearn
tags:
  - tabular-classification
  - sklearn
  - phishing
  - onnx
model_format: pickle
model_file: model.pkl
widget:
  - structuredData:
      domain_age:
        - 11039
        - -1
        - 5636
      domain_registration_length:
        - 3571
        - 0
        - 208
      google_index:
        - 0
        - 0
        - 0
      nb_hyperlinks:
        - 97
        - 168
        - 52
      page_rank:
        - 5
        - 2
        - 10
      ratio_extHyperlinks:
        - 0.030927835
        - 0.220238095
        - 0.442307692
      ratio_extRedirection:
        - 0
        - 0.378378378
        - 0
      ratio_intHyperlinks:
        - 0.969072165
        - 0.779761905
        - 0.557692308
      safe_anchor:
        - 25
        - 24.32432432
        - 0
      status:
        - legitimate
        - legitimate
        - legitimate
      web_traffic:
        - 178542
        - 0
        - 2
inference: false
pipeline_tag: tabular-classification

Model description

Training Procedure

Hyperparameters

Hyperparameter                       Value
base_estimator                       deprecated
cv                                   5
ensemble                             True
estimator__bootstrap                 True
estimator__ccp_alpha                 0.0
estimator__class_weight              None
estimator__criterion                 gini
estimator__max_depth                 None
estimator__max_features              sqrt
estimator__max_leaf_nodes            None
estimator__max_samples               None
estimator__min_impurity_decrease     0.0
estimator__min_samples_leaf          1
estimator__min_samples_split         2
estimator__min_weight_fraction_leaf  0.0
estimator__n_estimators              100
estimator__n_jobs                    None
estimator__oob_score                 False
estimator__random_state              None
estimator__verbose                   0
estimator__warm_start                False
estimator                            RandomForestClassifier()
method                               isotonic
n_jobs                               None

Model Plot

This is the architecture of the model loaded by joblib.

CalibratedClassifierCV(cv=5, estimator=RandomForestClassifier(), method='isotonic')

Evaluation Results

Metric      Value
accuracy    0.945652
f1-score    0.945114
precision   0.951996
recall      0.938331
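The card does not say which data split produced these scores. Plugging the reported precision and recall into 2PR/(P+R) gives about 0.9451, which matches the reported f1-score and suggests a binary F1. Below is a sketch of what each metric measures. It computes the metrics from confusion-matrix counts on made-up labels and cross-checks them against sklearn.metrics. It assumes 1 = phishing is the positive class, matching proba[1] in the usage example further down.

```python
from math import isclose

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up labels for illustration only (1 = phishing, 0 = legitimate)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

# Confusion-matrix counts
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 2
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 2

accuracy = (tp + tn) / len(y_true)                  # 5/8 = 0.625
precision = tp / (tp + fp)                          # 3/5 = 0.6
recall = tp / (tp + fn)                             # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # 0.9/1.35 ~ 0.6667

# Cross-check against scikit-learn (binary averaging, pos_label=1)
assert isclose(accuracy, accuracy_score(y_true, y_pred))
assert isclose(precision, precision_score(y_true, y_pred))
assert isclose(recall, recall_score(y_true, y_pred))
assert isclose(f1, f1_score(y_true, y_pred))

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.6250 precision=0.6000 recall=0.7500 f1=0.6667
```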

How to Get Started with the Model

Below is a code snippet that loads the model and runs inference on one URL.

With ONNX (recommended)

Python

import onnxruntime
import pandas as pd
from huggingface_hub import hf_hub_download

REPO_ID = "pirocheto/phishing-url-detection"
FILENAME = "model.onnx"

# Initializing the ONNX Runtime session with the pre-trained model
sess = onnxruntime.InferenceSession(
    hf_hub_download(repo_id=REPO_ID, filename=FILENAME),
    providers=["CPUExecutionProvider"],
)

# Each record holds a URL plus its pre-computed page and domain features
data = [
    {
        "url": "https://www.rga.com/about/workplace",
        "nb_hyperlinks": 97,
        "ratio_intHyperlinks": 0.969072165,
        "ratio_extHyperlinks": 0.030927835,
        "ratio_extRedirection": 0,
        "safe_anchor": 25,
        "domain_registration_length": 3571,
        "domain_age": 11039,
        "web_traffic": 178542,
        "google_index": 0,
        "page_rank": 5,
    },
]

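# Using the URL as the index keeps it out of the feature matrix; the remaining
# columns are converted to a float32 NumPy array and read by the model by position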
df = pd.DataFrame(data).set_index("url")
inputs = df.to_numpy(dtype="float32")

# Running the ONNX model; the second output holds the class probabilities,
# where column 1 is the phishing class
probas = sess.run(None, {"X": inputs})[1]

# Displaying the results
for record, proba in zip(data, probas):
    print(f"URL: {record['url']}")
    print(f"Likelihood of being a phishing site: {proba[1] * 100:.2f}%")
    print("----")

# Output:
# URL: https://www.rga.com/about/workplace
# Likelihood of being a phishing site: 0.89%
# ----
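The ONNX session receives a bare float32 matrix, so it matches each column to a feature purely by position. Below is a minimal sketch of a stricter conversion step. It fixes the column order explicitly and fails loudly if a feature is missing. FEATURES and to_model_input are hypothetical names. The order is copied from the example above and is assumed, not confirmed here, to be the order the model expects.

```python
import numpy as np
import pandas as pd

# Hypothetical: feature order copied from the example above
FEATURES = [
    "nb_hyperlinks",
    "ratio_intHyperlinks",
    "ratio_extHyperlinks",
    "ratio_extRedirection",
    "safe_anchor",
    "domain_registration_length",
    "domain_age",
    "web_traffic",
    "google_index",
    "page_rank",
]


def to_model_input(records):
    """Hypothetical helper: return (urls, float32 matrix) with columns in FEATURES order."""
    df = pd.DataFrame(records).set_index("url")
    missing = [name for name in FEATURES if name not in df.columns]
    if missing:
        raise KeyError(f"missing features: {missing}")
    # Selecting by name reorders the columns and drops any extras
    return df.index.tolist(), df[FEATURES].to_numpy(dtype=np.float32)


# Keys deliberately listed in a different order than FEATURES
records = [
    {
        "url": "https://www.rga.com/about/workplace",
        "page_rank": 5,
        "google_index": 0,
        "web_traffic": 178542,
        "domain_age": 11039,
        "domain_registration_length": 3571,
        "safe_anchor": 25,
        "ratio_extRedirection": 0,
        "ratio_extHyperlinks": 0.030927835,
        "ratio_intHyperlinks": 0.969072165,
        "nb_hyperlinks": 97,
    },
]
urls, X = to_model_input(records)
print(X.shape, X.dtype)  # (1, 10) float32
print(X[0, 0])           # 97.0 (nb_hyperlinks comes first)
```

The resulting matrix can then be passed to the session in the same way as above, as sess.run(None, {"X": X}).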