
🔒 SecureBERT Phishing Detection Model

This repository hosts a SecureBERT-based model fine-tuned for phishing URL detection on a cybersecurity dataset. The model classifies each URL as either phishing (malicious) or safe (benign).


📚 Model Details

  • Model Architecture: SecureBERT (RoBERTa-based, a BERT variant)
  • Task: Binary Classification (Phishing vs. Safe)
  • Dataset: shashwatwork/web-page-phishing-detection-dataset (11,431 URLs, 88 features)
  • Framework: PyTorch & Hugging Face Transformers
  • Input Data: URL strings & extracted numerical features
  • Number of Classes: 2 (Phishing, Safe)
  • Quantization: FP16 (for efficiency)

🚀 Usage

Installation

pip install torch transformers scikit-learn pandas

Loading the Model

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and tokenizer
model_path = "./fine_tuned_SecureBERT"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)
model.eval()  # Set model to evaluation mode

print("✅ SecureBERT model loaded successfully and ready for inference!")

🔍 Perform Phishing Detection

def predict_url(url):
    # Tokenize input
    encoding = tokenizer(url, truncation=True, padding=True, max_length=512, return_tensors="pt")
    
    # Perform inference
    with torch.no_grad():
        output = model(**encoding)
    
    # Get predicted class
    predicted_class = torch.argmax(output.logits, dim=1).item()
    
    # Map label
    label = "Phishing" if predicted_class == 1 else "Safe"
    return label

# Example usage
custom_url = "http://example.com/free-gift"
prediction = predict_url(custom_url)
print(f"Predicted label: {prediction}")
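
The function above returns only a label. When a confidence score or an adjustable decision threshold is useful, the two logits can be converted to probabilities with a softmax. The following is a minimal sketch in plain Python, assuming (as in the label mapping above) that index 1 is the phishing class; `logits_to_prediction` and `threshold` are hypothetical names that are not part of the original code, and the example logits are made up.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_to_prediction(logits, threshold=0.5):
    """Return (label, phishing_probability) for a two-class logit pair.

    Assumes index 1 is the phishing class, matching predict_url.
    """
    phishing_prob = softmax(logits)[1]
    # Strict comparison so that threshold=0.5 agrees with argmax on ties
    label = "Phishing" if phishing_prob > threshold else "Safe"
    return label, phishing_prob

# Example with made-up logits (not real model output)
label, prob = logits_to_prediction([-1.2, 2.3])
print(f"{label} (p={prob:.3f})")
```

With the loaded model, the input list could be obtained from `output.logits[0].tolist()`. Raising `threshold` trades recall for fewer false positives, which may matter for legitimate but unusual URLs.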

📊 Evaluation Results

After fine-tuning, the model was evaluated on a test set, achieving the following performance:

| Metric | Score |
|---|---|
| Accuracy | 97.2% |
| Precision | 96.8% |
| Recall | 97.5% |
| F1-Score | 97.1% |
| Inference Speed | Fast (Optimized with FP16) |
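
For reference, the four classification metrics follow directly from confusion-matrix counts. The sketch below computes them in plain Python, assuming phishing is the positive class (label 1); `binary_metrics` is a hypothetical helper and the toy labels are made up, so this is not the original evaluation script. The last lines check that the reported precision and recall imply the reported F1.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels, made up for illustration (1 = phishing, 0 = safe)
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)

# F1 is the harmonic mean of precision and recall
reported_f1 = 2 * 0.968 * 0.975 / (0.968 + 0.975)
print(f"F1 implied by reported precision and recall: {reported_f1:.3f}")
```

The harmonic mean of 96.8% precision and 97.5% recall is about 97.1%, which agrees with the F1-Score in the table. On real predictions, `accuracy_score` and `precision_recall_fscore_support` from `sklearn.metrics` compute the same quantities.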

🛠️ Fine-Tuning Details

Dataset

The model was trained on the shashwatwork/web-page-phishing-detection-dataset, which consists of 11,431 URLs labeled as either phishing or safe. Features include URL characteristics, domain properties, and additional metadata.

Training Configuration

  • Number of epochs: 5
  • Batch size: 16
  • Optimizer: AdamW
  • Learning rate: 2e-5
  • Loss Function: Cross-Entropy
  • Evaluation Strategy: Validation at each epoch
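
The configuration above can be sketched as a plain PyTorch training loop. This is an illustration only, not the original training script: the random feature tensors and the `nn.Linear` classifier are hypothetical stand-ins for tokenized URLs and the SecureBERT sequence classifier, chosen so the loop runs in seconds. Only the hyperparameters (5 epochs, batch size 16, AdamW, learning rate 2e-5, cross-entropy loss, validation after each epoch) come from the list.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data: 64 random 8-dimensional vectors with binary labels.
# The real setup would feed tokenized URLs to the fine-tuned classifier.
features = torch.randn(64, 8)
labels = torch.randint(0, 2, (64,))
train_loader = DataLoader(TensorDataset(features[:48], labels[:48]),
                          batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(features[48:], labels[48:]),
                        batch_size=16)

model = nn.Linear(8, 2)  # stand-in for the sequence classifier
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

val_history = []
for epoch in range(5):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    # Validation at the end of each epoch
    model.eval()
    correct = 0
    with torch.no_grad():
        for x, y in val_loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
    val_history.append(correct / len(val_loader.dataset))
    print(f"epoch {epoch + 1}: val_accuracy={val_history[-1]:.3f}")
```

With random data the validation accuracy is not meaningful; the point is the structure of the loop.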

Quantization

The model weights were converted to FP16 (half precision), which reduces memory usage and, on hardware with native FP16 support, inference latency, while maintaining high accuracy.
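
The memory saving can be estimated with simple arithmetic. The sketch below assumes roughly 125 million parameters and counts only weight storage (no activations or optimizer state); `weight_memory_mb` is a hypothetical helper. It also round-trips one value through half precision to show the kind of rounding that FP16 introduces.

```python
import struct

# struct format codes: 'f' is IEEE 754 single precision, 'e' is half precision
BYTES_FP32 = struct.calcsize("f")
BYTES_FP16 = struct.calcsize("e")

def weight_memory_mb(num_params, bytes_per_param):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6

NUM_PARAMS = 125_000_000  # assumed approximate parameter count

fp32_mb = weight_memory_mb(NUM_PARAMS, BYTES_FP32)
fp16_mb = weight_memory_mb(NUM_PARAMS, BYTES_FP16)
print(f"FP32: {fp32_mb:.0f} MB, FP16: {fp16_mb:.0f} MB")

# Round-tripping a value through half precision shows the rounding involved
value = 0.1
as_fp16 = struct.unpack("e", struct.pack("e", value))[0]
print(f"{value} stored as FP16 becomes {as_fp16!r}")
```

Under these assumptions the weights take about 500 MB in FP32 and about 250 MB in FP16, so storage is halved.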


⚠️ Limitations

  • Evasion Techniques: Attackers constantly evolve phishing techniques, which may reduce model effectiveness.
  • Dataset Bias: The model was trained on a specific dataset; new phishing tactics may require retraining.
  • False Positives: Some legitimate but unusual URLs might be classified as phishing.

✅ Use this fine-tuned SecureBERT model for accurate and efficient phishing detection! 🔒🚀

Model size: 125M params (Safetensors, tensor type FP16)