YOLOv8 Model for Classification of Normal vs Scam Websites

Overview

This YOLOv8 model classifies website screenshots as either "normal" (legitimate) or "scam" based on their visual content. It is intended to flag fraudulent layouts or page elements and so help identify phishing or scam websites. The model was trained on a custom dataset of website screenshots in two classes, normal and scam, with data augmentation and dropout applied to improve generalization. It can be used as one component of a fraud detection system for website verification.

Model Details

Model Architecture: YOLOv8
Classes:
- 0: Normal Website
- 1: Scam Website
Dropout: 0.2 (to reduce overfitting)
Augmentation:
- Horizontal flip
- 90° rotation (clockwise and counterclockwise)
- Grayscale (applied to 15% of images)
- Blur (up to 2px)
- Noise (up to 1.49% of pixels)
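
The geometric and noise augmentations listed above can be illustrated on a tiny pixel grid. This is a minimal sketch in plain Python, not the pipeline that produced the dataset: the function names are hypothetical, images are modelled as nested lists of grayscale values, and noise is assumed to be salt-and-pepper noise.

```python
import random


def horizontal_flip(img):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img):
    """Reverse the row order, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


def rotate_90_counterclockwise(img):
    """Transpose, then reverse the row order."""
    return [list(col) for col in zip(*img)][::-1]


def add_noise(img, max_fraction, rng):
    """Overwrite at most max_fraction of the pixels with black or white.

    Assumption: "noise" means salt-and-pepper noise; the source does not say.
    """
    height, width = len(img), len(img[0])
    budget = int(max_fraction * height * width)
    noisy = [row[:] for row in img]
    for index in rng.sample(range(height * width), budget):
        noisy[index // width][index % width] = rng.choice((0, 255))
    return noisy


tiny = [[1, 2], [3, 4]]
flipped = horizontal_flip(tiny)
clockwise = rotate_90_clockwise(tiny)
counterclockwise = rotate_90_counterclockwise(tiny)
noisy = add_noise([[128] * 100 for _ in range(100)], 0.0149, random.Random(0))
```

In practice an image library would do this work; the sketch only shows what each transform does to pixel positions.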

Dataset

Dataset Purpose: Classify website screenshots as either normal (legitimate) or scam.
Dataset Split:
- Training Set: 13,506 images (78%)
- Validation Set: 3,011 images (17%)
- Test Set: 832 images (5%)
Preprocessing:
- Auto-orientation applied
- Images resized to 640x640
Augmentations:
- Flip: Horizontal
- Rotate: Clockwise, Counter-clockwise (90°)
- Grayscale: Apply to 15% of images
- Blur: Up to 2px
- Noise: Up to 1.49% of pixels
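
As a quick check, the split percentages follow directly from the image counts listed under Dataset Split. The short calculation below uses only those three counts; the variable names are illustrative.

```python
# Image counts as listed under "Dataset Split".
splits = {"train": 13506, "valid": 3011, "test": 832}

total = sum(splits.values())  # 17349 images overall

# Share of each split, rounded to the nearest whole percent.
shares = {name: round(100 * count / total) for name, count in splits.items()}

print(total)   # 17349
print(shares)  # {'train': 78, 'valid': 17, 'test': 5}
```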

Training Details

Task: Image classification of website screenshots into "normal" and "scam".
Batch Size: 16
Epochs: 20
Optimizer: AdamW
Learning Rate: Selected automatically by the YOLOv8 trainer rather than set by hand.
Mixed Precision: Automatic Mixed Precision (AMP) enabled for faster training.
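
The settings above map onto arguments of the Ultralytics `train` call. The sketch below shows one way they might be passed. It makes several assumptions: the `yolov8n-cls.pt` starting checkpoint and the dataset path are placeholders (the source does not state the model size or data location), and the dropout and image size values are taken from the Model Details and Dataset sections. No learning rate is passed, so the library's own default handling applies.

```python
# Hyperparameters gathered from this model card.
TRAIN_ARGS = {
    "epochs": 20,          # Training Details
    "batch": 16,           # Training Details
    "optimizer": "AdamW",  # Training Details
    "amp": True,           # Automatic Mixed Precision
    "dropout": 0.2,        # Model Details
    "imgsz": 640,          # Dataset preprocessing (640x640)
}


def train(weights="yolov8n-cls.pt", data="path_to_dataset"):
    """Start a training run with the settings above.

    Both default arguments are placeholders, not values from the source.
    The import is local so the settings can be inspected without
    ultralytics installed.
    """
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data, **TRAIN_ARGS)
```

Calling `train()` needs the `ultralytics` package and a dataset in the folder layout it expects for classification.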

Performance

Training Metrics:
- High mAP scores on validation and test sets.
- Precision, recall, and F1 scores are strong for both the "normal" and "scam" classes.
- Loss decreases consistently across epochs, indicating stable training.
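
For reference, precision, recall, and F1 for one class can be computed from its true positive, false positive, and false negative counts. The sketch below uses a hypothetical helper name, and the counts in the example are made up for illustration; they are not results from this model.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only: 8 scam pages caught, 2 false alarms, 2 missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

Computing the three values separately for "normal" and "scam" shows whether one class is recognised noticeably better than the other.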

Usage

Inference

To run inference on new website screenshots, load the model and call predict:

from ultralytics import YOLO

# Load the model
model = YOLO("path_to_your_best.pt")

# Run inference
results = model.predict("path_to_website_screenshot.jpg")

Each prediction includes the predicted class and its confidence score.
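
One way to turn a prediction into a readable label is sketched below. It assumes the checkpoint is a YOLOv8 classification model, in which case each result exposes a `probs` object with `top1` and `top1conf`; for a detection checkpoint `probs` would be empty and this would not apply. The helper names and the 0.5 confidence cut-off are hypothetical, and the class names follow the mapping in Model Details.

```python
# Class indices as listed under "Model Details".
CLASS_NAMES = {0: "Normal Website", 1: "Scam Website"}


def describe_prediction(class_id, confidence, min_confidence=0.5):
    """Format a class index and confidence; flag low-confidence results.

    min_confidence is an illustrative cut-off, not a value from the source.
    """
    label = CLASS_NAMES[class_id]
    if confidence < min_confidence:
        return f"uncertain (best guess: {label}, {confidence:.2f})"
    return f"{label} ({confidence:.2f})"


def classify_screenshot(weights_path, image_path):
    """Run the model on one screenshot and describe the top prediction."""
    from ultralytics import YOLO  # local import: only needed for real inference

    result = YOLO(weights_path).predict(image_path)[0]
    probs = result.probs  # populated for classification checkpoints
    return describe_prediction(int(probs.top1), float(probs.top1conf))
```

For example, `describe_prediction(1, 0.93)` returns `"Scam Website (0.93)"`. A real deployment would choose the cut-off from validation data.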

Intended Use and Limitations

This model is intended to distinguish scam websites from normal ones based on their visual appearance. It performs well on the training and validation sets, but performance may vary on other datasets or on website layouts not represented in the training data. The training set is small: 13,506 images, a count that includes augmented copies of the normal and scam screenshots.

How to Cite

If you use this model in your work, please cite it as follows:

@misc{KarolinaJocelynVIFD2024,
  author       = {Karolina Jocelyn},
  title        = {YOLOv8 Model for Normal vs Scam Website Classification},
  year         = {2024},
  howpublished = {HuggingFace Model Hub},
}

Future Work

To improve the model's generalization and extend its applicability to other website structures, future work could involve:

- Expanding the dataset with more diverse website layouts and styles
- Fine-tuning the learning rate and adding more advanced augmentations
- Adding more scam categories to enhance fraud detection

This model is uploaded and hosted on the HuggingFace Model Hub. You can download and use it directly in your projects.