YOLOv8 Model for Classification of Normal vs Scam Websites
Overview
This YOLOv8 model classifies website screenshots as either "normal" (legitimate) or "scam" based on visual content patterns. It is designed to detect fraudulent layouts and page elements, which may help identify phishing or scam websites. The model was trained on a custom dataset of website screenshots in two classes, normal and scam, with data augmentation and dropout to improve generalization. It can be used as a component in fraud detection systems for website verification.
Model Details
- Model Architecture: YOLOv8
- Classes:
  - 0: Normal Website
  - 1: Scam Website
- Dropout: 0.2 (This dropout rate helps reduce overfitting)
- Augmentation:
  - Horizontal flip
  - 90° rotation (clockwise and counterclockwise)
  - Grayscale (applied to 15% of images)
  - Blur (up to 2px)
  - Noise (up to 1.49% of pixels)
Dataset
Dataset Purpose: Classify website screenshots as either normal (legitimate) or scam.
Dataset Split:
- Training Set: 13,506 images (78%)
- Validation Set: 3,011 images (17%)
- Test Set: 832 images (5%)
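The percentages above can be checked directly from the image counts. The short sketch below recomputes each split's share of the total; the variable names are illustrative only.

```python
# Image counts per split, as listed above.
split_counts = {"train": 13_506, "validation": 3_011, "test": 832}

total = sum(split_counts.values())

# Share of each split, rounded to the nearest whole percent.
split_percent = {
    name: round(100 * count / total) for name, count in split_counts.items()
}

for name, count in split_counts.items():
    print(f"{name}: {count} images ({split_percent[name]}%)")
print(f"total: {total} images")
```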
Preprocessing:
- Auto-orientation applied
- Images resized to 640x640
Augmentations:
- Flip: Horizontal
- Rotate: Clockwise, Counter-clockwise (90°)
- Grayscale: Apply to 15% of images
- Blur: Up to 2px
- Noise: Up to 1.49% of pixels
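To make the geometric and noise augmentations concrete, here is a minimal, dependency-free sketch operating on a grayscale image stored as a list of rows. It is not the pipeline used to build this dataset: the function names are hypothetical, the noise is assumed to be salt-and-pepper (the card does not say which kind was used), and grayscale conversion and blur are left out for brevity.

```python
import random


def hflip(img):
    """Mirror a row-major image (list of rows) left to right."""
    return [row[::-1] for row in img]


def rotate90(img, clockwise=True):
    """Rotate a row-major image by 90 degrees."""
    if clockwise:
        # Reverse the row order, then transpose.
        return [list(col) for col in zip(*img[::-1])]
    # Transpose, then reverse the row order.
    return [list(col) for col in zip(*img)][::-1]


def add_noise(img, max_fraction=0.0149, rng=random):
    """Set up to max_fraction of the pixels to black (0) or white (255).

    Salt-and-pepper noise is an assumption made for this sketch.
    """
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so the input stays unchanged
    # Corrupt anywhere from zero pixels up to the cap.
    n_noisy = rng.randint(0, int(max_fraction * height * width))
    for idx in rng.sample(range(height * width), n_noisy):
        out[idx // width][idx % width] = rng.choice((0, 255))
    return out


tiny = [[1, 2], [3, 4]]
print(hflip(tiny))                      # [[2, 1], [4, 3]]
print(rotate90(tiny))                   # [[3, 1], [4, 2]]
print(rotate90(tiny, clockwise=False))  # [[2, 4], [1, 3]]
```

In practice an image library would handle these transforms; the point here is only to show what each listed operation does to the pixel grid.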
Training Details
- Task: Image classification of website screenshots into "normal" and "scam".
- Batch Size: 16
- Epochs: 20
- Optimizer: AdamW
- Learning Rate: Automatically tuned using YOLOv8's auto-tuning mechanism.
- Mixed Precision: Automatic Mixed Precision (AMP) enabled for faster training.
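The settings above could be passed to the Ultralytics trainer roughly as follows. This is a sketch under assumptions, not the original training script: the starting checkpoint `yolov8n-cls.pt` and the `data_dir` argument are placeholders, `imgsz=640` is inferred from the preprocessing step, and `lr0` is left unset because the card does not give a learning-rate value.

```python
# Hyperparameters taken from this model card.
TRAIN_ARGS = {
    "epochs": 20,
    "batch": 16,
    "imgsz": 640,          # assumed to match the 640x640 preprocessing
    "optimizer": "AdamW",
    "dropout": 0.2,        # dropout value listed under Model Details
    "amp": True,           # automatic mixed precision
}


def train(data_dir, weights="yolov8n-cls.pt"):
    """Fine-tune a YOLOv8 classification checkpoint with TRAIN_ARGS.

    `data_dir` (dataset root with one folder per split and class) and
    `weights` are placeholders; the card does not name either.
    """
    # Imported lazily so the config can be inspected without the package.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_dir, **TRAIN_ARGS)
```

Calling `train("path_to_dataset")` would start a run with these settings, assuming the `ultralytics` package is installed.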
Performance
- Training Metrics:
  - High mAP scores on validation and test sets.
  - Precision, recall, and F1 scores show strong performance for both "normal" and "scam" website classifications.
  - The loss values decrease consistently across epochs, indicating stable training.
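For readers unfamiliar with the per-class metrics mentioned above, the sketch below shows how precision, recall, and F1 follow from true-positive, false-positive, and false-negative counts. The counts are made up purely for illustration and are not results from this model.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts for the "scam" class, for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# precision=0.90 recall=0.75 f1=0.82
```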
Usage
Inference
To run inference on new website screenshots, load the model and call its predict method:
from ultralytics import YOLO
# Load the model
model = YOLO("path_to_your_best.pt")
# Run inference
results = model.predict("path_to_website_screenshot.jpg")
The predictions will include the detected class along with confidence scores.
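One way to read the top class and its confidence from a result is sketched below. It assumes the checkpoint is a classification model, in which case each Ultralytics result exposes `probs` (with `top1` and `top1conf`) and a `names` mapping; the helper name `summarize` is hypothetical, and the actual label strings depend on how the dataset folders were named.

```python
def summarize(result):
    """Return (class_name, confidence) for one classification result."""
    if result.probs is None:
        # Detection-style checkpoints carry boxes rather than class probabilities.
        raise ValueError("result has no class probabilities")
    class_id = int(result.probs.top1)          # index of the most likely class
    confidence = float(result.probs.top1conf)  # its probability
    return result.names[class_id], confidence


# Usage with the `results` list from the snippet above:
# label, confidence = summarize(results[0])
# print(f"{label} ({confidence:.2%})")
```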
Intended Use and Limitations
This model is primarily intended for distinguishing scam websites from normal ones based on their visual appearance. While it performs well on the training and validation datasets, performance may vary on other datasets or on website layouts not represented in the training data. The training set is limited to 13,506 images, a figure that includes augmented copies of normal and scam screenshots.
How to Cite
If you use this model in your work, please cite it as follows:
@model {KarolinaJocelynVIFD2024,
author = {Karolina Jocelyn},
title = {YOLOv8 Model for Normal vs Scam Website Classification},
year = {2024},
platform = {HuggingFace Model Hub},
}
Future Work
To improve the model's generalization and extend its applicability to other website structures, future work could involve:
- Expanding the dataset with more diverse website layouts and styles
- Fine-tuning the learning rate and adding more advanced augmentations
- Adding more scam categories to enhance fraud detection
This model is hosted on the HuggingFace Model Hub, where it can be downloaded and used directly in your projects.