Insurance Fraud Detection Model

A GradientBoostingClassifier pipeline for detecting fraudulent insurance claims, trained on South African insurance data.

Intended Use

This model is intended for educational and demonstration purposes as part of an end-to-end ML pipeline showcasing Databricks, MLflow, Azure ML, and Hugging Face Hub integration.

Model Details

Property          Value
Classifier        GradientBoostingClassifier
Pipeline steps    preprocessor -> classifier
Training samples  6,400
Test samples      1,600
Target column     target
Created           2026-06-16T15:37:40.689330+00:00
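
The two pipeline steps named above can be sketched roughly as follows. This is a minimal illustration, not the published training code: the step names and the classifier come from the table, the column lists come from the Features section of this card, and the choice of StandardScaler and OneHotEncoder inside the preprocessor is an assumption, as are the helper name build_pipeline and the random_state value.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column names as listed in the Features section of this card.
NUMERIC = [
    "claim_amount", "policy_tenure_months", "customer_age",
    "num_prior_claims", "premium_amount", "days_to_report",
    "witness_present", "police_report_filed", "vehicle_age_years",
]
CATEGORICAL = ["incident_type", "province"]


def build_pipeline() -> Pipeline:
    """Build an unfitted preprocessor -> classifier pipeline (hypothetical helper)."""
    # Assumption: numeric columns are scaled and categorical columns are
    # one-hot encoded. The card does not say what the preprocessor does.
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), NUMERIC),
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ]
    )
    return Pipeline(
        steps=[
            ("preprocessor", preprocessor),
            ("classifier", GradientBoostingClassifier(random_state=0)),
        ]
    )


pipeline = build_pipeline()
print([name for name, _ in pipeline.steps])
```

Keeping the preprocessing inside the pipeline means the same transformations are applied at training and prediction time, which is why the saved artifact can accept a raw DataFrame.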

Evaluation Metrics

Metric     Score
Accuracy   0.9094
Precision  0.5385
Recall     0.7730
F1         0.6348
ROC AUC    0.9418
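
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the two scores above. The snippet below is only an illustrative sketch using the rounded values from this card.

```python
# Rounded scores copied from the metrics above.
precision = 0.5385
recall = 0.7730

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # -> 0.6348
```

The gap between accuracy (0.9094) and precision (0.5385) is worth keeping in mind: roughly half of the claims the model flags are not actually fraudulent, even though most claims overall are classified correctly.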

Confusion Matrix

[Figure: confusion matrix]

ROC Curve

[Figure: ROC curve]

Feature Importance

[Figure: feature importance]

Features

Numeric: claim_amount, policy_tenure_months, customer_age, num_prior_claims, premium_amount, days_to_report, witness_present, police_report_filed, vehicle_age_years

Categorical: incident_type, province
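
Because the model expects all eleven columns, it can help to check an input record before building a DataFrame from it. The following is a minimal sketch. The helper name missing_columns is hypothetical, and the card does not state whether the saved pipeline performs any validation of its own.

```python
from typing import Dict, List

# Feature names as listed above.
NUMERIC_FEATURES = [
    "claim_amount", "policy_tenure_months", "customer_age",
    "num_prior_claims", "premium_amount", "days_to_report",
    "witness_present", "police_report_filed", "vehicle_age_years",
]
CATEGORICAL_FEATURES = ["incident_type", "province"]
EXPECTED_COLUMNS = NUMERIC_FEATURES + CATEGORICAL_FEATURES


def missing_columns(record: Dict[str, object]) -> List[str]:
    """Return the expected feature names that are absent from a record."""
    return [name for name in EXPECTED_COLUMNS if name not in record]


# A record that only supplies two of the eleven features.
partial = {"claim_amount": 12000, "customer_age": 41}
print(missing_columns(partial))
```

An empty list means the record has every expected column. It does not check that the values have sensible types or ranges.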

Sample Usage

import joblib
from huggingface_hub import hf_hub_download
import pandas as pd

# Download and load the model
model_path = hf_hub_download(
    repo_id="ThabangTheActuaryCoder/insurance-fraud-detection-model",
    filename="fraud_detection_model.joblib",
)
model = joblib.load(model_path)

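# Create a sample input. The zeros are placeholders: replace them with real values.
# incident_type and province are categorical, so they will likely need the
# category labels used in training rather than 0.
sample = pd.DataFrame([{
    "claim_amount": 0,
    "policy_tenure_months": 0,
    "customer_age": 0,
    "num_prior_claims": 0,
    "premium_amount": 0,
    "days_to_report": 0,
    "witness_present": 0,
    "police_report_filed": 0,
    "vehicle_age_years": 0,
    "incident_type": 0,
    "province": 0,
}])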

# Predict
prediction = model.predict(sample)
probabilities = model.predict_proba(sample)
print(f"Prediction: {prediction}, Probabilities: {probabilities}")
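
The predict call above uses the classifier's default decision rule. If a different balance between precision and recall is wanted, one option is to apply a custom threshold to the predict_proba output instead. The sketch below is hedged accordingly: the helper name flag_claims and the example probabilities are made up for illustration, and it assumes the second column holds the probability of the fraud class, which should be confirmed against model.classes_.

```python
from typing import List, Sequence


def flag_claims(
    probabilities: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Return 1 where the second-column probability meets the threshold, else 0."""
    # Assumption: column 1 is the positive (fraud) class.
    return [1 if row[1] >= threshold else 0 for row in probabilities]


# Illustrative probabilities only, not real model output.
example_probabilities = [[0.92, 0.08], [0.55, 0.45], [0.10, 0.90]]

print(flag_claims(example_probabilities))       # [0, 0, 1]
print(flag_claims(example_probabilities, 0.4))  # [0, 1, 1]
```

Lowering the threshold flags more claims, which generally raises recall at the cost of precision. Raising it does the opposite.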