Model description

This is a Gaussian Naive Bayes model trained on a synthetic dataset generated by J.P. Morgan AI Research. The dataset contains a large variety of transaction types representing normal activities as well as abnormal/fraudulent activities. The model predicts whether a transaction is normal or fraudulent.
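
For reference, the standard Gaussian Naive Bayes decision rule can be written as below. The notation is introduced here for illustration and is not taken from the model card: $x_j$ is the $j$-th preprocessed feature, $d$ is the number of features, $P(c)$ is the class prior, and $\mu_{jc}$, $\sigma_{jc}^{2}$ are the per-class mean and variance estimated from the training data.

```latex
\hat{y} = \arg\max_{c \,\in\, \{\text{normal},\ \text{fraudulent}\}}
  \; P(c) \prod_{j=1}^{d}
  \frac{1}{\sqrt{2\pi\sigma_{jc}^{2}}}
  \exp\!\left(-\frac{(x_j - \mu_{jc})^{2}}{2\sigma_{jc}^{2}}\right)
```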

Intended uses & limitations

This model is intended for educational purposes.

Training Procedure

The data preprocessing steps applied include the following:

  • Dropping high-cardinality features: Transaction ID, Sender ID, Sender Account, Beneficiary ID, Beneficiary Account, and Sender Sector
  • Dropping zero-variance features: Sender LOB
  • Dropping the time and date feature, since the model is not time-series based
  • One-hot encoding the categorical features Sender Country, Beneficiary Country, and Transaction Type, and encoding the target variable, Label
  • Applying feature scaling (StandardScaler) to the numeric feature, USD_amount
  • Splitting the dataset into training and test sets using an 85/15 split ratio
  • Handling the imbalanced dataset with the imblearn framework, applying the RandomUnderSampler method to eliminate noise, which led to a 2.5% improvement in accuracy
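
The dropping, splitting, and undersampling steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: the toy data, the column names `Transaction_Id` and `Sender_lob`, the `stratify` and `random_state` arguments, and the choice to undersample only the training split are all hypothetical. The helper `random_undersample` is a hand-rolled stand-in for imblearn's RandomUnderSampler so that the sketch needs only pandas, NumPy, and scikit-learn. Encoding and scaling are left to the pipeline listed under Hyperparameters.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny made-up stand-in for the transaction data (values are hypothetical).
n = 200
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Transaction_Id": [f"T{i}" for i in range(n)],   # high cardinality (hypothetical name)
    "Sender_lob": ["CCB"] * n,                        # zero variance (hypothetical name)
    "Sender_Country": rng.choice(["USA", "GERMANY"], n),
    "Bene_Country": rng.choice(["USA", "CANADA"], n),
    "Transaction_Type": rng.choice(["WIRE", "CARD"], n),
    "USD_amount": rng.uniform(1, 1000, n).round(2),
    "Label": [1] * 20 + [0] * 180,                    # 10% fraudulent, assumed encoding
})

# Drop identifier-like and constant columns.
df = df.drop(columns=["Transaction_Id", "Sender_lob"])
X, y = df.drop(columns="Label"), df["Label"]

# 85/15 split; stratify and random_state are assumptions, not from the card.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0
)

def random_undersample(X, y, seed=0):
    """Keep a random sample of each class, sized to the smallest class."""
    sampler = np.random.default_rng(seed)
    counts = y.value_counts()
    n_min = int(counts.min())
    keep = []
    for label in counts.index:
        idx = y.index[y == label].to_numpy()
        keep.extend(sampler.choice(idx, size=n_min, replace=False))
    keep = sorted(keep)
    return X.loc[keep], y.loc[keep]

# Balance only the training split so the test set keeps its original class ratio.
X_res, y_res = random_undersample(X_train, y_train)
print(y_train.value_counts().to_dict(), "->", y_res.value_counts().to_dict())
```

With imblearn installed, the helper would typically be replaced by `RandomUnderSampler().fit_resample(X_train, y_train)`.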

Hyperparameters

Hyperparameter Value
memory
steps [('preprocessorAll', ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))])), ('classifier', GaussianNB())]
verbose False
preprocessorAll ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))])
classifier GaussianNB()
preprocessorAll__n_jobs
preprocessorAll__remainder passthrough
preprocessorAll__sparse_threshold 0.3
preprocessorAll__transformer_weights
preprocessorAll__transformers [('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))]
preprocessorAll__verbose False
preprocessorAll__verbose_feature_names_out True
preprocessorAll__cat Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))])
preprocessorAll__num Pipeline(steps=[('scale', StandardScaler())])
preprocessorAll__cat__memory
preprocessorAll__cat__steps [('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]
preprocessorAll__cat__verbose False
preprocessorAll__cat__onehot OneHotEncoder(handle_unknown='ignore', sparse_output=False)
preprocessorAll__cat__onehot__categories auto
preprocessorAll__cat__onehot__drop
preprocessorAll__cat__onehot__dtype <class 'numpy.float64'>
preprocessorAll__cat__onehot__handle_unknown ignore
preprocessorAll__cat__onehot__max_categories
preprocessorAll__cat__onehot__min_frequency
preprocessorAll__cat__onehot__sparse deprecated
preprocessorAll__cat__onehot__sparse_output False
preprocessorAll__num__memory
preprocessorAll__num__steps [('scale', StandardScaler())]
preprocessorAll__num__verbose False
preprocessorAll__num__scale StandardScaler()
preprocessorAll__num__scale__copy True
preprocessorAll__num__scale__with_mean True
preprocessorAll__num__scale__with_std True
classifier__priors
classifier__var_smoothing 1e-09

Model Plot

Pipeline(steps=[('preprocessorAll',ColumnTransformer(remainder='passthrough',transformers=[('cat',Pipeline(steps=[('onehot',OneHotEncoder(handle_unknown='ignore',sparse_output=False))]),['Sender_Country','Bene_Country','Transaction_Type']),('num',Pipeline(steps=[('scale',StandardScaler())]),Index(['USD_amount'], dtype='object'))])),('classifier', GaussianNB())])

Evaluation Results

Metric Value
accuracy 0.794582

Model Explainability

SHAP was used to determine the important features that help the model make decisions.
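
The card does not say which SHAP explainer was used. To illustrate the quantity SHAP estimates, the sketch below computes exact Shapley values by brute force for a small Gaussian Naive Bayes model. It does not use the shap library, and the data, the three-feature setup, and the helper names `value` and `shapley_values` are all hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical data: three numeric features, binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
model = GaussianNB().fit(X, y)

def value(x, subset, background):
    """Mean class-1 probability with the features in `subset` fixed to x's
    values and the remaining features taken from the background rows."""
    data = background.copy()
    cols = list(subset)
    if cols:
        data[:, cols] = x[cols]
    return model.predict_proba(data)[:, 1].mean()

def shapley_values(x, background):
    """Exact Shapley values by enumerating every feature subset."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                gain = value(x, subset + (i,), background) - value(x, subset, background)
                phi[i] += weight * gain
    return phi

phi = shapley_values(X[0], X)
baseline = model.predict_proba(X)[:, 1].mean()
prediction = model.predict_proba(X[:1])[0, 1]
print(phi, phi.sum(), prediction - baseline)
```

The contributions sum to the difference between the prediction for the explained row and the average prediction over the background data. Enumeration grows exponentially with the number of features, which is why SHAP explainers rely on sampling or model-specific shortcuts.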

Confusion Matrix

[Figure: confusion matrix]
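
As a rough sketch of how a confusion matrix relates to the accuracy reported under Evaluation Results, the example below uses made-up labels and predictions (not the model's actual outputs) and assumes 0 = normal, 1 = fraudulent.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical ground truth and predictions.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

# Accuracy is the share of predictions on the diagonal.
accuracy = (tn + tp) / cm.sum()
print(cm)
print(accuracy, accuracy_score(y_true, y_pred))
```

On imbalanced fraud data, the off-diagonal cells (missed fraud and false alarms) are usually more informative than accuracy alone.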

Model Card Authors

This model card was written by Seifullah Bello.

