CheXVision-mini — from-scratch NumPy neural network

A pure-NumPy multilayer perceptron (no autograd, no deep-learning framework), with every forward and backward pass derived and coded by hand, trained for binary chest X-ray screening (normal vs abnormal) on NIH ChestX-ray14.
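
Coding every pass by hand means each layer needs a forward function and a matching backward function derived with the chain rule. Below is a minimal sketch of a dense layer and a ReLU in NumPy. The function names are hypothetical and are not taken from the chexvision_mini package.

```python
import numpy as np

def dense_forward(x, W, b):
    # Affine map: (N, D_in) @ (D_in, D_out) + (D_out,) -> (N, D_out)
    return x @ W + b

def dense_backward(dout, x, W):
    # Chain rule for out = x @ W + b, given dout = dL/dout with shape (N, D_out)
    dx = dout @ W.T        # (N, D_in): gradient handed to the previous layer
    dW = x.T @ dout        # (D_in, D_out): summed over the batch
    db = dout.sum(axis=0)  # (D_out,)
    return dx, dW, db

def relu_forward(z):
    return np.maximum(z, 0.0)

def relu_backward(dout, z):
    # d/dz max(z, 0) is 1 where z > 0 and 0 elsewhere (0 is used at z == 0)
    return dout * (z > 0)

# Forward then backward through one hidden block on a toy batch
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))
W = rng.normal(scale=np.sqrt(2.0 / 6), size=(6, 3))
b = np.zeros(3)
z = dense_forward(x, W, b)
a = relu_forward(z)
dz = relu_backward(np.ones_like(a), z)  # pretend the upstream gradient dL/da is all ones
dx, dW, db = dense_backward(dz, x, W)
```

A full network chains these blocks: the backward calls run in reverse order, and each layer's `dx` becomes the previous layer's `dout`.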

Companion to CheXVision (PyTorch: a custom CNN and a DenseNet-121 transfer-learning model). This model demonstrates the fundamentals: backpropagation written by hand and verified with finite-difference gradient checking. It is not meant to compete on accuracy. The headline performance belongs to the PyTorch models (DenseNet binary AUC ≈ 0.787), not to this MLP.
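
Finite-difference gradient checking compares the hand-derived gradient with central differences (L(θ + h) − L(θ − h)) / 2h, one parameter at a time. The sketch below applies this check to a single linear layer trained with BCE-with-logits and label smoothing, using the analytic result dL/dz = (σ(z) − y_s) / N. Two things here are assumptions rather than the package's code: the helper names, and the smoothing form y(1 − ε) + ε/2.

```python
import numpy as np

def smooth_labels(y, eps=0.05):
    # Assumed form: targets pulled toward 0.5 (0 -> eps/2, 1 -> 1 - eps/2)
    return y * (1.0 - eps) + 0.5 * eps

def bce_with_logits(z, y):
    # Stable mean of -[y log s(z) + (1 - y) log(1 - s(z))], s = sigmoid
    return np.mean(np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z))))

def loss_and_grads(W, b, x, y):
    z = x @ W + b                                 # (N, 1) logits
    dz = (1.0 / (1.0 + np.exp(-z)) - y) / len(x)  # dL/dz of the mean loss
    return bce_with_logits(z, y), x.T @ dz, dz.sum(axis=0)

def numerical_grad(f, P, h=1e-5):
    # Central differences: perturb one entry of P in place, then restore it
    g = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        orig = P[idx]
        P[idx] = orig + h
        f_plus = f()
        P[idx] = orig - h
        f_minus = f()
        P[idx] = orig
        g[idx] = (f_plus - f_minus) / (2.0 * h)
    return g

def rel_error(a, b):
    return np.linalg.norm(a - b) / max(np.linalg.norm(a) + np.linalg.norm(b), 1e-12)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
y = smooth_labels(rng.integers(0, 2, size=(8, 1)).astype(float))
W = rng.normal(scale=0.1, size=(5, 1))
b = np.zeros(1)

def loss_fn():
    # Reads W and b at call time, so in-place perturbations are seen
    return loss_and_grads(W, b, x, y)[0]

_, dW, db = loss_and_grads(W, b, x, y)
print("W:", rel_error(dW, numerical_grad(loss_fn, W)))
print("b:", rel_error(db, numerical_grad(loss_fn, b)))
```

In float64, a relative error of about 1e-7 or smaller usually indicates a correct backward pass. The same check can be repeated layer by layer on the full network, ideally with dropout disabled so the loss is deterministic.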

Results — held-out test set (final)

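Metrics on an untouched test split, at an operating threshold of 0.389. The threshold was chosen on the validation set only, as the score cut-off that maximises Youden's J = sensitivity + specificity - 1 (J ≈ 0.303 on validation). ROC-AUC is threshold-independent.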
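
Choosing a Youden-optimal threshold amounts to sweeping every distinct score as a cut-off and keeping the one with the largest TPR - FPR. The sketch below does this in NumPy. The function name is hypothetical, and calling a sample abnormal when score >= threshold is an assumed convention.

```python
import numpy as np

def youden_threshold(scores, labels):
    """Return (threshold, J) maximising J = TPR - FPR = sensitivity + specificity - 1.

    Assumes a sample is called abnormal when score >= threshold.
    """
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).astype(bool).ravel()
    order = np.argsort(-scores, kind="stable")  # highest score first
    s, y = scores[order], labels[order]
    tp = np.cumsum(y)   # positives scoring at or above each position
    fp = np.cumsum(~y)  # negatives scoring at or above each position
    # Only the last position in a run of equal scores is a valid cut-off
    last = np.r_[s[1:] != s[:-1], True]
    tpr = tp[last] / y.sum()
    fpr = fp[last] / (~y).sum()
    j = tpr - fpr
    k = int(np.argmax(j))
    return s[last][k], j[k]

# Toy example: the cut-off 0.6 separates the classes perfectly (J = 1)
t, j = youden_threshold([0.1, 0.2, 0.6, 0.7, 0.3], [0, 0, 1, 1, 0])
print(t, j)
```

Because the threshold is fitted on validation scores only, the test metrics below remain an unbiased estimate at that fixed operating point.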

Metric	Test	Validation
ROC-AUC	0.6502	0.6994
Accuracy	0.6467	0.6536
Balanced accuracy	0.5904	0.6517
Precision	0.6749	0.5803
Recall (sensitivity)	0.8277	0.6393
Specificity	0.3530	0.6640
F1	0.7435	0.6084

Checkpoint selected by best validation AUC (epoch 176 of 200). Samples: train 60,000, val 8,557, test 10,000 (test positive rate 0.6187). Test confusion matrix at threshold 0.389: TN=1346, FP=2467, FN=1066, TP=5121.
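
Every threshold-dependent entry in the Test column follows from these four counts. The small helper below (a hypothetical name, not part of the package) does the arithmetic. Run on the counts above, it reproduces the Test column to four decimals.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    recall = tp / (tp + fn)       # sensitivity: share of abnormal images caught
    specificity = tn / (tn + fp)  # share of normal images correctly cleared
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
    }

for name, value in metrics_from_confusion(tn=1346, fp=2467, fn=1066, tp=5121).items():
    print(f"{name:<18} {value:.4f}")
```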

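Note on the test split: NIH ChestX-ray14's official test split is more positive-heavy (positive rate 0.6187) than train/validation (0.4208). This base-rate shift makes prevalence-dependent metrics misleading. Labelling every image abnormal would already reach 0.6187 test accuracy, and test precision and F1 exceed their validation values even though test ROC-AUC is lower. ROC-AUC (threshold-independent) and balanced accuracy do not depend on the positive rate, so they are the metrics to trust for comparison.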
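
The effect of prevalence can be isolated with Bayes' rule. The sketch below holds the validation sensitivity and specificity fixed and changes only the positive rate. It is idealised, because it assumes the per-class score distributions are identical across splits. Under that assumption, moving from 0.4208 to 0.6187 raises precision from about 0.58 to about 0.76, while balanced accuracy does not move. The function name is hypothetical.

```python
def rates_at_prevalence(sensitivity, specificity, prevalence):
    # Expected fractions of the whole dataset in each confusion-matrix cell
    tp = prevalence * sensitivity
    fp = (1 - prevalence) * (1 - specificity)
    tn = (1 - prevalence) * specificity
    return {
        "precision": tp / (tp + fp),
        "accuracy": tp + tn,
        "balanced_accuracy": (sensitivity + specificity) / 2,  # no prevalence term
    }

sens, spec = 0.6393, 0.6640  # validation operating point from the results table
for prevalence in (0.4208, 0.6187):  # validation vs test positive rate
    r = rates_at_prevalence(sens, spec, prevalence)
    print(f"positive rate {prevalence:.4f}: " + ", ".join(f"{k} {v:.3f}" for k, v in r.items()))
```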

Architecture

MLP on 64×64 grayscale images flattened to 4096 inputs: 4096 → 1024 → 256 → 64 → 1 logit, with ReLU activations, dropout 0.3 and He initialisation. Loss: BCE-with-logits with label smoothing 0.05. Optimizer: Adam with cosine learning-rate decay, plus L2 weight decay applied to the weights only (biases excluded). Inputs use per-feature (per-pixel) standardisation. Augmentation: horizontal flip, noise and brightness.
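
The sketch below builds the stated architecture in NumPy: He-normal weights, zero biases, ReLU hidden layers and a raw output logit. The layer sizes, initialisation and dropout rate come from the description above. Two details are assumptions: dropout is applied after every hidden ReLU, and it uses the inverted-dropout formulation. The function names are hypothetical.

```python
import numpy as np

LAYER_SIZES = [64 * 64, 1024, 256, 64, 1]  # 4096 -> 1024 -> 256 -> 64 -> 1 logit

def init_params(sizes, rng):
    # He initialisation: W ~ N(0, 2 / fan_in); biases start at zero
    return [(rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)), np.zeros(fan_out))
            for fan_in, fan_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x, rng=None, p_drop=0.3):
    """Return logits of shape (N, 1); dropout is active only when rng is given (training)."""
    a = x
    for W, b in params[:-1]:
        a = np.maximum(a @ W + b, 0.0)  # hidden layer + ReLU
        if rng is not None:
            # Inverted dropout (assumed): drop units with prob p_drop, rescale survivors
            keep = rng.random(a.shape) >= p_drop
            a = a * keep / (1.0 - p_drop)
    W_out, b_out = params[-1]
    return a @ W_out + b_out  # raw logit; the sigmoid lives inside the BCE-with-logits loss

rng = np.random.default_rng(0)
params = init_params(LAYER_SIZES, rng)
n_params = sum(W.size + b.size for W, b in params)
x = rng.normal(size=(2, 64 * 64))  # stand-in for two standardised, flattened images
logits = forward(params, x)
print(n_params, logits.shape)
```

This prints `4474241 (2, 1)`. Almost all of the roughly 4.47 million parameters sit in the first 4096 × 1024 layer, which is why an MLP on raw pixels scales far worse than a CNN.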

Files

model.npz — best weights + normalisation stats (_norm_mean, _norm_std).
metrics.json — test & validation metrics, ROC/PR curves, confusion matrices, config.
history.json — per-epoch train/reg/val loss, val accuracy/AUC, learning rate.
val_scores.npy / val_labels.npy, test_scores.npy / test_labels.npy — raw scores + labels.
loss_curve.png — training curves + val AUC.
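
Because the raw scores and labels are saved, ROC-AUC can be recomputed independently of the training code. The sketch below uses the Mann-Whitney formulation: the probability that a random positive outscores a random negative, with ties counted as half. The function name is hypothetical. The commented usage line assumes the `.npy` files sit in the `artifacts` directory used under Usage.

```python
import numpy as np

def roc_auc(scores, labels):
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).astype(bool).ravel()
    pos, neg = scores[labels], scores[~labels]
    # Pairwise comparison: memory is O(n_pos * n_neg), about 24M booleans for the 10k test set
    greater = np.count_nonzero(pos[:, None] > neg[None, :])
    ties = np.count_nonzero(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Toy check: 3 of the 4 positive/negative pairs are ordered correctly
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))

# Hypothetical usage on the saved arrays:
# roc_auc(np.load("artifacts/test_scores.npy"), np.load("artifacts/test_labels.npy"))
```

Counting ties as half matches the trapezoidal area under the empirical ROC curve, so the result should agree with standard ROC-AUC routines.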

Usage

```python
from chexvision_mini.inference import load_checkpoint, preprocess_image, predict_label

# Model, normalisation stats (mean/std) and decision threshold
model, mean, std, threshold = load_checkpoint("artifacts")
# 64×64 grayscale input, standardised with the saved mean/std
x = preprocess_image("xray.png", image_size=64, mean=mean, std=std)
prob, label = predict_label(model, x, threshold)   # P(abnormal), "normal"/"abnormal"
```

Or from the CLI:

```shell
python -m chexvision_mini predict --checkpoint artifacts --image xray.png
```

arudaev/chexvision-mini
