CheXVision-mini: from-scratch NumPy neural network
A pure-NumPy multilayer perceptron (no autograd, no deep-learning framework), with every forward and backward pass derived and coded by hand, trained for binary chest X-ray screening (normal vs abnormal) on NIH ChestX-ray14.
Companion to CheXVision (PyTorch: a custom CNN + a DenseNet-121 transfer model). This model covers the fundamentals: hand-written backprop verified by finite-difference gradient checking. It is intentionally a demo; the headline performance belongs to the PyTorch models (DenseNet binary AUC ≈ 0.787), not to this MLP.
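As an illustration of what finite-difference gradient checking involves, here is a minimal sketch on a single linear layer with a BCE-with-logits loss. It is not the repository's own checker: the function names (`loss_and_grad`, `numerical_grad`, `relative_error`), the toy data and the step size `eps` are all assumptions made for this example.

```python
import numpy as np

def bce_with_logits(z, y):
    # Numerically stable mean binary cross-entropy computed on logits z.
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))

def loss_and_grad(w, X, y):
    # Toy model (assumption): one linear layer, logits z = X @ w.
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))          # sigmoid
    grad = X.T @ (p - y) / len(y)         # hand-derived dL/dw
    return bce_with_logits(z, y), grad

def numerical_grad(f, w, eps=1e-5):
    # Central differences, one coordinate at a time.
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return g

def relative_error(a, b):
    return np.linalg.norm(a - b) / max(np.linalg.norm(a) + np.linalg.norm(b), 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
y = rng.integers(0, 2, size=8).astype(float)
w = rng.normal(size=5)

_, analytic = loss_and_grad(w, X, y)
numeric = numerical_grad(lambda v: loss_and_grad(v, X, y)[0], w)
rel_err = relative_error(analytic, numeric)
print(f"relative error: {rel_err:.2e}")
```

A small relative error in float64 suggests the analytic gradient matches the loss; a large one usually points to a bug in the hand-written backward pass.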
Results: held-out test set (final)
Metrics on an untouched test split, at an operating threshold of 0.389 chosen on the validation set only (by maximising Youden's J). ROC-AUC is threshold-independent.
| Metric | Test | Validation |
|---|---|---|
| ROC-AUC | 0.6502 | 0.6994 |
| Accuracy | 0.6467 | 0.6536 |
| Balanced accuracy | 0.5904 | 0.6517 |
| Precision | 0.6749 | 0.5803 |
| Recall (sensitivity) | 0.8277 | 0.6393 |
| Specificity | 0.3530 | 0.6640 |
| F1 | 0.7435 | 0.6084 |
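
The operating threshold is picked with Youden's J on validation scores. The sketch below shows one straightforward way to do that: scan every distinct score as a candidate threshold and keep the one that maximises sensitivity + specificity - 1. The function name `youden_threshold` and the toy scores are assumptions for illustration, and the project's own selection code may differ in details such as tie handling.

```python
import numpy as np

def youden_threshold(scores, labels):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):            # candidate thresholds, ascending
        pred = scores >= t                 # predict "abnormal" at or above t
        sens = np.mean(pred[labels])       # true-positive rate
        spec = np.mean(~pred[~labels])     # true-negative rate
        j = sens + spec - 1.0
        if j > best_j:                     # strict '>' keeps the lowest tied threshold
            best_t, best_j = t, j
    return float(best_t), float(best_j)

# Toy validation data (made up for this example).
val_scores = np.array([0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80])
val_labels = np.array([0, 0, 0, 1, 0, 1, 1])
threshold, j = youden_threshold(val_scores, val_labels)
print(threshold, j)
```

Selecting the threshold on validation data only, as the card states, keeps the test split untouched by any tuning decision.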
Checkpoint selected by best validation AUC (epoch 176/200). Samples: train 60000, val 8557, test 10000 (test positive rate 0.6187). Test confusion matrix at threshold 0.389: TN=1346, FP=2467, FN=1066, TP=5121.
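The thresholded test metrics in the table follow directly from the confusion matrix. The short check below recomputes them from the four published counts; the variable names are chosen for this example only.

```python
# Test confusion matrix at threshold 0.389, as reported above.
tn, fp, fn, tp = 1346, 2467, 1066, 5121

total = tn + fp + fn + tp
recall = tp / (tp + fn)                      # sensitivity
specificity = tn / (tn + fp)
precision = tp / (tp + fp)
accuracy = (tp + tn) / total
balanced_accuracy = (recall + specificity) / 2
f1 = 2 * tp / (2 * tp + fp + fn)
positive_rate = (tp + fn) / total

for name, value in [
    ("accuracy", accuracy),
    ("balanced accuracy", balanced_accuracy),
    ("precision", precision),
    ("recall", recall),
    ("specificity", specificity),
    ("f1", f1),
    ("positive rate", positive_rate),
]:
    print(f"{name:18s} {value:.4f}")
```

Rounded to four decimals, these agree with the Test column and with the stated positive rate of 0.6187.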
Note on the test split: NIH ChestX-ray14's official test split is more positive-heavy (0.6187) than train/validation (0.4208). Because of that base-rate shift, plain accuracy can mislead; ROC-AUC (threshold-independent) and balanced accuracy are the metrics to trust for comparison.
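To see why the base-rate shift matters, consider a hypothetical: hold the test sensitivity and specificity fixed and change only the share of positives. Accuracy is a prevalence-weighted mix of the two rates, so it moves with the class balance, while balanced accuracy does not. The assumption that the per-class rates stay the same across splits is made purely for illustration, and `accuracy_at_prevalence` is a name invented for this sketch.

```python
def accuracy_at_prevalence(sensitivity, specificity, prevalence):
    # accuracy = P(correct | positive) * P(positive) + P(correct | negative) * P(negative)
    return sensitivity * prevalence + specificity * (1.0 - prevalence)

sens, spec = 0.8277, 0.3530   # test-set rates at threshold 0.389 (from the table)

acc_test_mix = accuracy_at_prevalence(sens, spec, 0.6187)    # test positive rate
acc_train_mix = accuracy_at_prevalence(sens, spec, 0.4208)   # train/validation positive rate
balanced = (sens + spec) / 2                                 # independent of prevalence

print(f"accuracy at 0.6187 positives: {acc_test_mix:.4f}")
print(f"accuracy at 0.4208 positives: {acc_train_mix:.4f}")
print(f"balanced accuracy:            {balanced:.4f}")
```

Under these assumptions the same classifier would score roughly 0.65 accuracy on the positive-heavy mix and roughly 0.55 on the train/validation mix, with no change in its per-class behaviour.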
Architecture
MLP on 64×64 grayscale images: 4096 → 1024 → 256 → 64 → 1 logit, ReLU activations, dropout 0.3, He initialisation. Loss: BCE-with-logits (+ label smoothing 0.05). Optimizer: Adam with cosine LR decay; L2 weight decay (weights only). Per-feature standardisation; augmentation: h-flip / noise / brightness.
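The forward pass and loss described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the repository's code. The assumptions are: dropout applied after every hidden ReLU (inverted dropout), label smoothing pulling targets toward 0.5, and the helper names `he_init`, `forward` and `bce_with_logits`.

```python
import numpy as np

LAYER_SIZES = [4096, 1024, 256, 64, 1]   # 64*64 inputs down to one logit

def he_init(sizes, rng):
    # He initialisation: weights ~ N(0, 2 / fan_in), biases zero.
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params

def forward(params, x, rng=None, dropout=0.3):
    """Return logits of shape (batch,). Dropout is active only when rng is given."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)                 # ReLU
        if rng is not None:                            # training mode
            keep = rng.random(h.shape) >= dropout
            h = h * keep / (1.0 - dropout)             # inverted dropout
    W, b = params[-1]
    return (h @ W + b).ravel()                         # raw logit, no sigmoid

def bce_with_logits(logits, targets, smoothing=0.05):
    # Assumed smoothing convention: y -> y * (1 - s) + 0.5 * s.
    t = targets * (1.0 - smoothing) + 0.5 * smoothing
    # Stable form of -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))].
    return np.mean(np.maximum(logits, 0) - logits * t
                   + np.log1p(np.exp(-np.abs(logits))))

rng = np.random.default_rng(0)
params = he_init(LAYER_SIZES, rng)
x = rng.normal(size=(4, 4096))                         # stand-in for standardised images
logits = forward(params, x)                            # eval mode: no dropout
loss = bce_with_logits(logits, np.array([0.0, 1.0, 1.0, 0.0]))
n_params = sum(W.size + b.size for W, b in params)
print(logits.shape, float(loss), n_params)
```

With these layer sizes the network has about 4.47 million parameters, almost all of them in the first 4096 → 1024 layer.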
Files
- `model.npz`: best weights + normalisation stats (`_norm_mean`, `_norm_std`).
- `metrics.json`: test & validation metrics, ROC/PR curves, confusion matrices, config.
- `history.json`: per-epoch train/reg/val loss, val accuracy/AUC, learning rate.
- `val_scores.npy` / `val_labels.npy`, `test_scores.npy` / `test_labels.npy`: raw scores + labels.
- `loss_curve.png`: training curves + val AUC.
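Because the raw scores and labels are shipped, the reported ROC-AUC can in principle be recomputed independently. The sketch below implements ROC-AUC via the Mann-Whitney U statistic in plain NumPy and runs it on toy arrays. The file path in the comment, the label encoding (1 = abnormal) and the helper name `roc_auc` are assumptions, not details confirmed by the repository.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC-AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    # For each run of equal scores, the average 1-based rank is first + (count + 1) / 2.
    _, first, counts = np.unique(sorted_scores, return_index=True, return_counts=True)
    avg_rank = first + (counts + 1) / 2.0
    ranks = np.empty(len(scores))
    ranks[order] = np.repeat(avg_rank, counts)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

# With the shipped arrays this might look like (paths are an assumption):
#   scores = np.load("artifacts/test_scores.npy")
#   labels = np.load("artifacts/test_labels.npy")
scores = np.array([0.1, 0.4, 0.35, 0.8])   # toy data
labels = np.array([0, 0, 1, 1])
auc = roc_auc(scores, labels)
print(auc)
```

The result is the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.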
Usage
```python
from chexvision_mini.inference import load_checkpoint, preprocess_image, predict_label

model, mean, std, threshold = load_checkpoint("artifacts")
x = preprocess_image("xray.png", image_size=64, mean=mean, std=std)
prob, label = predict_label(model, x, threshold)  # P(abnormal), "normal"/"abnormal"
```
Or from the CLI: `python -m chexvision_mini predict --checkpoint artifacts --image xray.png`
Links
- Code: https://github.com/arudaev/chexvision-mini
- Parent project: https://github.com/arudaev/chexvision