DPYD Variant Pathogenicity Classifier

A trained classifier, built by Anukriti AI, that assigns DPYD variants to one of three functional classes: no_function, decreased_function, or normal_function.

Versions

v1: Baseline. 273 training rows, 4 decreased_function examples, decreased_function F1=0.0.
v2: Adds Offer et al. 2014 and PharmVar data, SMOTE oversampling, and domain features. 392 rows, 23 decreased_function examples, decreased_function F1=0.66 (XGBoost).
v3: Adds AlphaMissense, CADD, and VEP annotations to the training set. AlphaMissense (AM) feature importance rose from 0 to 0.01; sentinel gap closed.
v4: Protein-only model. Drops clnsig_norm, activity_pct, and AF; keeps AM, CADD, SIFT, PolyPhen, domain, aa_position, and vep_consequence. 13/25 Scaria variants changed class (no_function→decreased_function). CV accuracy fell from 0.91 to 0.73, which is expected because the label-proxy features were removed. Best model: LightGBM (decreased_function F1=0.25).
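The v4 feature change can be illustrated with a minimal sketch. It assumes each variant is a flat record keyed by column name. The three dropped names come from the v4 description above; the names of the kept columns (`am_score`, `cadd_phred`, and so on) and all values are hypothetical placeholders, not the actual training schema.

```python
# Columns the v4 description lists as dropped (label proxies).
# Assumption: these strings match the real column names exactly.
LABEL_PROXY_COLUMNS = {"clnsig_norm", "activity_pct", "AF"}


def protein_only_features(row: dict) -> dict:
    """Return a copy of `row` without the label-proxy columns."""
    return {k: v for k, v in row.items() if k not in LABEL_PROXY_COLUMNS}


# Hypothetical variant record; names of kept columns and all values are
# illustrative only.
example = {
    "clnsig_norm": None,      # no ClinVar significance for a novel variant
    "activity_pct": None,     # no measured activity
    "AF": 0.002,
    "am_score": 0.71,
    "cadd_phred": 24.3,
    "sift_score": 0.01,
    "polyphen_score": 0.98,
    "domain": "FAD_binding",
    "aa_position": 534,
    "vep_consequence": "missense_variant",
}

features = protein_only_features(example)
print(sorted(features))
```

Filtering by an explicit drop set, rather than an explicit keep list, means any additional protein-level annotation passes through unchanged; a keep list would be the stricter choice if the schema is fixed.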

Two-tier inference note

The two tier-2 options for novel population-discovery variants (no ClinVar significance, no measured activity — e.g. the Scaria 2025 Indian cohort) are complementary, not competing:

The protein-only classifier (v4) is preferred when decreased_function must be distinguished from no_function. Unlike v1–v3, v4 does not default novel variants to no_function: it reclassified 13/25 Scaria variants as decreased_function. The cost is lower in-distribution accuracy, which matters little for this tail of novel variants.
The raw AlphaMissense score is sufficient when the clinical question is simply pathogenic vs. benign.

For variants that do carry clinical labels or measured activity, the v2/v3 mixed-feature classifier (decreased_function F1=0.66) remains the better tool. CPIC-assigned variants are handled by the deterministic engine, not by any classifier. See the validation paper listed under Citation.
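The routing described in this note can be sketched as a small decision function. This is an illustration under stated assumptions, not the project's actual dispatch code: the function name, its parameters, and the `Route` enum are all hypothetical, and the order of checks simply follows the prose above.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    # Hypothetical labels for the four paths described in the note.
    DETERMINISTIC_ENGINE = "deterministic engine"
    MIXED_FEATURE = "v2/v3 mixed-feature classifier"
    PROTEIN_ONLY = "v4 protein-only classifier"
    RAW_ALPHAMISSENSE = "raw AlphaMissense score"


def choose_route(
    has_cpic_assignment: bool,
    clnsig: Optional[str],
    activity_pct: Optional[float],
    need_class_discrimination: bool,
) -> Route:
    """Pick an inference path for one DPYD variant."""
    # CPIC-assigned variants never reach a classifier.
    if has_cpic_assignment:
        return Route.DETERMINISTIC_ENGINE
    # Variants with a clinical label or measured activity use v2/v3.
    if clnsig is not None or activity_pct is not None:
        return Route.MIXED_FEATURE
    # Novel variants: v4 when decreased_function vs. no_function matters,
    # otherwise the raw AlphaMissense score.
    if need_class_discrimination:
        return Route.PROTEIN_ONLY
    return Route.RAW_ALPHAMISSENSE


# A novel population-discovery variant where the functional class matters.
print(choose_route(False, None, None, need_class_discrimination=True).value)
```

Checking the CPIC assignment first keeps the deterministic engine authoritative even when a variant also carries a ClinVar label or an activity measurement.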

Ground truth sources

PharmVar activity scores, Offer et al. 2014, GeT-RM 2024 (Gaedigk et al.), ClinVar

Citation

Abhimanyu R B et al. "Deterministic, Population-Aware Pharmacogenomics Screening." Zenodo. https://doi.org/10.5281/zenodo.20727790
