DPYD Variant Pathogenicity Classifier

A trained classifier, built by Anukriti AI, that assigns DPYD variants to one of three functional classes: no_function, decreased_function, or normal_function.

Versions

  • v1: Baseline — 273 training rows, 4 decreased_function examples, F1=0.0 for decreased_function
  • v2: +Offer 2014 + PharmVar data, SMOTE, domain features — 392 rows, 23 decreased_function examples, F1=0.66 (XGB)
  • v3: +AlphaMissense + CADD + VEP annotations on the training set. AlphaMissense (AM) feature importance 0→0.01; sentinel gap closed.
  • v4: Protein-only model. Drops clnsig_norm, activity_pct, and AF; keeps AM, CADD, SIFT, PolyPhen, domain, aa_position, and vep_consequence. 13/25 Scaria variants changed class (no_function→decreased_function). CV accuracy fell from 0.91 to 0.73, which is expected because the label proxy was removed. Best model: lgbm (decreased_function F1=0.25).
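The v4 feature restriction can be sketched as below. This is a minimal illustration, not the project's training code: the feature names come from the v4 entry above, while the row layout, the `to_protein_only` helper, and the example values are assumptions made for illustration only.

```python
# Feature names are taken from the v4 entry; everything else is hypothetical.
DROPPED_FEATURES = {"clnsig_norm", "activity_pct", "AF"}
PROTEIN_ONLY_FEATURES = [
    "AM", "CADD", "SIFT", "PolyPhen",
    "domain", "aa_position", "vep_consequence",
]


def to_protein_only(row: dict) -> dict:
    """Keep only the v4 feature set; features absent from the row become None."""
    return {name: row.get(name) for name in PROTEIN_ONLY_FEATURES}


# Placeholder values for illustration only, not real annotations.
example_row = {
    "clnsig_norm": None,      # novel variant: no ClinVar significance
    "activity_pct": None,     # novel variant: no measured activity
    "AF": 0.001,
    "AM": 0.8,
    "CADD": 25.0,
    "SIFT": 0.01,
    "PolyPhen": 0.95,
    "domain": "example_domain",
    "aa_position": 100,
    "vep_consequence": "missense_variant",
}

protein_row = to_protein_only(example_row)
```

Selecting by an explicit keep-list, rather than deleting the dropped columns, means a newly added clinical feature cannot leak into the protein-only input by accident.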

Two-tier inference note

For novel population-discovery variants (those with no ClinVar significance and no measured activity, e.g. the Scaria 2025 Indian cohort), the two tier-2 options are complementary, not competing:

  • The protein-only classifier (v4) is preferred when the question requires discriminating between decreased_function and no_function. Unlike v1–v3, v4 does not default novel variants to no_function (it reclassified 13/25 Scaria variants). The trade-off is lower in-distribution accuracy, which this tail of unlabeled variants does not need.
  • The raw AlphaMissense score is sufficient when the clinical question is simply pathogenic vs. benign.

For variants that do carry clinical labels or measured activity, the v2/v3 mixed-feature classifier (decreased_function F1=0.66) remains the better tool. CPIC-assigned variants are handled by the deterministic engine, not by any classifier. See the validation paper.
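The routing described in this note can be summarized as a small decision function. This is a hedged sketch of the logic as stated above, not the project's actual interface: the `Variant` fields, the `choose_tool` function, and the returned labels are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """Hypothetical record; field names are assumptions, not the project's schema."""
    cpic_assigned: bool = False
    clnsig: Optional[str] = None          # ClinVar significance, if any
    activity_pct: Optional[float] = None  # measured activity, if any


def choose_tool(variant: Variant, need_class_discrimination: bool) -> str:
    """Return which component should handle the variant."""
    # CPIC-assigned variants never reach a classifier.
    if variant.cpic_assigned:
        return "deterministic_engine"
    # Clinical label or measured activity available: mixed-feature model.
    if variant.clnsig is not None or variant.activity_pct is not None:
        return "mixed_feature_classifier_v2_v3"
    # Novel variant: pick between the two tier-2 options.
    if need_class_discrimination:
        return "protein_only_classifier_v4"
    return "raw_alphamissense_score"


novel = Variant()
labeled = Variant(clnsig="pathogenic")
cpic = Variant(cpic_assigned=True, clnsig="pathogenic")

novel_choice = choose_tool(novel, need_class_discrimination=True)
```

Checking CPIC assignment first reflects the statement that those variants are handled deterministically regardless of what other evidence they carry.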

Ground truth sources

PharmVar activity scores, Offer et al. 2014, GeT-RM 2024 (Gaedigk et al.), ClinVar

Citation

Abhimanyu R B et al. "Deterministic, Population-Aware Pharmacogenomics Screening." Zenodo. https://doi.org/10.5281/zenodo.20727790
