---
license: cc-by-4.0
---
# Pancancer _TP53_ classifier from H&E resections

This model classifies an H&E-stained digital pathology image as _TP53_ wildtype or mutant. It was trained by Jakub Kaczmarzyk using CLAM.

Input: a bag of patches with a 128 µm edge length, each embedded with CTransPath.

Output classes: wildtype, mutant

## Data

Diagnostic slides from TCGA (those with `DX` in the slide identifier) were used to train the model. Each whole slide image was tiled into 128 µm x 128 µm patches, and each patch was encoded with CTransPath, which produces a 768-dimensional embedding.
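Because patches are defined in physical units, their size in pixels depends on each slide's resolution (microns per pixel, MPP). The sketch below uses hypothetical helper names (`patch_size_px`, `patch_grid`, `is_diagnostic_slide`) to show the conversion, a simple non-overlapping grid, and a check for diagnostic slides. It assumes the level-0 pixel dimensions and MPP are already known (for example, from slide metadata) and omits the tissue detection that a real pipeline such as CLAM performs, so it is not the exact preprocessing used for this model.

```python
def patch_size_px(patch_um: float, mpp: float) -> int:
    """Edge length in pixels of a square patch spanning `patch_um` microns
    on a slide scanned at `mpp` microns per pixel."""
    return round(patch_um / mpp)


def patch_grid(width_px: int, height_px: int, size_px: int) -> list:
    """Top-left (x, y) corners of non-overlapping patches that fit entirely
    inside a width_px x height_px slide (no tissue filtering)."""
    return [
        (x, y)
        for y in range(0, height_px - size_px + 1, size_px)
        for x in range(0, width_px - size_px + 1, size_px)
    ]


def is_diagnostic_slide(slide_name: str) -> bool:
    """TCGA diagnostic slides carry a 'DX' section code, e.g. TCGA-02-0001-01Z-00-DX1."""
    return "-DX" in slide_name
```

For example, `patch_size_px(128, 0.25)` gives 512 pixels and `patch_size_px(128, 0.5)` gives 256 pixels, so the same physical patch covers a different number of pixels at different scan resolutions.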

Train, validation, and test splits were stratified by TCGA study and _TP53_ status, and no patient appears in more than one split.
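A patient-level, stratified split can be sketched in plain Python as follows. This is an illustration, not the author's splitting code: the function names are hypothetical, the 80/10/10 fractions are an assumption that roughly matches the patient counts listed under sample sizes (7,076 / 881 / 881), and the patient ID is taken from the first 12 characters of the TCGA barcode (e.g., `TCGA-02-0001`).

```python
import random
from collections import defaultdict


def patient_id(slide_name: str) -> str:
    """TCGA patient barcode: the first 12 characters, e.g. 'TCGA-02-0001'."""
    return slide_name[:12]


def split_by_patient(slides, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign slides to train/val/test so that no patient spans two splits.

    `slides` is a list of (slide_name, study, tp53_status) tuples. Patients are
    stratified by (study, tp53_status) and shuffled within each stratum.
    """
    # Group slides by patient and remember each patient's stratum.
    patients = defaultdict(list)
    stratum_of = {}
    for name, study, status in slides:
        pid = patient_id(name)
        patients[pid].append(name)
        stratum_of[pid] = (study, status)

    # Collect patients per stratum (sorted first so the shuffle is reproducible).
    strata = defaultdict(list)
    for pid in sorted(patients):
        strata[stratum_of[pid]].append(pid)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for key in sorted(strata):
        pids = strata[key]
        rng.shuffle(pids)
        n_train = round(fractions[0] * len(pids))
        n_val = round(fractions[1] * len(pids))
        groups = {
            "train": pids[:n_train],
            "val": pids[n_train:n_train + n_val],
            "test": pids[n_train + n_val:],
        }
        # All slides of a patient go to that patient's split.
        for split, members in groups.items():
            for pid in members:
                splits[split].extend(patients[pid])
    return splits
```

Splitting patients rather than slides is what prevents leakage: two slides from the same patient share morphology, so placing them on both sides of a split would inflate test performance.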

Sample sizes:
- Train: 8,736 slides (7,076 patients)
- Validation: 1,061 slides (881 patients)
- Test: 1,069 slides (881 patients)

The _TP53_ mutation status for each sample was downloaded from [cBioPortal](https://www.cbioportal.org/results/download?cancer_study_list=laml_tcga_pan_can_atlas_2018%2Cacc_tcga_pan_can_atlas_2018%2Cblca_tcga_pan_can_atlas_2018%2Clgg_tcga_pan_can_atlas_2018%2Cbrca_tcga_pan_can_atlas_2018%2Ccesc_tcga_pan_can_atlas_2018%2Cchol_tcga_pan_can_atlas_2018%2Ccoadread_tcga_pan_can_atlas_2018%2Cdlbc_tcga_pan_can_atlas_2018%2Cesca_tcga_pan_can_atlas_2018%2Cgbm_tcga_pan_can_atlas_2018%2Chnsc_tcga_pan_can_atlas_2018%2Ckich_tcga_pan_can_atlas_2018%2Ckirc_tcga_pan_can_atlas_2018%2Ckirp_tcga_pan_can_atlas_2018%2Clihc_tcga_pan_can_atlas_2018%2Cluad_tcga_pan_can_atlas_2018%2Clusc_tcga_pan_can_atlas_2018%2Cmeso_tcga_pan_can_atlas_2018%2Cov_tcga_pan_can_atlas_2018%2Cpaad_tcga_pan_can_atlas_2018%2Cpcpg_tcga_pan_can_atlas_2018%2Cprad_tcga_pan_can_atlas_2018%2Csarc_tcga_pan_can_atlas_2018%2Cskcm_tcga_pan_can_atlas_2018%2Cstad_tcga_pan_can_atlas_2018%2Ctgct_tcga_pan_can_atlas_2018%2Cthym_tcga_pan_can_atlas_2018%2Cthca_tcga_pan_can_atlas_2018%2Cucs_tcga_pan_can_atlas_2018%2Cucec_tcga_pan_can_atlas_2018%2Cuvm_tcga_pan_can_atlas_2018&tab_index=tab_visualize&profileFilter=mutations&case_set_id=all&Action=Submit&gene_list=TP53%253A%2520MUT&Z_SCORE_THRESHOLD=2.0&RPPA_SCORE_THRESHOLD=2.0&geneset_list=%20&exclude_germline_mutations=true&comparison_subtab=clinical) (TCGA PanCancer Atlas studies, germline mutations excluded).


TCGA studies with fewer than 100 samples of mutated _TP53_ were excluded from training.

The following TCGA studies were used in training: ACC, BLCA, BRCA, CESC, COADREAD, ESCA, GBM, HNSC, KIRC, KIRP, LGG, LIHC, LUAD, LUSC, OV, PAAD, PCPG, PRAD, SARC, SKCM, STAD, TGCT, THCA, THYM, UCEC.

The following TCGA studies were not used in training: CHOL, UVM, UCS, KICH, MESO, DLBC.

## Reusing this model

To use this model on the command line, see [WSInfer-MIL](https://github.com/kaczmarj/wsinfer-mil).

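Alternatively, you may use PyTorch or ONNX to run the model. First, embed 128 µm x 128 µm patches using CTransPath (one 768-dimensional embedding per patch). Then pass the resulting bag of embeddings, a float32 array of shape `(num_patches, 768)`, to the model.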

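```python
import numpy as np
import onnxruntime as ort

# Placeholder bag: 1,000 patches x 768 CTransPath features (float32).
embedding = np.ones((1_000, 768), dtype="float32")

ort_sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
# Outputs: slide-level logits and per-patch attention scores.
logits, attention = ort_sess.run(["logits", "attention"], {"input": embedding})
```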
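The raw outputs can be post-processed as sketched below. This assumes `logits` has shape `(1, 2)` with columns in the order listed under *Output classes* (wildtype, mutant), and that `attention` holds exactly one score per patch in the same order as the input bag; `as_bag` and `summarize` are hypothetical helpers, not part of the model or of WSInfer-MIL.

```python
import numpy as np

# Assumed to match the column order of `logits` (see "Output classes").
CLASS_NAMES = ["wildtype", "mutant"]


def as_bag(patch_embeddings) -> np.ndarray:
    """Stack per-patch CTransPath embeddings into a (num_patches, 768) float32 array."""
    bag = np.asarray(patch_embeddings, dtype=np.float32)
    if bag.ndim != 2 or bag.shape[1] != 768:
        raise ValueError(f"expected shape (num_patches, 768), got {bag.shape}")
    return bag


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def summarize(logits, attention, k: int = 5):
    """Return the predicted class, per-class probabilities, and the k most-attended patch indices."""
    probs = softmax(np.asarray(logits, dtype=np.float64)).reshape(-1)
    # Assumes exactly one attention score per patch, in input-bag order.
    scores = np.asarray(attention, dtype=np.float64).reshape(-1)
    top_k = np.argsort(scores)[::-1][:k]
    label = CLASS_NAMES[int(probs.argmax())]
    return label, dict(zip(CLASS_NAMES, probs.tolist())), top_k
```

With the session above, `label, probs, top = summarize(logits, attention)` returns the predicted class, both class probabilities, and the indices of the five most-attended patches. Ranking raw attention scores gives the same order as ranking softmax-normalized ones, because softmax is monotonic.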

## Model performance

The model achieved an AUROC of 0.85 on the full test set.

AUROC values for each TCGA study are listed below. A value of `nan` means the ground truth labels for that study contained only one class, so AUROC is undefined.

- ACC: 0.750
- BLCA: 0.597
- BRCA: 0.862
- CESC: 0.562
- COADREAD: 0.742
- ESCA: 0.643
- GBM: 0.792
- HNSC: 0.599
- KIRC: 1.000
- KIRP: nan
- LGG: 0.763
- LIHC: 0.769
- LUAD: 0.842
- LUSC: 0.610
- OV: 0.708
- PAAD: 0.787
- PCPG: nan
- PRAD: 0.657
- SARC: 0.762
- SKCM: 0.722
- STAD: 0.716
- TGCT: nan
- THCA: nan
- THYM: nan
- UCEC: 0.825
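Per-study AUROC, with the single-class case reported as NaN, can be sketched as follows. It computes AUROC as the Mann-Whitney probability that a randomly chosen mutant sample scores higher than a randomly chosen wildtype sample (ties count one half). The function names and the label encoding (1 = mutant, 0 = wildtype) are assumptions. scikit-learn's `roc_auc_score` raises an error when only one class is present, so an explicit check like the one below is one way to report NaN instead.

```python
import math
from collections import defaultdict


def auroc(labels, scores) -> float:
    """AUROC via the Mann-Whitney U statistic (ties count 1/2).

    Returns NaN when only one class is present, since AUROC is then undefined.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return math.nan
    wins = 0.0
    # Pairwise comparison: quadratic, but fine for a ~1,000-sample test set.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_by_study(studies, labels, scores) -> dict:
    """Compute AUROC separately for each study, sorted by study name."""
    groups = defaultdict(lambda: ([], []))
    for study, y, s in zip(studies, labels, scores):
        groups[study][0].append(y)
        groups[study][1].append(s)
    return {study: auroc(ys, ss) for study, (ys, ss) in sorted(groups.items())}
```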

## Intended uses

This model is ONLY intended for research purposes.

**This model may not be used for clinical purposes.** This model is distributed without warranties, either express or implied.