---
license: cc-by-4.0
---

# Pancancer _TP53_ classifier from H&E resections

This model classifies an H&E-stained digital pathology image as _TP53_ wildtype or mutant. It was trained by Jakub Kaczmarzyk using CLAM.

- Inputs: Bag of patches with 128um edge length, embedded with CTransPath.
- Output classes: wildtype, mutant

## Data

Diagnostic slides in TCGA (e.g., `DX`) were used to train the model. The whole slide images were tiled into 128x128um patches, and each patch was encoded using CTransPath, which produces a 768-dimensional embedding per patch. Train, validation, and test splits were stratified by TCGA study and _TP53_ status, and no patient appears in more than one split.

Sample sizes:

- Train: 8,736 slides (7,076 patients)
- Validation: 1,061 slides (881 patients)
- Test: 1,069 slides (881 patients)

The _TP53_ status for each sample was downloaded from [cBioPortal](https://www.cbioportal.org/results/download?cancer_study_list=laml_tcga_pan_can_atlas_2018%2Cacc_tcga_pan_can_atlas_2018%2Cblca_tcga_pan_can_atlas_2018%2Clgg_tcga_pan_can_atlas_2018%2Cbrca_tcga_pan_can_atlas_2018%2Ccesc_tcga_pan_can_atlas_2018%2Cchol_tcga_pan_can_atlas_2018%2Ccoadread_tcga_pan_can_atlas_2018%2Cdlbc_tcga_pan_can_atlas_2018%2Cesca_tcga_pan_can_atlas_2018%2Cgbm_tcga_pan_can_atlas_2018%2Chnsc_tcga_pan_can_atlas_2018%2Ckich_tcga_pan_can_atlas_2018%2Ckirc_tcga_pan_can_atlas_2018%2Ckirp_tcga_pan_can_atlas_2018%2Clihc_tcga_pan_can_atlas_2018%2Cluad_tcga_pan_can_atlas_2018%2Clusc_tcga_pan_can_atlas_2018%2Cmeso_tcga_pan_can_atlas_2018%2Cov_tcga_pan_can_atlas_2018%2Cpaad_tcga_pan_can_atlas_2018%2Cpcpg_tcga_pan_can_atlas_2018%2Cprad_tcga_pan_can_atlas_2018%2Csarc_tcga_pan_can_atlas_2018%2Cskcm_tcga_pan_can_atlas_2018%2Cstad_tcga_pan_can_atlas_2018%2Ctgct_tcga_pan_can_atlas_2018%2Cthym_tcga_pan_can_atlas_2018%2Cthca_tcga_pan_can_atlas_2018%2Cucs_tcga_pan_can_atlas_2018%2Cucec_tcga_pan_can_atlas_2018%2Cuvm_tcga_pan_can_atlas_2018&tab_index=tab_visualize&profileFilter=mutations&case_set_id=all&Action=Submit&gene_list=TP53%253A%2520MUT&Z_SCORE_THRESHOLD=2.0&RPPA_SCORE_THRESHOLD=2.0&geneset_list=%20&exclude_germline_mutations=true&comparison_subtab=clinical).

TCGA studies with fewer than 100 samples of mutated _TP53_ were excluded from training.

The following TCGA studies were used in training: ACC, BLCA, BRCA, CESC, COADREAD, ESCA, GBM, HNSC, KIRC, KIRP, LGG, LIHC, LUAD, LUSC, OV, PAAD, PCPG, PRAD, SARC, SKCM, STAD, TGCT, THCA, THYM, UCEC.

The following TCGA studies were not used in training: CHOL, UVM, UCS, KICH, MESO, DLBC.

## Reusing this model

To use this model on the command line, see [WSInfer-MIL](https://github.com/kaczmarj/wsinfer-mil).

Alternatively, you may use PyTorch or ONNX to run the model. First, embed 128um x 128um patches using CTransPath. Then pass the bag of embeddings to the model.

```python
import numpy as np
import onnxruntime as ort

# Placeholder bag: 1,000 patches, each a 768-dimensional CTransPath embedding.
embedding = np.ones((1_000, 768), dtype="float32")
ort_sess = ort.InferenceSession("model.onnx")
logits, attention = ort_sess.run(["logits", "attention"], {"input": embedding})
```

## Model performance

The model achieved an AUROC of 0.85 on the full test set.

Here are the AUROC values per TCGA study. A value of `nan` means that only one class was present in the ground truth labels for that study, so AUROC is undefined.

- ACC: 0.750
- BLCA: 0.597
- BRCA: 0.862
- CESC: 0.562
- COADREAD: 0.742
- ESCA: 0.643
- GBM: 0.792
- HNSC: 0.599
- KIRC: 1.000
- KIRP: nan
- LGG: 0.763
- LIHC: 0.769
- LUAD: 0.842
- LUSC: 0.610
- OV: 0.708
- PAAD: 0.787
- PCPG: nan
- PRAD: 0.657
- SARC: 0.762
- SKCM: 0.722
- STAD: 0.716
- TGCT: nan
- THCA: nan
- THYM: nan
- UCEC: 0.825

## Intended uses

This model is ONLY intended for research purposes.

**This model may not be used for clinical purposes.** This model is distributed without warranties, either express or implied.
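
As a follow-up to the ONNX snippet in "Reusing this model", the sketch below shows one way the `logits` and `attention` outputs might be turned into a class label and a list of the most heavily attended patches, for research use only. It rests on assumptions not confirmed by this card: that `logits` holds one value per class for a single bag, in the order wildtype then mutant (taken from the "Output classes" line), and that `attention` holds one relative score per patch. The function names `softmax` and `summarize_prediction`, the `top_k` parameter, and the placeholder arrays are hypothetical and only illustrate the idea.

```python
import numpy as np

# Assumed class order, taken from "Output classes: wildtype, mutant".
CLASS_NAMES = ("wildtype", "mutant")


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def summarize_prediction(logits, attention, top_k=5):
    """Return (class name, class probabilities, indices of the top_k attended patches)."""
    # Flatten so that shapes like (1, 2) and (1, n_patches) are handled uniformly.
    probs = softmax(np.asarray(logits, dtype=np.float64).reshape(-1))
    scores = np.asarray(attention, dtype=np.float64).reshape(-1)
    # Sort patch indices by attention score, highest first.
    top_patches = np.argsort(scores)[::-1][:top_k]
    return CLASS_NAMES[int(probs.argmax())], probs, top_patches


# Placeholder values standing in for the outputs of ort_sess.run(...).
logits = np.array([[0.2, 1.3]], dtype="float32")
attention = np.random.default_rng(0).random((1, 1_000)).astype("float32")

label, probs, top_patches = summarize_prediction(logits, attention)
print(label, probs.round(3), top_patches)
```

The indices in `top_patches` refer to rows of the embedding bag, so they can be mapped back to patch coordinates only if the tiling order was recorded when the patches were embedded.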