---
language:
- en
metrics:
- accuracy
- AUC ROC
- precision
- recall
tags:
- biology
- chemistry
- therapeutic science
- drug design
- drug development
- therapeutics
library_name: tdc
license: bsd-2-clause
---

## Dataset description

The CYP P450 genes are involved in the formation and breakdown (metabolism) of various molecules and chemicals within cells. Specifically, CYP3A4 is an important enzyme in the body, found mainly in the liver and the intestine. It oxidizes small foreign organic molecules (xenobiotics), such as toxins or drugs, so that they can be removed from the body.

## Task description

Binary classification. Given a drug SMILES string, predict CYP3A4 inhibition.

## Dataset statistics

Total: 12,328 drugs

## Pre-requisites

Install the following packages:

```
pip install PyTDC
pip install DeepPurpose
pip install git+https://github.com/bp-kelley/descriptastorus
pip install dgl torch torchvision
```

You can also refer to the Colab notebook [here](https://colab.research.google.com/drive/1CL92SOCBS-eYDL99w8tjSNIG_ySXzMrG?usp=sharing).

## Dataset split

Random split into 70% training, 10% validation, and 20% testing.

To load the dataset in TDC, type

```python
from tdc.single_pred import ADME
data = ADME(name = 'CYP3A4_Veith')
```

## Model description

The CNN model applies a convolutional neural network to the SMILES string fingerprint. The model is tuned with 100 runs using the Ax platform.

To load the pre-trained model, type

```python
from tdc import tdc_hf_interface
tdc_hf = tdc_hf_interface("CYP3A4_Veith-CNN")
# load deeppurpose model from this repo
dp_model = tdc_hf.load_deeppurpose('./data')
tdc_hf.predict_deeppurpose(dp_model, ['YOUR SMILES STRING'])
```

## References

* Dataset entry in Therapeutics Data Commons, https://tdcommons.ai/single_pred_tasks/adme/#cyp-p450-3a4-inhibition-veith-et-al
* Veith, Henrike et al. “Comprehensive characterization of cytochrome P450 isozyme selectivity across chemical libraries.” Nature Biotechnology vol. 27,11 (2009): 1050-5.
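
## Example: split proportions

To make the 70% / 10% / 20% random split from the "Dataset split" section concrete, here is a minimal standard-library sketch of how such a split partitions 12,328 records. The helper `random_split`, its `seed` default, and the integer placeholders standing in for drugs are assumptions made for illustration; this is not TDC's own splitting code, and the exact rounding TDC uses may differ.

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Hypothetical helper: shuffle items, then cut into train/valid/test."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = int(frac[0] * len(shuffled))
    n_valid = int(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # The test set takes whatever remains, so no record is dropped.
        "test": shuffled[n_train + n_valid:],
    }


# Integer placeholders standing in for the 12,328 drugs in the dataset.
split = random_split(range(12328))
print({name: len(part) for name, part in split.items()})
```

With the real dataset, the object returned by `ADME(name = 'CYP3A4_Veith')` exposes a `get_split()` method that returns the train, validation, and test partitions directly, so a helper like this is only needed to reason about the proportions.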