tdc/CYP3A4_Veith-CNN

Dataset description

The CYP P450 genes are involved in the formation and breakdown (metabolism) of various molecules and chemicals within cells. Specifically, CYP3A4 is an important enzyme in the body, mainly found in the liver and in the intestine. It oxidizes small foreign organic molecules (xenobiotics), such as toxins or drugs, so that they can be removed from the body.

Task description

Binary classification. Given a drug SMILES string, predict CYP3A4 inhibition.

Dataset statistics

Total: 12,328 drugs

Pre-requisites

Install the following packages:

pip install PyTDC
pip install DeepPurpose
pip install git+https://github.com/bp-kelley/descriptastorus
pip install dgl torch torchvision

You can also refer to the Colab notebook.

Dataset split

Random split: 70% training, 10% validation, and 20% testing.

To load the dataset in TDC, type:

from tdc.single_pred import ADME
data = ADME(name = 'CYP3A4_Veith')
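
The 70/10/20 random split described above can be illustrated with a minimal, standard-library sketch. The function name `random_split`, the seed, and the integer stand-in rows are assumptions made for illustration only. TDC's own splitting code may round the partition sizes differently, so the exact counts here are not the dataset's official split sizes.

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items and cut them into train/valid/test lists by fraction."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(frac[0] * n)
    n_valid = int(frac[1] * n)
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # The test set takes whatever remains, so no row is dropped.
        "test": shuffled[n_train + n_valid:],
    }


# Hypothetical stand-in for the 12,328 dataset rows.
rows = list(range(12328))
split = random_split(rows)
print({name: len(part) for name, part in split.items()})
```

In TDC itself, calling `data.get_split()` on the loaded dataset should return the train, validation, and test partitions directly, so a hand-rolled split like this is only needed for understanding or for custom fractions.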

Model description

This model applies a convolutional neural network (CNN) to the SMILES string fingerprint. It was tuned over 100 runs using the Ax platform. To load the pre-trained model, type:

from tdc import tdc_hf_interface
tdc_hf = tdc_hf_interface("CYP3A4_Veith-CNN")
# load deeppurpose model from this repo
dp_model = tdc_hf.load_deeppurpose('./data')
tdc_hf.predict_deeppurpose(dp_model, ['YOUR SMILES STRING'])
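
This card does not spell out how a SMILES string becomes CNN input. One common approach, shown here purely as an assumption, is character-level one-hot encoding padded to a fixed length, which gives a channels-by-position matrix that 1D convolutions can slide over. The vocabulary `VOCAB`, the length `MAX_LEN`, and the helper `one_hot_smiles` are hypothetical and are not taken from the pre-trained model.

```python
# Hypothetical character vocabulary; the real model's vocabulary is not given here.
VOCAB = ["C", "N", "O", "c", "n", "(", ")", "=", "1", "2", "#"]
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(VOCAB)}
UNKNOWN = len(VOCAB)  # extra channel for characters outside VOCAB
MAX_LEN = 20          # assumed fixed input length


def one_hot_smiles(smiles, max_len=MAX_LEN):
    """Return a (channels x max_len) 0/1 matrix for one SMILES string.

    Strings longer than max_len are truncated; positions past the end of a
    shorter string stay all-zero (padding).
    """
    matrix = [[0] * max_len for _ in range(len(VOCAB) + 1)]
    for pos, ch in enumerate(smiles[:max_len]):
        matrix[CHAR_TO_INDEX.get(ch, UNKNOWN)][pos] = 1
    return matrix


encoded = one_hot_smiles("CC(=O)Nc1ccc(O)cc1")  # paracetamol
print(len(encoded), len(encoded[0]))
```

This sketch treats every character as its own token, so two-letter atoms such as `Cl` or `Br` would be split into two characters. A production encoder would typically tokenize those as single symbols.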

References

Dataset entry in Therapeutics Data Commons, https://tdcommons.ai/single_pred_tasks/adme/#cyp-p450-3a4-inhibition-veith-et-al
Veith, Henrike et al. “Comprehensive characterization of cytochrome P450 isozyme selectivity across chemical libraries.” Nature Biotechnology vol. 27,11 (2009): 1050-5.