---
language:
- en
metrics:
- accuracy
- AUC ROC
- precision
- recall
tags:
- biology
- chemistry
- therapeutic science
- drug design
- drug development
- therapeutics
library_name: tdc
license: bsd-2-clause
---

## Dataset description

The CYP P450 genes are involved in the formation and breakdown (metabolism) of various molecules and chemicals within cells. Specifically, CYP3A4 is an important enzyme in the body, mainly found in the liver and in the intestine. It oxidizes small foreign organic molecules (xenobiotics), such as toxins or drugs, so that they can be removed from the body.

## Task description
Binary classification. Given a drug SMILES string, predict CYP3A4 inhibition.
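
Because the task is binary, model quality can be summarized with the accuracy, precision, and recall metrics listed in this card's metadata. The sketch below shows one way to compute them from predicted scores. It assumes that label 1 marks an inhibitor and that scores are cut at a 0.5 threshold; neither is stated in this card. The labels and scores are made-up illustration values, and `binary_metrics` is a hypothetical helper, not part of TDC.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Return accuracy, precision, and recall for binary labels.

    Assumption: label 1 = CYP3A4 inhibitor, label 0 = non-inhibitor.
    """
    # Turn continuous scores into hard 0/1 predictions.
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels and scores, for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
metrics = binary_metrics(labels, scores)
print(metrics)
```

AUC ROC, the fourth listed metric, is threshold-free and is usually computed with a library routine rather than by hand.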

## Dataset statistics
Total: 12,328 drugs

## Pre-requisites
Install the following packages:
```bash
pip install PyTDC
pip install DeepPurpose
pip install git+https://github.com/bp-kelley/descriptastorus
pip install dgl torch torchvision
```
You can also refer to the Colab notebook [here](https://colab.research.google.com/drive/1CL92SOCBS-eYDL99w8tjSNIG_ySXzMrG?usp=sharing).


## Dataset split
The data is split randomly into 70% training, 10% validation, and 20% testing.

To load the dataset in TDC, run:

```python
from tdc.single_pred import ADME
data = ADME(name = 'CYP3A4_Veith')
```
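
For intuition, a 70/10/20 random split amounts to shuffling the examples and cutting the shuffled list at two points. The following is a minimal standard-library sketch of that idea, not TDC's implementation; `random_split`, its `seed` default, and the rounding rule are assumptions made for illustration. In TDC itself, the split is obtained from the loaded `data` object (for example via `data.get_split()`).

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=0):
    """Shuffle `items` and cut them into train/valid/test parts.

    Hypothetical helper for illustration; the seed and rounding
    behaviour are assumptions, not TDC's actual settings.
    """
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible

    n_train = round(frac[0] * len(shuffled))
    n_valid = round(frac[1] * len(shuffled))

    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test


# Split the indices of a 12,328-example dataset (the size reported above).
train_idx, valid_idx, test_idx = random_split(range(12328))
print(len(train_idx), len(valid_idx), len(test_idx))
```

The exact partition sizes produced by TDC may differ slightly from this sketch, depending on how it rounds the fractions.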

## Model description
The CNN model applies a convolutional neural network to the SMILES string fingerprint. The model was tuned over 100 runs using the Ax platform.
To load the pre-trained model, run:

```python
from tdc import tdc_hf_interface
tdc_hf = tdc_hf_interface("CYP3A4_Veith-CNN")
# load deeppurpose model from this repo
dp_model = tdc_hf.load_deeppurpose('./data')
tdc_hf.predict_deeppurpose(dp_model, ['YOUR SMILES STRING'])
```

## References
* Dataset entry in Therapeutics Data Commons, https://tdcommons.ai/single_pred_tasks/adme/#cyp-p450-3a4-inhibition-veith-et-al
* Veith, Henrike et al. “Comprehensive characterization of cytochrome P450 isozyme selectivity across chemical libraries.” Nature Biotechnology vol. 27,11 (2009): 1050-5.