pavm595 committed
Commit bd0ecf6 · verified · 1 Parent(s): 71c0fd2

Upload 4 files

Files changed (4)
  1. MolE-XGBoost-08.03.2024_14.20.pkl +3 -0
  2. README.md +22 -3
  3. config.yaml +28 -0
  4. model.pth +3 -0
MolE-XGBoost-08.03.2024_14.20.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e66874f9019beab0eb02378893c064d63d34df3482f8f6f0495d144597e972d0
+ size 10210090
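The three lines above are not the model itself but a Git LFS pointer: the actual 10 MB pickle is stored out-of-band and fetched by `git lfs pull`. As a minimal sketch, a pointer file can be split into its `version`/`oid`/`size` fields like this (`parse_lfs_pointer` is a hypothetical helper, not part of this repo):

```python
# The LFS pointer content exactly as committed above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e66874f9019beab0eb02378893c064d63d34df3482f8f6f0495d144597e972d0
size 10210090
"""

def parse_lfs_pointer(text: str) -> dict:
    # Each pointer line is "<key> <value>"; split on the first space only,
    # so the sha256 value keeps its "sha256:" prefix intact.
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

info = parse_lfs_pointer(POINTER)
print(info["size"])  # "10210090"
```

If a clone shows small text files like this instead of the binaries, running `git lfs pull` in the repository materializes them.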
README.md CHANGED
@@ -1,3 +1,22 @@
- ---
- license: mit
- ---
+ # MolE - Antimicrobial Prediction
+
+ This model uses MolE's pre-trained representation to train XGBoost models that predict the antimicrobial activity of compounds from their molecular structure.
+
+ ## Files:
+
+ - `model.pth` - the pre-trained representation model's weights
+ - `config.yaml` - model configuration
+ - `MolE-XGBoost-08.03.2024_14.20.pkl` - pretrained XGBoost model
+
+ ## Usage
+
+ Not ready yet.
+
+ ## Publication
+ For more information about MolE and how we use it to predict antimicrobial activity, see the paper in Nature Communications:
+ [**Pre-trained molecular representations enable antimicrobial discovery**](https://www.nature.com/articles/s41467-025-58804-4)
+
+ ## GitHub
+
+ The code is available here:
+ [**Link to GitHub repo**](https://github.com/rolayoalarcon/mole_antimicrobial_potential)
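Since the Usage section of the README is still a stub, here is a hedged sketch of the obvious first step: unpickling `MolE-XGBoost-08.03.2024_14.20.pkl`. The helper name is hypothetical (not from this repo), and unpickling the real file requires a compatible `xgboost` installation; computing MolE embeddings as input features is not shown (see the GitHub repo for the actual pipeline).

```python
import pickle
from pathlib import Path

def load_antimicrobial_model(path="MolE-XGBoost-08.03.2024_14.20.pkl"):
    # Hypothetical helper: deserialize the pretrained XGBoost model.
    # Only unpickle files you trust -- pickle can execute arbitrary code.
    with open(path, "rb") as fh:
        return pickle.load(fh)

# Guarded so the sketch is safe to run outside the repository checkout.
if Path("MolE-XGBoost-08.03.2024_14.20.pkl").exists():
    model = load_antimicrobial_model()
```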
config.yaml ADDED
@@ -0,0 +1,28 @@
+ batch_size: 1000 # batch size
+ warm_up: 10 # warm-up epochs
+ epochs: 1000 # total number of epochs
+
+ load_model: None # resume training
+ eval_every_n_epochs: 1 # validation frequency
+ save_every_n_epochs: 5 # automatic model saving frequency
+
+ fp16_precision: False # float precision 16 (i.e. True/False)
+ init_lr: 0.0005 # initial learning rate for Adam
+ weight_decay: 1e-5 # weight decay for Adam
+ gpu: cuda:0 # training GPU
+
+ model_type: gin_concat # GNN backbone (i.e., gin/gcn)
+ model:
+   num_layer: 5 # number of graph conv layers
+   emb_dim: 200 # embedding dimension in graph conv layers
+   feat_dim: 8000 # output feature dimension
+   drop_ratio: 0.0 # dropout ratio
+   pool: add # readout pooling (i.e., mean/max/add)
+
+ dataset:
+   num_workers: 50 # dataloader number of workers
+   valid_size: 0.1 # ratio of validation data
+   data_path: data/pubchem_data/pubchem_100k_random.txt # path of pre-training data
+
+ loss:
+   l: 0.0001 # Lambda parameter
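The nested keys above (`model`, `dataset`, `loss`) are how downstream code would read the architecture settings, e.g. the 8000-dimensional output representation (`feat_dim`). A minimal sketch, assuming PyYAML is installed; the config fragment is reproduced inline so the example is self-contained, whereas real code would open `config.yaml` instead:

```python
import yaml  # PyYAML; assumed to be available

# Subset of the config.yaml added in this commit.
CONFIG = """\
batch_size: 1000
epochs: 1000
model_type: gin_concat
model:
  num_layer: 5
  emb_dim: 200
  feat_dim: 8000
  pool: add
"""

cfg = yaml.safe_load(CONFIG)          # parse into nested dicts
print(cfg["model"]["feat_dim"])       # 8000
```

Note that `safe_load` is preferred over `load` for untrusted input, and that PyYAML parses `1e-5` (no decimal point) as a string, so `weight_decay` may need an explicit `float(...)` cast.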
model.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2d324644c5f43e7be6734a9cd7a7966f975bfcc113610c13be897d11674defd8
+ size 803807667