smahbub committed · Commit 9aa3d5b · verified · 1 Parent(s): 511e744

Update README.md

  1. README.md +46 -11
README.md CHANGED
@@ -7,29 +7,64 @@ base_model:
  We fully finetune the [AIDO.RNA-1.6B](https://huggingface.co/genbio-ai/AIDO.RNA-1.6B) model on the single-state split from [Das _et al._](https://www.nature.com/articles/nmeth.1433) already processed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). We use the same train, validation, and test splits used by their method [gRNAde](https://arxiv.org/abs/2305.14749). Current version of ModelGenerator contains the inference pipeline for RNA inverse folding. Experimental pipeline on other datasets (both training and testing) will be included in the future.

  #### Setup:
- Install [Model Generator](https://github.com/genbio-ai/modelgenerator).
  - It is **required** to use [docker](https://www.docker.com/101-tutorial/) to run our inverse folding pipeline.
- - Please set up a docker image using our provided [Dockerfile](https://github.com/genbio-ai/ModelGenerator/blob/main/Dockerfile) and run the inverse folding inference from within the docker container.
-
- #### Running inference:
-
- - Set the environment variable for ModelGenerator's data directory (**Note:** the docker image with our provided [Dockerfile](https://github.com/genbio-ai/ModelGenerator/blob/main/Dockerfile) will already have it set):
  ```
- export MGEN_DATA_DIR=~/mgen_data # or any other local directory of your choice, if you would like to change it inside [Dockerfile](https://github.com/genbio-ai/ModelGenerator/blob/main/Dockerfile)
  ```

  - Download the `model.ckpt` checkpoint from [here](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/model.ckpt). Place it inside the local directory `${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B`.

- - Download the gRNAde checkpoint named `gRNAde_ARv1_1state_das.h5` from [here](https://github.com/chaitjo/geometric-rna-design/blob/main/checkpoints/gRNAde_ARv1_1state_all.h5). Place it inside the directory `${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold/`.

- - Download the data preprocessed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). Mainly download these two files: [processed.pt.zip](https://drive.google.com/file/d/1gcUUaRxbGZnGMkLdtVwAILWVerVCbu4Y/view) and [processed_df.csv](https://drive.google.com/file/d/1lbdiE1LfWPReo5VnZy0zblvhVl5QhaF4/view). Place them inside the directory `${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/`.


- - From your terminal, change directory to `ModelGenerator/experiments/AIDO.RNA/rna_inverse_folding` and run the script `rna_inverse_folding.sh`:
  ```
- cd experiments/AIDO.RNA/rna_inverse_folding
  bash rna_inverse_folding.sh
  ```

  #### Outputs:
  - The evaluation score will be printed on the console.
 
  We fully finetune the [AIDO.RNA-1.6B](https://huggingface.co/genbio-ai/AIDO.RNA-1.6B) model on the single-state split from [Das _et al._](https://www.nature.com/articles/nmeth.1433) already processed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). We use the same train, validation, and test splits used by their method [gRNAde](https://arxiv.org/abs/2305.14749). Current version of ModelGenerator contains the inference pipeline for RNA inverse folding. Experimental pipeline on other datasets (both training and testing) will be included in the future.

  #### Setup:
+ Install [ModelGenerator](https://github.com/genbio-ai/modelgenerator).
  - It is **required** to use [docker](https://www.docker.com/101-tutorial/) to run our inverse folding pipeline.
+ - Please set up a docker image using our provided [Dockerfile](https://github.com/genbio-ai/ModelGenerator/blob/main/Dockerfile) and run the inverse folding inference from within the docker container.
+ - Here is an example bash script to set up and access a docker container:
  ```
+ # clone the ModelGenerator repository
+ git clone https://github.com/genbio-ai/ModelGenerator.git
+ # cd to "ModelGenerator" folder where you should find the "Dockerfile"
+ cd ModelGenerator
+ # create a docker image
+ docker build -t aido .
+ # create a local folder as ModelGenerator's data directory
+ mkdir -p $HOME/mgen_data
+ # run a container
+ docker run -d --runtime=nvidia -it -v "$(pwd):/workspace" -v "$HOME/mgen_data:/mgen_data" aido /bin/bash
+ # find the container ID
+ docker ps # this will print the running containers and their IDs
+ # execute the container with ID=<container_id>
+ docker exec -it <container_id> /bin/bash # now you should be inside the docker container
+ # test if you can access the nvidia GPUs
+ nvidia-smi # this should print the GPUs' details
  ```
+ - Execute the following steps from **within** the docker container you just created.
+
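Reviewer sketch (not part of the commit): the container ID does not have to be looked up with `docker ps`, because `docker run -d` prints the ID of the container it starts. The helper below wraps the `docker run` line from the example script so the ID can be captured in a variable. The function name `start_aido_container` is hypothetical, and the image name `aido` and the mount points are simply copied from the script above; treat it as one possible convenience, not as the ModelGenerator workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ModelGenerator): start the container in the
# background and print its ID on stdout.
# Assumptions: the image was built as "aido" and the current directory is the
# ModelGenerator checkout, exactly as in the example script.
start_aido_container() {
  local data_dir="${1:-$HOME/mgen_data}"   # host-side data directory
  mkdir -p "$data_dir" || return 1
  # `docker run -d` runs detached and prints the new container's ID.
  docker run -d --runtime=nvidia -it \
    -v "$(pwd):/workspace" \
    -v "$data_dir:/mgen_data" \
    aido /bin/bash
}
```

With this in place, `container_id="$(start_aido_container)"` followed by `docker exec -it "$container_id" /bin/bash` should open a shell inside the container without copying the ID by hand.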
+ #### Download model checkpoints:

  - Download the `model.ckpt` checkpoint from [here](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/model.ckpt). Place it inside the local directory `${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B`.
+
+ **Alternatively**, you can simply run the following script to do this:
+ ```
+ mkdir -p ${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B
+ wget -P ${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/resolve/main/model.ckpt
+ ```

+ - Download the gRNAde checkpoint named `gRNAde_ARv1_1state_das.h5` from the [huggingface-hub](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/other_models/gRNAde_ARv1_1state_all.h5) ***or*** the [original source](https://github.com/chaitjo/geometric-rna-design/blob/main/checkpoints/gRNAde_ARv1_1state_all.h5). Place it inside the directory `${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold/`.
+
+ **Alternatively**, you can do it by simply running the following script:
+ ```
+ mkdir -p ${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold/
+ wget -P ${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold/ https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/resolve/main/other_models/gRNAde_ARv1_1state_all.h5
+ ```

+ #### Download data:
+ - Download the data preprocessed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). Mainly download these two files: processed.pt.zip ([huggingface-hub](https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/blob/main/processed.pt.zip), [original source](https://drive.google.com/file/d/1gcUUaRxbGZnGMkLdtVwAILWVerVCbu4Y/view)) and processed_df.csv ([huggingface-hub](https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/blob/main/processed_df.csv), [original source](https://drive.google.com/file/d/1lbdiE1LfWPReo5VnZy0zblvhVl5QhaF4/view)). Place them inside the directory `${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/`. Please refer to [this link](https://github.com/chaitjo/geometric-rna-design/tree/main?tab=readme-ov-file#downloading-and-preparing-data) for details about the dataset and its preprocessing.

+ **Alternatively**, you run the following script to do it:
+ ```
+ mkdir -p ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/
+ wget -P ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/ https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/resolve/main/processed.pt.zip
+ wget -P ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/ https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/resolve/main/processed_df.csv
+ ```
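Reviewer sketch (not part of the commit): before running inference it may help to confirm that the downloaded files sit where the instructions above place them. The function below is a minimal pre-flight check under stated assumptions. `check_rna_if_inputs` is a hypothetical name, and the file names are the ones the `wget` commands above would produce. In particular, it looks for `gRNAde_ARv1_1state_all.h5` as downloaded, although the README text calls the checkpoint `gRNAde_ARv1_1state_das.h5`; which name the pipeline actually expects is not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of ModelGenerator).
# Reports every expected input file that is missing under the data directory
# and returns non-zero if any is missing (2 if no directory is known).
check_rna_if_inputs() {
  local root="${1:-${MGEN_DATA_DIR:-}}"
  if [ -z "$root" ]; then
    echo "MGEN_DATA_DIR is not set" >&2
    return 2
  fi
  local missing=0
  local rel
  # Relative paths taken from the download steps; names follow the wget URLs.
  for rel in \
    "modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B/model.ckpt" \
    "modelgenerator/other_models/rna_inv_fold/gRNAde_ARv1_1state_all.h5" \
    "modelgenerator/datasets/rna_inv_fold/raw_data/processed.pt.zip" \
    "modelgenerator/datasets/rna_inv_fold/raw_data/processed_df.csv"
  do
    if [ ! -f "$root/$rel" ]; then
      echo "missing: $root/$rel" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Inside the container, `check_rna_if_inputs && bash rna_inverse_folding.sh` would then start inference only when all four files are present.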

+ #### Run inference:
+ - From your terminal, run the script `rna_inverse_folding.sh`:
  ```
  bash rna_inverse_folding.sh
  ```
+ - **Note:** Multi-GPU inference for inverse folding is not currently supported and will be included in the future.

  #### Outputs:
  - The evaluation score will be printed on the console.