Update README.md
README.md
We fully fine-tune the [AIDO.RNA-1.6B](https://huggingface.co/genbio-ai/AIDO.RNA-1.6B) model on the single-state split from [Das _et al._](https://www.nature.com/articles/nmeth.1433), as already processed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). We use the same train, validation, and test splits as their method, [gRNAde](https://arxiv.org/abs/2305.14749). The current version of ModelGenerator contains the inference pipeline for RNA inverse folding. Experimental pipelines for other datasets (both training and testing) will be included in the future.

#### Setup:
- Install [ModelGenerator](https://github.com/genbio-ai/modelgenerator).
- It is **required** to use [Docker](https://www.docker.com/101-tutorial/) to run our inverse folding pipeline.
- Build a Docker image from our provided [Dockerfile](https://github.com/genbio-ai/ModelGenerator/blob/main/Dockerfile) and run inverse folding inference from within the Docker container.
- Here is an example bash script to build the image, start a container, and open a shell inside it:
```
# clone the ModelGenerator repository
git clone https://github.com/genbio-ai/ModelGenerator.git
# cd to the "ModelGenerator" folder, which contains the "Dockerfile"
cd ModelGenerator
# build a docker image named "aido"
docker build -t aido .
# create a local folder to serve as ModelGenerator's data directory
mkdir -p "$HOME/mgen_data"
# start a container in the background with GPU access (--runtime=nvidia requires the NVIDIA Container Toolkit),
# mounting the repository at /workspace and the data folder at /mgen_data
docker run -d --runtime=nvidia -it -v "$(pwd):/workspace" -v "$HOME/mgen_data:/mgen_data" aido /bin/bash
# find the container ID
docker ps  # prints the running containers and their IDs
# open a shell inside the container with ID=<container_id>
docker exec -it <container_id> /bin/bash  # you are now inside the docker container
# check that the NVIDIA GPUs are accessible
nvidia-smi  # prints the GPUs' details
```
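- Execute the following steps from **within** the Docker container you just created. They use the environment variable `${MGEN_DATA_DIR}` for ModelGenerator's data directory; the Docker image built from our provided [Dockerfile](https://github.com/genbio-ai/ModelGenerator/blob/main/Dockerfile) already has it set.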
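Every command in the sections below writes under `${MGEN_DATA_DIR}`, so it may be worth checking the variable first. A minimal sketch, assuming (as in the `docker run` example above) that the host data folder is mounted at `/mgen_data`; the fallback value is an assumption, not something the pipeline itself defines:

```bash
# Keep MGEN_DATA_DIR if the image already defines it; otherwise fall back to
# /mgen_data (assumption based on the -v "$HOME/mgen_data:/mgen_data" mount).
export MGEN_DATA_DIR="${MGEN_DATA_DIR:-/mgen_data}"
echo "MGEN_DATA_DIR is ${MGEN_DATA_DIR}"

# Warn (without closing the interactive shell) if the directory is not usable.
if [ ! -d "${MGEN_DATA_DIR}" ] || [ ! -w "${MGEN_DATA_DIR}" ]; then
  echo "warning: ${MGEN_DATA_DIR} does not exist or is not writable" >&2
fi
```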
#### Download model checkpoints:

- Download the `model.ckpt` checkpoint from the [Hugging Face Hub](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/model.ckpt) and place it in the local directory `${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B`.

**Alternatively**, run the following script to do this:
```
mkdir -p "${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B"
wget -P "${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B" https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/resolve/main/model.ckpt
```
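The `blob` link opens the file's web page on the Hugging Face Hub, while the `wget` command uses the corresponding `resolve` URL, which serves the raw file. A minimal sketch to confirm that the download left a non-empty file where the pipeline looks for it; `check_file` is a hypothetical helper name, not part of ModelGenerator:

```bash
# Hypothetical helper (not part of ModelGenerator): succeed if the file
# exists and is non-empty; otherwise print a message and return 1.
check_file() {
  if [ -s "$1" ]; then
    echo "ok: $1 ($(du -h "$1" | cut -f1))"
  else
    echo "missing or empty: $1" >&2
    return 1
  fi
}

check_file "${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B/model.ckpt"
```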
- Download the gRNAde checkpoint named `gRNAde_ARv1_1state_das.h5` from the [Hugging Face Hub](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/other_models/gRNAde_ARv1_1state_all.h5) ***or*** the [original source](https://github.com/chaitjo/geometric-rna-design/blob/main/checkpoints/gRNAde_ARv1_1state_all.h5) and place it in the directory `${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold/`.

**Alternatively**, run the following script to do this:
```
mkdir -p "${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold/"
wget -P "${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold/" https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/resolve/main/other_models/gRNAde_ARv1_1state_all.h5
```
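The `wget` command saves the checkpoint under the file name used in both links, `gRNAde_ARv1_1state_all.h5`. To see which `.h5` file(s) are actually present in the directory, a small listing sketch (bash-specific, since it uses an array and `shopt`):

```bash
GRNADE_DIR="${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold"

# nullglob makes the pattern expand to nothing (instead of the literal
# string "*.h5") when no file matches; it is switched off again afterwards.
shopt -s nullglob
h5_files=("${GRNADE_DIR}"/*.h5)
shopt -u nullglob

if [ "${#h5_files[@]}" -eq 0 ]; then
  echo "no .h5 checkpoint found in ${GRNADE_DIR}" >&2
else
  ls -lh "${h5_files[@]}"
fi
```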
#### Download data:

- Download the data preprocessed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). Specifically, download these two files: `processed.pt.zip` ([Hugging Face Hub](https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/blob/main/processed.pt.zip), [original source](https://drive.google.com/file/d/1gcUUaRxbGZnGMkLdtVwAILWVerVCbu4Y/view)) and `processed_df.csv` ([Hugging Face Hub](https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/blob/main/processed_df.csv), [original source](https://drive.google.com/file/d/1lbdiE1LfWPReo5VnZy0zblvhVl5QhaF4/view)). Place both in the directory `${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/`. See the [gRNAde data preparation instructions](https://github.com/chaitjo/geometric-rna-design/tree/main?tab=readme-ov-file#downloading-and-preparing-data) for details about the dataset and its preprocessing.

**Alternatively**, run the following script to do this:
```
mkdir -p "${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/"
wget -P "${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/" https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/resolve/main/processed.pt.zip
wget -P "${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/" https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/resolve/main/processed_df.csv
```
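To inspect the two raw files without modifying them, `unzip -l` lists the archive contents without extracting anything, and `head` prints the first lines of the CSV. Whether the pipeline expects `processed.pt.zip` to be extracted is not stated here, so this sketch leaves both files exactly as downloaded; it assumes `unzip` is available in the container:

```bash
RAW_DIR="${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data"

# List the archive contents (read-only; nothing is extracted).
if [ -f "${RAW_DIR}/processed.pt.zip" ]; then
  unzip -l "${RAW_DIR}/processed.pt.zip"
fi

# Print the first five lines of the CSV.
if [ -f "${RAW_DIR}/processed_df.csv" ]; then
  head -n 5 "${RAW_DIR}/processed_df.csv"
fi
```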
#### Run inference:

- From the root of the ModelGenerator repository, go to the inverse folding experiment folder and run the script `rna_inverse_folding.sh`:
```
cd experiments/AIDO.RNA/rna_inverse_folding
bash rna_inverse_folding.sh
```
- **Note:** Multi-GPU inference for inverse folding is not currently supported; support will be added in the future.
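Because multi-GPU inference is not supported, on a multi-GPU machine you may want to pin the run to a single device. A minimal sketch using the standard `CUDA_VISIBLE_DEVICES` variable, assuming the script's CUDA framework (e.g. PyTorch) honours it; run it from the folder containing `rna_inverse_folding.sh`:

```bash
# Choose which GPU to use; 0 is only an example index.
GPU_ID=0
# Make only that GPU visible to the pipeline (inside the process it appears as device 0).
CUDA_VISIBLE_DEVICES="${GPU_ID}" bash rna_inverse_folding.sh
```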
#### Outputs:

- The evaluation score is printed to the console.
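Since the score only goes to the console, it can help to keep a copy of the run's output. A minimal sketch with `tee`, run from the folder containing `rna_inverse_folding.sh`; the timestamped log file name is an arbitrary choice:

```bash
# Timestamped log file name (arbitrary choice).
LOG_FILE="rna_inverse_folding_$(date +%Y%m%d_%H%M%S).log"
# Show the output on the console as usual and also save stdout and stderr to the log.
bash rna_inverse_folding.sh 2>&1 | tee "${LOG_FILE}"
# PIPESTATUS[0] holds the exit code of the script itself, not of tee.
echo "rna_inverse_folding.sh exited with code ${PIPESTATUS[0]}"
```

Piping through `tee` makes the pipeline's own exit status that of `tee`, which is why the script's status is read from `PIPESTATUS` instead.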