Update README.md
README.md
CHANGED
@@ -4,7 +4,7 @@ base_model:
|
|
4 |
---
|
5 |
|
6 |
# RNA Inverse Folding
|
7 |
-
|
8 |
|
9 |
#### Setup:
|
10 |
Install [ModelGenerator](https://github.com/genbio-ai/modelgenerator).
|
@@ -30,41 +30,51 @@ Install [ModelGenerator](https://github.com/genbio-ai/modelgenerator).
|
|
30 |
nvidia-smi # this should print the GPUs' details
|
31 |
```
|
32 |
- Execute the following steps from **within** the docker container you just created.
|
33 |
|
34 |
#### Download model checkpoints:
|
35 |
|
36 |
- Download the `model.ckpt` checkpoint from [here](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/model.ckpt). Place it inside the local directory `${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B`.
|
37 |
-
|
38 |
-
**Alternatively**, you can simply run the following script to do this:
|
39 |
-
```
|
40 |
-
mkdir -p ${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B
|
41 |
-
wget -P ${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/resolve/main/model.ckpt
|
42 |
-
```
|
43 |
|
44 |
-
- Download the gRNAde checkpoint named `gRNAde_ARv1_1state_das.h5` from the [huggingface-hub](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/other_models/gRNAde_ARv1_1state_all.h5) ***or*** the [original source](https://github.com/chaitjo/geometric-rna-design/blob/main/checkpoints/gRNAde_ARv1_1state_all.h5). Place it inside the directory `${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold
|
45 |
-
|
46 |
-
**Alternatively**, you can
|
47 |
```
|
48 |
-
mkdir -p ${MGEN_DATA_DIR}/modelgenerator/
|
49 |
-
|
50 |
```
|
51 |
|
52 |
#### Download data:
|
53 |
-
- Download the data preprocessed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). Mainly download these two files: processed.pt.zip
|
54 |
|
55 |
**Alternatively**, you can run the following script to do it:
|
56 |
```
|
57 |
mkdir -p ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/
|
58 |
-
|
59 |
-
|
60 |
```
|
61 |
|
62 |
#### Run inference:
|
63 |
-
- From your terminal, run the script
|
64 |
```
|
65 |
-
|
66 |
```
|
67 |
-
- **Note:** Multi-GPU inference for inverse folding is not currently supported and will be included in the future.
|
68 |
|
69 |
#### Outputs:
|
70 |
- The evaluation score will be printed on the console.
|
|
|
4 |
---
|
5 |
|
6 |
# RNA Inverse Folding
|
7 |
+
RNA inverse folding is a computational method for designing RNA sequences that fold into predetermined three-dimensional structures. Our study focuses on generating sequences from the known backbone structure of an RNA, defined by the 3D coordinates of its backbone atoms, without any information about the individual bases. Specifically, we fully finetune the [AIDO.RNA-1.6B](https://huggingface.co/genbio-ai/AIDO.RNA-1.6B) model on the single-state split from [Das _et al._](https://www.nature.com/articles/nmeth.1433) already processed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). We use the same train, validation, and test splits used by their method [gRNAde](https://arxiv.org/abs/2305.14749). The current version of ModelGenerator contains the inference pipeline for RNA inverse folding. Experimental pipelines for other datasets (both training and testing) will be included in the future.
|
8 |
|
9 |
#### Setup:
|
10 |
Install [ModelGenerator](https://github.com/genbio-ai/modelgenerator).
|
|
|
30 |
nvidia-smi # this should print the GPUs' details
|
31 |
```
|
32 |
- Execute the following steps from **within** the docker container you just created.
|
33 |
+
- **Note:** Multi-GPU inference for inverse folding is not currently supported and will be included in the future.
|
34 |
|
35 |
#### Download model checkpoints:
|
36 |
|
37 |
- Download the `model.ckpt` checkpoint from [here](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/model.ckpt). Place it inside the local directory `${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B`.
|
38 |
|
39 |
+
- Download the gRNAde checkpoint named `gRNAde_ARv1_1state_das.h5` from the [huggingface-hub](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/other_models/gRNAde_ARv1_1state_all.h5) ***or*** the [original source](https://github.com/chaitjo/geometric-rna-design/blob/main/checkpoints/gRNAde_ARv1_1state_all.h5). Place it inside the directory `${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B/other_models`. Then set the environment variable `gRNAde_CKPT_PATH=${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B/other_models/gRNAde_ARv1_1state_das.h5`.
|
40 |
+
|
41 |
+
**Alternatively**, you can simply run the following script to do both of these steps:
|
42 |
```
|
43 |
+
mkdir -p ${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B
|
44 |
+
huggingface-cli download genbio-ai/AIDO.RNAIF-1.6B \
|
45 |
+
--repo-type model \
|
46 |
+
--local-dir ${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B
|
47 |
+
# Set the environment variable gRNAde_CKPT_PATH
|
48 |
+
export gRNAde_CKPT_PATH=${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B/other_models/gRNAde_ARv1_1state_das.h5
|
49 |
```
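Before moving on, it may help to confirm that both checkpoints ended up where the later steps expect them. The snippet below is a minimal sketch, not part of ModelGenerator: `check_file` and `CKPT_DIR` are hypothetical names introduced here, and the paths are the ones given in this section.

```shell
# Hypothetical helper (not part of ModelGenerator): report whether a file exists.
check_file() {
  if [ -f "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Paths as described in this section; falls back to the current directory
# if MGEN_DATA_DIR is not set.
CKPT_DIR="${MGEN_DATA_DIR:-.}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B"

# `|| true` keeps the script going so that both files are always reported.
check_file "${CKPT_DIR}/model.ckpt" || true
check_file "${gRNAde_CKPT_PATH:-unset}" || true
```

A `missing:` line for either file suggests that the download or the `export` step should be repeated.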
|
50 |
|
51 |
#### Download data:
|
52 |
+
- Download the data preprocessed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). Mainly download these two files: [processed.pt.zip](https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/blob/main/processed.pt.zip) and [processed_df.csv](https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/blob/main/processed_df.csv). Place them inside the directory `${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/`. Please refer to [this link](https://github.com/chaitjo/geometric-rna-design/tree/main?tab=readme-ov-file#downloading-and-preparing-data) for details about the dataset and its preprocessing.
|
53 |
|
54 |
**Alternatively**, you can run the following script to do it:
|
55 |
```
|
56 |
mkdir -p ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/
|
57 |
+
huggingface-cli download genbio-ai/rna-inverse-folding \
|
58 |
+
--repo-type dataset \
|
59 |
+
--local-dir ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/
|
60 |
```
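The scripts above expand `${MGEN_DATA_DIR}` directly, so an unset variable would silently create `/modelgenerator/...` at the filesystem root. The sketch below guards against that and then lists which of the two data files are still missing. `data_root` is a hypothetical helper introduced for illustration. The directory and file names are the ones stated in this section.

```shell
# Hypothetical helper: print the raw-data directory, or fail if
# MGEN_DATA_DIR is unset or empty.
data_root() {
  if [ -z "${MGEN_DATA_DIR:-}" ]; then
    echo "MGEN_DATA_DIR is not set" >&2
    return 1
  fi
  echo "${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data"
}

# Report any of the two expected files that is not present yet.
if RAW_DIR="$(data_root)"; then
  for f in processed.pt.zip processed_df.csv; do
    [ -f "${RAW_DIR}/${f}" ] || echo "missing: ${RAW_DIR}/${f}"
  done
fi
```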
|
61 |
|
62 |
#### Run inference:
|
63 |
+
- From your terminal, change directory to the `experiments/AIDO.RNA/rna_inverse_folding` folder and run the following script:
|
64 |
```
|
65 |
+
cd modelgenerator/rna_inv_fold/gRNAde_structure_encoder
|
66 |
+
echo "Running inference.."
|
67 |
+
python main.py
|
68 |
+
echo "Extracting structure encoding.."
|
69 |
+
python main_encoder_only.py
|
70 |
+
cd ../../../experiments/AIDO.RNA/rna_inverse_folding/
|
71 |
+
# run inference
|
72 |
+
mgen test --config rna_inv_fold_test.yaml \
|
73 |
+
--trainer.default_root_dir ${MGEN_DATA_DIR}/modelgenerator/logs/rna_inv_fold/ \
|
74 |
+
--ckpt_path ${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B/model.ckpt \
|
75 |
+
--trainer.devices 0, \
|
76 |
+
--data.path ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/structure_encoding/
|
77 |
```
|
|
|
78 |
|
79 |
#### Outputs:
|
80 |
- The evaluation score will be printed on the console.
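Since the score is only printed to the console, one option is to keep a copy of the output in a log file. This is a minimal sketch under that assumption: `run_logged` is a hypothetical wrapper (not part of `mgen`), and the log path is an arbitrary choice.

```shell
# Hypothetical wrapper: run a command, show its output on the console,
# and keep a copy (stdout and stderr) in the given log file.
run_logged() {
  log_file="$1"
  shift
  "$@" 2>&1 | tee "$log_file"
}

# Stand-in command for illustration; replace `echo ...` with the
# `mgen test ...` call from "Run inference".
run_logged "${TMPDIR:-/tmp}/rna_inv_fold_test.log" echo "evaluation output goes here"
```

Because the output passes through a pipe, the wrapper returns the exit status of `tee` rather than that of the wrapped command.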
|