smahbub committed on
Commit 63b37db · verified · 1 Parent(s): 9aa3d5b

Update README.md

  1. README.md +28 -18
README.md CHANGED
@@ -4,7 +4,7 @@ base_model:
4
  ---
5
 
6
  # RNA Inverse Folding
7
- We fully finetune the [AIDO.RNA-1.6B](https://huggingface.co/genbio-ai/AIDO.RNA-1.6B) model on the single-state split from [Das _et al._](https://www.nature.com/articles/nmeth.1433) already processed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). We use the same train, validation, and test splits used by their method [gRNAde](https://arxiv.org/abs/2305.14749). Current version of ModelGenerator contains the inference pipeline for RNA inverse folding. Experimental pipeline on other datasets (both training and testing) will be included in the future.
8
 
9
  #### Setup:
10
  Install [ModelGenerator](https://github.com/genbio-ai/modelgenerator).
@@ -30,41 +30,51 @@ Install [ModelGenerator](https://github.com/genbio-ai/modelgenerator).
30
  nvidia-smi # this should print the GPUs' details
31
  ```
32
  - Execute the following steps from **within** the docker container you just created.
 
33
 
34
  #### Download model checkpoints:
35
 
36
  - Download the `model.ckpt` checkpoint from [here](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/model.ckpt). Place it inside the local directory `${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B`.
37
-
38
- **Alternatively**, you can simply run the following script to do this:
39
- ```
40
- mkdir -p ${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B
41
- wget -P ${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/resolve/main/model.ckpt
42
- ```
43
 
44
- - Download the gRNAde checkpoint named `gRNAde_ARv1_1state_das.h5` from the [huggingface-hub](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/other_models/gRNAde_ARv1_1state_all.h5) ***or*** the [original source](https://github.com/chaitjo/geometric-rna-design/blob/main/checkpoints/gRNAde_ARv1_1state_all.h5). Place it inside the directory `${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold/`.
45
-
46
- **Alternatively**, you can do it by simply running the following script:
47
  ```
48
- mkdir -p ${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold/
49
- wget -P ${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold/ https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/resolve/main/other_models/gRNAde_ARv1_1state_all.h5
50
  ```
51
 
52
  #### Download data:
53
- - Download the data preprocessed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). Mainly download these two files: processed.pt.zip ([huggingface-hub](https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/blob/main/processed.pt.zip), [original source](https://drive.google.com/file/d/1gcUUaRxbGZnGMkLdtVwAILWVerVCbu4Y/view)) and processed_df.csv ([huggingface-hub](https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/blob/main/processed_df.csv), [original source](https://drive.google.com/file/d/1lbdiE1LfWPReo5VnZy0zblvhVl5QhaF4/view)). Place them inside the directory `${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/`. Please refer to [this link](https://github.com/chaitjo/geometric-rna-design/tree/main?tab=readme-ov-file#downloading-and-preparing-data) for details about the dataset and its preprocessing.
54
 
55
  **Alternatively**, you can run the following script to do it:
56
  ```
57
  mkdir -p ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/
58
- wget -P ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/ https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/resolve/main/processed.pt.zip
59
- wget -P ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/ https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/resolve/main/processed_df.csv
 
60
  ```
61
 
62
  #### Run inference:
63
- - From your terminal, run the script `rna_inverse_folding.sh`:
64
  ```
65
- bash rna_inverse_folding.sh
66
  ```
67
- - **Note:** Multi-GPU inference for inverse folding is not currently supported and will be included in the future.
68
 
69
  #### Outputs:
70
  - The evaluation score will be printed on the console.
 
4
  ---
5
 
6
  # RNA Inverse Folding
7
+ RNA inverse folding is the computational task of designing RNA sequences that fold into predetermined three-dimensional structures. Our study focuses on generating sequences from the known backbone structure of an RNA, defined by the 3D coordinates of its backbone atoms, without any information about the individual bases. Specifically, we fully finetune the [AIDO.RNA-1.6B](https://huggingface.co/genbio-ai/AIDO.RNA-1.6B) model on the single-state split from [Das _et al._](https://www.nature.com/articles/nmeth.1433) already processed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). We use the same train, validation, and test splits used by their method [gRNAde](https://arxiv.org/abs/2305.14749). The current version of ModelGenerator contains the inference pipeline for RNA inverse folding. Experimental pipelines on other datasets (both training and testing) will be included in the future.
8
 
9
  #### Setup:
10
  Install [ModelGenerator](https://github.com/genbio-ai/modelgenerator).
 
30
  nvidia-smi # this should print the GPUs' details
31
  ```
32
  - Execute the following steps from **within** the docker container you just created.
33
+ - **Note:** Multi-GPU inference for inverse folding is not currently supported and will be included in the future.
34
 
35
  #### Download model checkpoints:
36
 
37
  - Download the `model.ckpt` checkpoint from [here](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/model.ckpt). Place it inside the local directory `${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B`.
38
 
39
+ - Download the gRNAde checkpoint named `gRNAde_ARv1_1state_das.h5` from the [huggingface-hub](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/other_models/gRNAde_ARv1_1state_all.h5) ***or*** the [original source](https://github.com/chaitjo/geometric-rna-design/blob/main/checkpoints/gRNAde_ARv1_1state_all.h5). Place it inside the directory `${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B/other_models`. Set the environment variable `gRNAde_CKPT_PATH=${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B/other_models/gRNAde_ARv1_1state_das.h5`
40
+
41
+ **Alternatively**, you can simply run the following script to do both of these steps:
42
  ```
43
+ mkdir -p ${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B
44
+ huggingface-cli download genbio-ai/AIDO.RNAIF-1.6B \
45
+ --repo-type model \
46
+ --local-dir ${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B
47
+ # Set the environment variable gRNAde_CKPT_PATH
48
+ export gRNAde_CKPT_PATH=${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B/other_models/gRNAde_ARv1_1state_das.h5
49
  ```
50
 
51
  #### Download data:
52
+ - Download the data preprocessed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). You need these two files: [processed.pt.zip](https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/blob/main/processed.pt.zip) and [processed_df.csv](https://huggingface.co/datasets/genbio-ai/rna-inverse-folding/blob/main/processed_df.csv). Place them inside the directory `${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/`. Please refer to [this link](https://github.com/chaitjo/geometric-rna-design/tree/main?tab=readme-ov-file#downloading-and-preparing-data) for details about the dataset and its preprocessing.
53
 
54
  **Alternatively**, you can run the following script to do it:
55
  ```
56
  mkdir -p ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/
57
+ huggingface-cli download genbio-ai/rna-inverse-folding \
58
+ --repo-type dataset \
59
+ --local-dir ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/
60
  ```
61
 
62
  #### Run inference:
63
+ - From your terminal, change directory to the `experiments/AIDO.RNA/rna_inverse_folding` folder and run the following script:
64
  ```
65
+ cd modelgenerator/rna_inv_fold/gRNAde_structure_encoder
66
+ echo "Running inference.."
67
+ python main.py
68
+ echo "Extracting structure encoding.."
69
+ python main_encoder_only.py
70
+ cd ../../../experiments/AIDO.RNA/rna_inverse_folding/
71
+ # run inference
72
+ mgen test --config rna_inv_fold_test.yaml \
73
+ --trainer.default_root_dir ${MGEN_DATA_DIR}/modelgenerator/logs/rna_inv_fold/ \
74
+ --ckpt_path ${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B/model.ckpt \
75
+ --trainer.devices 0, \
76
+ --data.path ${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/structure_encoding/
77
  ```
 
78
 
79
  #### Outputs:
80
  - The evaluation score will be printed on the console.
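
The download steps leave several files at fixed locations under `${MGEN_DATA_DIR}`, and inference also relies on the `gRNAde_CKPT_PATH` environment variable pointing at an existing file. A minimal pre-flight sketch, assuming exactly the paths quoted in this README, can confirm everything is in place before `mgen test` is launched. The helper name `check_inputs` is hypothetical and is not part of ModelGenerator.

```shell
#!/usr/bin/env bash
# Illustrative pre-flight check; check_inputs is a hypothetical helper,
# not something shipped with ModelGenerator.
# Usage: check_inputs "${MGEN_DATA_DIR}"
# Returns 0 if every expected file exists, 1 otherwise.
check_inputs() {
    local root="$1" missing=0 rel

    # Files the README asks you to place under ${MGEN_DATA_DIR}/modelgenerator/
    for rel in \
        "huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B/model.ckpt" \
        "datasets/rna_inv_fold/raw_data/processed.pt.zip" \
        "datasets/rna_inv_fold/raw_data/processed_df.csv"; do
        if [ ! -f "${root}/modelgenerator/${rel}" ]; then
            echo "missing: ${root}/modelgenerator/${rel}" >&2
            missing=1
        fi
    done

    # The gRNAde checkpoint is located through an environment variable,
    # so check that it is set and points at a real file.
    if [ ! -f "${gRNAde_CKPT_PATH:-}" ]; then
        echo "gRNAde_CKPT_PATH is unset or does not point to a file" >&2
        missing=1
    fi

    return "${missing}"
}
```

Calling `check_inputs "${MGEN_DATA_DIR}" && echo "ready"` before the inference commands reports each missing file on stderr instead of failing partway through the run. The check on `gRNAde_CKPT_PATH` verifies the exact filename the variable points to, which is useful because the exported filename has to match the file that was actually downloaded.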