genbio-ai / AIDO.RNA-1.6B

PyTorch

rnabert


probablybots committed 14 days ago

Commit c3b4f26 (1 parent: c2ac139)

Update README.md

Files changed (1)

README.md +37 -47

@@ -30,80 +30,70 @@ The pre-training data contains 42 million unique ncRNA sequences from RNAcentral
 ## How to Use
-Build any downstream models from this backbone
-### Get RNA sequence embedding
 ```python
-from genbio_finetune.tasks import Embed
-model = Embed.from_config({"model.backbone": "aido_rna_1b600m"}).eval()
-collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
 embedding = model(collated_batch)
 print(embedding.shape)
 print(embedding)
 ```
-### Sequence-level regression
 ```python
-from genbio_finetune.tasks import SequenceRegression
-model = SequenceRegression.from_config({"model.backbone": "aido_rna_1b600m"}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
 ```
-### Sequence-level classification
 ```python
 import torch
-from genbio_finetune.tasks import SequenceClassification
-model = SequenceClassification.from_config({"model.backbone": "aido_rna_1b600m", "model.n_classes": 2}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
 print(torch.argmax(logits, dim=-1))
 ```
-### Token-level classification
 ```python
-import torch
-from genbio_finetune.tasks import TokenClassification
-model = TokenClassification.from_config({"model.backbone": "aido_rna_1b600m", "model.n_classes": 3}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
-print(torch.argmax(logits, dim=-1))
-```
-### Pairwise token-level classification
-@Sazan TODO
-## RNA inverse folding
-@Sazan
-Or use our one-liner CLI to finetune or evaluate any of the above!
-```bash
-mgen fit --model SequenceClassification --model.backbone aido_rna_1b600m --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
-mgen test --model SequenceClassification --model.backbone aido_rna_1b600m --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
 ```
-For more information, visit: [ModelGenerator](https://github.com/genbio-ai/modelgenerator)
 ## Citation
 Please cite AIDO.RNA using the following BibTeX code:
 ```
-@inproceedings{
-zou2024a,
-title={A Large-Scale Foundation Model for {RNA} Function and Structure Prediction},
-author={Shuxian Zou and Tianhua Tao and Sazan Mahbub and Caleb Ellington and Robin Jonathan Algayres and Dian Li and Yonghao Zhuang and Hongyi Wang and Le Song and Eric P. Xing},
-booktitle={NeurIPS 2024 Workshop on AI for New Drug Modalities},
-year={2024},
-url={https://openreview.net/forum?id=Gzo3JMPY8w}
 }
 ```
-## License
-@Hongyi TODO

 ## How to Use
+### Build any downstream models from this backbone with ModelGenerator
+For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)
+```bash
+mgen fit --model SequenceClassification --model.backbone aido_rna_1b600m --data SequenceClassificationDataModule --data.path <hf_or_local_path_to_your_dataset>
+mgen test --model SequenceClassification --model.backbone aido_rna_1b600m --data SequenceClassificationDataModule --data.path <hf_or_local_path_to_your_dataset>
+```
+### Or use directly in Python
+#### Embedding
 ```python
+from modelgenerator.tasks import Embed
+model = Embed.from_config({"model.backbone": "aido_dna_7b"}).eval()
+collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 embedding = model(collated_batch)
 print(embedding.shape)
 print(embedding)
 ```
+#### Sequence-level Classification
 ```python
+import torch
+from modelgenerator.tasks import SequenceClassification
+model = SequenceClassification.from_config({"model.backbone": "aido_dna_7b", "model.n_classes": 2}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
+print(torch.argmax(logits, dim=-1))
 ```
+#### Token-level Classification
 ```python
 import torch
+from modelgenerator.tasks import TokenClassification
+model = TokenClassification.from_config({"model.backbone": "aido_dna_7b", "model.n_classes": 3}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
 print(torch.argmax(logits, dim=-1))
 ```
+#### Sequence-level Regression
 ```python
+from modelgenerator.tasks import SequenceRegression
+model = SequenceRegression.from_config({"model.backbone": "aido_dna_7b"}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
+### Get RNA sequence embedding
+```python
+from genbio_finetune.tasks import Embed
+model = Embed.from_config({"model.backbone": "aido_rna_1b600m"}).eval()
+collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
+embedding = model(collated_batch)
+print(embedding.shape)
+print(embedding)
 ```
 ## Citation
 Please cite AIDO.RNA using the following BibTeX code:
 ```
+@misc{zou_large-scale_2024,
+	title = {A Large-Scale Foundation Model for RNA Function and Structure Prediction},
+	url = {https://www.biorxiv.org/content/10.1101/2024.11.28.625345v1},
+	doi = {10.1101/2024.11.28.625345},
+	publisher = {bioRxiv},
+	author = {Zou, Shuxian and Tao, Tianhua and Mahbub, Sazan and Ellington, Caleb N. and Algayres, Robin and Li, Dian and Zhuang, Yonghao and Wang, Hongyi and Song, Le and Xing, Eric P.},
+	year = {2024},
 }
 ```
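The updated README's Python examples end by printing raw tensors, and the classification examples call `torch.argmax(logits, dim=-1)`. The sketch below illustrates that post-processing on random stand-in tensors, not on real model outputs. It rests on several assumptions that the README does not confirm:

- `Embed` returns per-token embeddings of shape `(batch, seq_len, hidden)`.
- `SequenceClassification` logits have shape `(batch, n_classes)`.
- `TokenClassification` logits have shape `(batch, seq_len, n_classes)`.

The padding mask and the `mean_pool` helper are hypothetical. Masked mean pooling is one common way to reduce per-token embeddings to one vector per sequence. It is not necessarily what AIDO.RNA or ModelGenerator does internally.

```python
import torch

# Hypothetical sizes, for illustration only: 2 sequences of 4 tokens,
# hidden size 8, 2 sequence-level classes, 3 token-level classes.
batch, seq_len, hidden = 2, 4, 8
torch.manual_seed(0)

# Random stand-in for `model(collated_batch)` from the Embed example,
# ASSUMED to be per-token embeddings of shape (batch, seq_len, hidden).
embedding = torch.randn(batch, seq_len, hidden)

# Hypothetical padding mask: 1 marks a real token, 0 marks padding
# (here the second sequence is one token shorter).
mask = torch.tensor([[1, 1, 1, 1], [1, 1, 1, 0]], dtype=embedding.dtype)


def mean_pool(emb: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings per sequence, ignoring padded positions."""
    summed = (emb * mask.unsqueeze(-1)).sum(dim=1)      # (batch, hidden)
    counts = mask.sum(dim=1, keepdim=True).clamp(min=1)  # (batch, 1), avoids /0
    return summed / counts


seq_embedding = mean_pool(embedding, mask)  # (batch, hidden)

# Random stand-ins for classifier outputs (shapes ASSUMED, see lead-in).
seq_logits = torch.randn(batch, 2)           # like n_classes=2 above
tok_logits = torch.randn(batch, seq_len, 3)  # like n_classes=3 above

# argmax over the last (class) dimension picks the highest-scoring class.
seq_labels = torch.argmax(seq_logits, dim=-1)  # one label per sequence: (batch,)
tok_labels = torch.argmax(tok_logits, dim=-1)  # one label per token: (batch, seq_len)

print(seq_embedding.shape, seq_labels.shape, tok_labels.shape)
```

Under these assumptions, reducing over the last dimension keeps the batch (and, for token-level logits, the token) dimensions. The result is one predicted class per sequence or per token.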