ShuxianZou committed on
Commit
f5d1e99
1 Parent(s): a7d67c5

Update README.md

Files changed (1)
  1. README.md +65 -12
README.md CHANGED
@@ -1,33 +1,86 @@
1
  # AIDO.RNA 1.6B
2
 
3
- AIDO.RNA is an RNA foundation model trained on 42 million non-coding RNA sequences at single-nucleotide resolution. It achieves state-of-the-art performance on a comprehensive set of tasks, including RNA secondary structure prediction, mRNA-related tasks, RNA function prediction tasks, and RNA inverse folding.
4
 
5
  <img src="https://cdn-uploads.huggingface.co/production/uploads/63008d4bc1e149ceaff724a3/mNqn5SKQFHxSby3E2dosE.png" alt="description" style="width:80%; height:auto;">
6
 
 
 
7
 
8
- ## How to Use
 
9
 
10
- ### Sequence-level Regression
 
11
 
12
 
13
- ### Sequence-level Classification
14
  ```
15
  import torch
16
  from genbio_finetune.tasks import SequenceClassification
17
- from genbio_finetune.models import MLPPoolAdapter
18
- model = SequenceClassification.from_config({"model.backbone": "rnafm",
19
- "model.n_classes": 2,
20
- "model.adapter": MLPPoolAdapter,
21
- })
22
  collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
23
  logits = model(collated_batch)
24
  print(logits)
25
  print(torch.argmax(logits, dim=-1))
26
  ```
27
 
28
- ### Pairwise Token-level Classification
29
- TODO
30
 
31
 
32
  ## Citation
33
- TODO
1
  # AIDO.RNA 1.6B
2
 
3
+ AIDO.RNA is a 1.6B-parameter RNA foundation model trained on 42 million non-coding RNA sequences at single-nucleotide resolution. It achieves state-of-the-art performance on a comprehensive set of tasks, including RNA secondary structure prediction, mRNA-related tasks, RNA function prediction tasks, and RNA inverse folding.
4
 
5
  <img src="https://cdn-uploads.huggingface.co/production/uploads/63008d4bc1e149ceaff724a3/mNqn5SKQFHxSby3E2dosE.png" alt="description" style="width:80%; height:auto;">
6
 
7
+ ## Model architectural details
8
+ TODO
9
 
10
+ ## Pre-training data
11
+ TODO
12
 
13
+ ## Downstream evaluation
14
+ TODO
15
 
16
+ ## How to Use
17
+ Build any downstream model on top of this backbone.
18
+
19
+ ### Get RNA sequence embedding
20
+ ```
21
+ from genbio_finetune.tasks import Embed
22
+ model = Embed.from_config({"model.backbone": "rnafm"})
23
+ collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
24
+ embedding = model(collated_batch)
25
+ print(embedding.shape)
26
+ print(embedding)
27
+ ```
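Editorial note on the embedding snippet above: to feed a downstream model, per-token embeddings are often reduced to one vector per sequence. The sketch below shows masked mean pooling in plain Python. It assumes the embedding is laid out as one vector per token and that a padding mask is available; neither is confirmed by this README, and `mean_pool`, `toy_embedding`, and `toy_mask` are hypothetical names used only for illustration.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], mask: List[int]) -> List[float]:
    """Average the embeddings of real tokens (mask == 1), skipping padding."""
    if len(token_embeddings) != len(mask):
        raise ValueError("token_embeddings and mask must have the same length")
    kept = [vec for vec, keep in zip(token_embeddings, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy stand-in for the per-token embeddings of "ACGT" plus one padding slot.
# Hidden size is 2 here purely for readability (assumption, not the real size).
toy_embedding = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0], [0.0, 0.0]]
toy_mask = [1, 1, 1, 1, 0]

pooled = mean_pool(toy_embedding, toy_mask)
print(pooled)
```

Mean pooling is only one possible choice; the library's own adapters may pool differently.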
28
 
29
+ ### Sequence-level classification
30
  ```
31
  import torch
32
  from genbio_finetune.tasks import SequenceClassification
33
+ model = SequenceClassification.from_config({"model.backbone": "rnafm", "model.n_classes": 2})
34
  collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
35
  logits = model(collated_batch)
36
  print(logits)
37
  print(torch.argmax(logits, dim=-1))
38
  ```
39
 
40
+ ### Token-level classification
41
+ ```
42
+ import torch
43
+ from genbio_finetune.tasks import TokenClassification
44
+ model = TokenClassification.from_config({"model.backbone": "rnafm", "model.n_classes": 3})
45
+ collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
46
+ logits = model(collated_batch)
47
+ print(logits)
48
+ print(torch.argmax(logits, dim=-1))
49
+ ```
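Editorial note on the token-level snippet above: `torch.argmax(logits, dim=-1)` picks the highest-scoring class for each position. The sketch below reproduces that step in plain Python and also converts logits to probabilities with a softmax. It assumes the logits are laid out as (batch, sequence length, classes), which this README does not state; the toy values and helper names are hypothetical.

```python
import math
from typing import List


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of class scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row: List[float]) -> int:
    """Index of the largest score (first index wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)


# Toy logits for one sequence of four nucleotides and three classes.
# Assumed layout: [batch][token][class].
toy_logits = [
    [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [0.0, 1.5, 0.3], [4.0, 0.0, 0.0]],
]

# Per-token class predictions, the plain-Python analogue of argmax over dim=-1.
predictions = [[argmax(token) for token in seq] for seq in toy_logits]
# Per-token class probabilities, useful when a confidence score is needed.
probabilities = [[softmax(token) for token in seq] for seq in toy_logits]

print(predictions)
```

With an untrained classification head the predictions are arbitrary; the step becomes meaningful only after fine-tuning.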
50
 
51
 
52
+ ### Pairwise token-level classification
53
+ @Sazan TODO
54
+
55
+
56
+ ### Sequence-level regression
57
+ ```
58
+ from genbio_finetune.tasks import SequenceRegression
59
+ model = SequenceRegression.from_config({"model.backbone": "rnafm"})
60
+ collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
61
+ logits = model(collated_batch)
62
+ print(logits)
63
+ ```
64
+
65
+ Or use our one-liner CLI to fine-tune or evaluate any of the above!
66
+ ```
67
+ gbft fit --model SequenceClassification --model.backbone rnafm --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
68
+ gbft test --model SequenceClassification --model.backbone rnafm --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
69
+ ```
70
+
71
+ For more information, visit: [Model Generator](https://github.com/genbio-ai/test)
72
+
73
  ## Citation
74
+ Please cite AIDO.RNA using the following BibTeX code:
75
+
76
+ @inproceedings{ellington2024accurate,
77
+ title={Accurate and General {DNA} Representations Emerge from Genome Foundation Models at Scale},
78
+ author={Caleb Ellington and Ning Sun and Nicholas Ho and Tianhua Tao and Sazan Mahbub and Yonghao Zhuang and Hongyi Wang and Eric P. Xing and Le Song},
79
+ booktitle={NeurIPS 2024 Workshop on AI for New Drug Modalities},
80
+ year={2024}
81
+ }
82
+
83
+ ## License
84
+ @Hongyi TODO
85
+
86
+