Commit 048e9ac by ZhiyuanChen (verified) · 1 Parent(s): d7f6169

Update README.md

Files changed (1)
  1. README.md +45 -32
README.md CHANGED
@@ -11,6 +11,19 @@ base_model: multimolecule/ernierna
11
  pipeline_tag: fill-mask
12
  mask_token: "<mask>"
13
  widget:
14
  - example_title: "microRNA-21"
15
  text: "UAGC<mask>UAUCAGACUGAUGUUGA"
16
  output:
@@ -48,7 +61,7 @@ ERNIE-RNA is a [bert](https://huggingface.co/google-bert/bert-base-uncased)-styl
48
  ### Variations
49
 
50
  - **[`multimolecule/ernierna`](https://huggingface.co/multimolecule/ernierna)**: The ERNIE-RNA model pre-trained on non-coding RNA sequences.
51
- - **[`multimolecule/ernierna.ss`](https://huggingface.co/multimolecule/ernierna.ss)**: The ERNIE-RNA model fine-tuned on RNA secondary structure prediction.
52
 
53
  ### Model Specification
54
 
@@ -63,7 +76,7 @@ ERNIE-RNA is a [bert](https://huggingface.co/google-bert/bert-base-uncased)-styl
63
  - **Paper**: [ERNIE-RNA: An RNA Language Model with Structure-enhanced Representations](https://doi.org/10.1101/2024.03.17.585376)
64
  - **Developed by**: Weijie Yin, Zhaoyu Zhang, Liang He, Rui Jiang, Shuo Zhang, Gan Liu, Xuegong Zhang, Tao Qin, Zhen Xie
65
  - **Model type**: [BERT](https://huggingface.co/google-bert/bert-base-uncased) - [ERNIE](https://huggingface.co/nghuyong/ernie-3.0-base-zh)
66
- - **Original Repository**: [https://github.com/Bruce-ywj/ERNIE-RNA](https://github.com/Bruce-ywj/ERNIE-RNA)
67
 
68
  ## Usage
69
 
@@ -80,29 +93,29 @@ You can use this model directly with a pipeline for masked language modeling:
80
  ```python
81
  >>> import multimolecule # you must import multimolecule to register models
82
  >>> from transformers import pipeline
83
- >>> unmasker = pipeline('fill-mask', model='multimolecule/ernierna.ss')
84
- >>> unmasker("uagc<mask>uaucagacugauguuga")
85
 
86
- [{'score': 0.20929744839668274,
87
- 'token': 9,
88
- 'token_str': 'U',
89
- 'sequence': 'U A G C U U A U C A G A C U G A U G U U G A'},
90
- {'score': 0.1741773933172226,
91
- 'token': 7,
92
- 'token_str': 'C',
93
- 'sequence': 'U A G C C U A U C A G A C U G A U G U U G A'},
94
- {'score': 0.16430608928203583,
95
  'token': 8,
96
  'token_str': 'G',
97
- 'sequence': 'U A G C G U A U C A G A C U G A U G U U G A'},
98
- {'score': 0.1348584145307541,
99
  'token': 6,
100
  'token_str': 'A',
101
- 'sequence': 'U A G C A U A U C A G A C U G A U G U U G A'},
102
- {'score': 0.11933524906635284,
103
  'token': 21,
104
  'token_str': '.',
105
- 'sequence': 'U A G C. U A U C A G A C U G A U G U U G A'}]
106
  ```
107
 
108
  ### Downstream Use
@@ -115,11 +128,11 @@ Here is how to use this model to get the features of a given sequence in PyTorch
115
  from multimolecule import RnaTokenizer, ErnieRnaModel
116
 
117
 
118
- tokenizer = RnaTokenizer.from_pretrained('multimolecule/ernierna.ss')
119
- model = ErnieRnaModel.from_pretrained('multimolecule/ernierna.ss')
120
 
121
  text = "UAGCUUAUCAGACUGAUGUUGA"
122
- input = tokenizer(text, return_tensors='pt')
123
 
124
  output = model(**input)
125
  ```
@@ -135,17 +148,17 @@ import torch
135
  from multimolecule import RnaTokenizer, ErnieRnaForSequencePrediction
136
 
137
 
138
- tokenizer = RnaTokenizer.from_pretrained('multimolecule/ernierna.ss')
139
- model = ErnieRnaForSequencePrediction.from_pretrained('multimolecule/ernierna.ss')
140
 
141
  text = "UAGCUUAUCAGACUGAUGUUGA"
142
- input = tokenizer(text, return_tensors='pt')
143
  label = torch.tensor([1])
144
 
145
  output = model(**input, labels=label)
146
  ```
147
 
148
- #### Nucleotide Classification / Regression
149
 
150
  **Note**: This model is not fine-tuned for any specific task. You will need to fine-tune the model on a downstream task to use it for nucleotide classification or regression.
151
 
@@ -153,14 +166,14 @@ Here is how to use this model as backbone to fine-tune for a nucleotide-level ta
153
 
154
  ```python
155
  import torch
156
- from multimolecule import RnaTokenizer, ErnieRnaForNucleotidePrediction
157
 
158
 
159
- tokenizer = RnaTokenizer.from_pretrained('multimolecule/ernierna.ss')
160
- model = ErnieRnaForNucleotidePrediction.from_pretrained('multimolecule/ernierna.ss')
161
 
162
  text = "UAGCUUAUCAGACUGAUGUUGA"
163
- input = tokenizer(text, return_tensors='pt')
164
  label = torch.randint(2, (len(text), ))
165
 
166
  output = model(**input, labels=label)
@@ -177,11 +190,11 @@ import torch
177
  from multimolecule import RnaTokenizer, ErnieRnaForContactPrediction
178
 
179
 
180
- tokenizer = RnaTokenizer.from_pretrained('multimolecule/ernierna.ss')
181
- model = ErnieRnaForContactPrediction.from_pretrained('multimolecule/ernierna.ss')
182
 
183
  text = "UAGCUUAUCAGACUGAUGUUGA"
184
- input = tokenizer(text, return_tensors='pt')
185
  label = torch.randint(2, (len(text), len(text)))
186
 
187
  output = model(**input, labels=label)
 
11
  pipeline_tag: fill-mask
12
  mask_token: "<mask>"
13
  widget:
14
+ - example_title: "HIV-1"
15
+ text: "GGUC<mask>CUCUGGUUAGACCAGAUCUGAGCCU"
16
+ output:
17
+ - label: "G"
18
+ score: 0.2066272348165512
19
+ - label: "U"
20
+ score: 0.1811930239200592
21
+ - label: "A"
22
+ score: 0.17954225838184357
23
+ - label: "-"
24
+ score: 0.12186982482671738
25
+ - label: "."
26
+ score: 0.10200861096382141
27
  - example_title: "microRNA-21"
28
  text: "UAGC<mask>UAUCAGACUGAUGUUGA"
29
  output:
 
61
  ### Variations
62
 
63
  - **[`multimolecule/ernierna`](https://huggingface.co/multimolecule/ernierna)**: The ERNIE-RNA model pre-trained on non-coding RNA sequences.
64
+ - **[`multimolecule/ernierna-ss`](https://huggingface.co/multimolecule/ernierna-ss)**: The ERNIE-RNA model fine-tuned on RNA secondary structure prediction.
65
 
66
  ### Model Specification
67
 
 
76
  - **Paper**: [ERNIE-RNA: An RNA Language Model with Structure-enhanced Representations](https://doi.org/10.1101/2024.03.17.585376)
77
  - **Developed by**: Weijie Yin, Zhaoyu Zhang, Liang He, Rui Jiang, Shuo Zhang, Gan Liu, Xuegong Zhang, Tao Qin, Zhen Xie
78
  - **Model type**: [BERT](https://huggingface.co/google-bert/bert-base-uncased) - [ERNIE](https://huggingface.co/nghuyong/ernie-3.0-base-zh)
79
+ - **Original Repository**: [Bruce-ywj/ERNIE-RNA](https://github.com/Bruce-ywj/ERNIE-RNA)
80
 
81
  ## Usage
82
 
 
93
  ```python
94
  >>> import multimolecule # you must import multimolecule to register models
95
  >>> from transformers import pipeline
96
+ >>> unmasker = pipeline("fill-mask", model="multimolecule/ernierna-ss")
97
+ >>> unmasker("gguc<mask>cucugguuagaccagaucugagccu")
98
 
99
+ [{'score': 0.2066272348165512,
100
  'token': 8,
101
  'token_str': 'G',
102
+ 'sequence': 'G G U C G C U C U G G U U A G A C C A G A U C U G A G C C U'},
103
+ {'score': 0.1811930239200592,
104
+ 'token': 9,
105
+ 'token_str': 'U',
106
+ 'sequence': 'G G U C U C U C U G G U U A G A C C A G A U C U G A G C C U'},
107
+ {'score': 0.17954225838184357,
108
  'token': 6,
109
  'token_str': 'A',
110
+ 'sequence': 'G G U C A C U C U G G U U A G A C C A G A U C U G A G C C U'},
111
+ {'score': 0.12186982482671738,
112
+ 'token': 24,
113
+ 'token_str': '-',
114
+ 'sequence': 'G G U C - C U C U G G U U A G A C C A G A U C U G A G C C U'},
115
+ {'score': 0.10200861096382141,
116
  'token': 21,
117
  'token_str': '.',
118
+ 'sequence': 'G G U C. C U C U G G U U A G A C C A G A U C U G A G C C U'}]
119
  ```
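The widget `output` entries added to the front matter in this commit carry the same labels and scores as the pipeline result in the updated README example. A minimal sketch of that mapping, using the values copied from this diff; `to_widget_output` is a hypothetical helper for illustration, not part of `multimolecule` or `transformers`:

```python
# Top-5 fill-mask predictions copied from the pipeline output in this diff
# (the 'sequence' field is omitted for brevity).
predictions = [
    {"score": 0.2066272348165512, "token": 8, "token_str": "G"},
    {"score": 0.1811930239200592, "token": 9, "token_str": "U"},
    {"score": 0.17954225838184357, "token": 6, "token_str": "A"},
    {"score": 0.12186982482671738, "token": 24, "token_str": "-"},
    {"score": 0.10200861096382141, "token": 21, "token_str": "."},
]


def to_widget_output(results):
    """Hypothetical helper: map fill-mask results to widget label/score pairs."""
    return [{"label": r["token_str"], "score": r["score"]} for r in results]


widget_output = to_widget_output(predictions)

# Print the entries in the same shape as the YAML front matter.
for entry in widget_output:
    print(f'- label: "{entry["label"]}"')
    print(f'  score: {entry["score"]}')

# Total probability mass covered by the five listed candidates.
top5_mass = sum(entry["score"] for entry in widget_output)
```

The five scores sum to roughly 0.79 rather than 1, because the pipeline lists only the top candidates and the remaining mass belongs to tokens that are not shown.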
120
 
121
  ### Downstream Use
 
128
  from multimolecule import RnaTokenizer, ErnieRnaModel
129
 
130
 
131
+ tokenizer = RnaTokenizer.from_pretrained("multimolecule/ernierna-ss")
132
+ model = ErnieRnaModel.from_pretrained("multimolecule/ernierna-ss")
133
 
134
  text = "UAGCUUAUCAGACUGAUGUUGA"
135
+ input = tokenizer(text, return_tensors="pt")
136
 
137
  output = model(**input)
138
  ```
 
148
  from multimolecule import RnaTokenizer, ErnieRnaForSequencePrediction
149
 
150
 
151
+ tokenizer = RnaTokenizer.from_pretrained("multimolecule/ernierna-ss")
152
+ model = ErnieRnaForSequencePrediction.from_pretrained("multimolecule/ernierna-ss")
153
 
154
  text = "UAGCUUAUCAGACUGAUGUUGA"
155
+ input = tokenizer(text, return_tensors="pt")
156
  label = torch.tensor([1])
157
 
158
  output = model(**input, labels=label)
159
  ```
160
 
161
+ #### Token Classification / Regression
162
 
163
  **Note**: This model is not fine-tuned for any specific task. You will need to fine-tune the model on a downstream task to use it for nucleotide classification or regression.
164
 
 
166
 
167
  ```python
168
  import torch
169
+ from multimolecule import RnaTokenizer, ErnieRnaForTokenPrediction
170
 
171
 
172
+ tokenizer = RnaTokenizer.from_pretrained("multimolecule/ernierna-ss")
173
+ model = ErnieRnaForTokenPrediction.from_pretrained("multimolecule/ernierna-ss")
174
 
175
  text = "UAGCUUAUCAGACUGAUGUUGA"
176
+ input = tokenizer(text, return_tensors="pt")
177
  label = torch.randint(2, (len(text), ))
178
 
179
  output = model(**input, labels=label)
 
190
  from multimolecule import RnaTokenizer, ErnieRnaForContactPrediction
191
 
192
 
193
+ tokenizer = RnaTokenizer.from_pretrained("multimolecule/ernierna-ss")
194
+ model = ErnieRnaForContactPrediction.from_pretrained("multimolecule/ernierna-ss")
195
 
196
  text = "UAGCUUAUCAGACUGAUGUUGA"
197
+ input = tokenizer(text, return_tensors="pt")
198
  label = torch.randint(2, (len(text), len(text)))
199
 
200
  output = model(**input, labels=label)
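
Two identifiers change throughout this diff: the checkpoint name `multimolecule/ernierna.ss` becomes `multimolecule/ernierna-ss`, and `ErnieRnaForNucleotidePrediction` becomes `ErnieRnaForTokenPrediction`. A minimal sketch of updating user code that follows the old README; `migrate` and `RENAMES` are hypothetical names, and whether the old identifiers remain available as aliases is not stated in this diff:

```python
# Renames visible in this diff (old -> new).
RENAMES = {
    "multimolecule/ernierna.ss": "multimolecule/ernierna-ss",
    "ErnieRnaForNucleotidePrediction": "ErnieRnaForTokenPrediction",
}


def migrate(source: str) -> str:
    """Hypothetical helper: apply the renames above to a snippet of user code."""
    for old, new in RENAMES.items():
        # Plain substring replacement; the "." in the old name is literal.
        source = source.replace(old, new)
    return source


# A snippet written against the old README.
old_snippet = (
    "from multimolecule import RnaTokenizer, ErnieRnaForNucleotidePrediction\n"
    "tokenizer = RnaTokenizer.from_pretrained('multimolecule/ernierna.ss')\n"
    "model = ErnieRnaForNucleotidePrediction.from_pretrained('multimolecule/ernierna.ss')\n"
)

new_snippet = migrate(old_snippet)
print(new_snippet)
```

The quote style change in the diff (`'...'` to `"..."`) is cosmetic and is left alone by this sketch.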