---
license: other
widget:
- text: "Ḣ Q V Q [MASK] E"
---

## AntiBERTa2 🧬

AntiBERTa2 is an antibody-specific language model based on the [RoFormer model](https://arxiv.org/abs/2104.09864); it is pre-trained using masked language modelling.
We also provide a multimodal version of AntiBERTa2, AntiBERTa2-CSSP, that has been trained using a contrastive objective, similar to the [CLIP method](https://arxiv.org/abs/2103.00020).
Further details on both AntiBERTa2 and AntiBERTa2-CSSP are described in our [paper]() accepted at the NeurIPS MLSB Workshop 2023.

Both AntiBERTa2 models are available for non-commercial use only, and output antibody sequences (e.g. from infilling via masked language modelling) may likewise be used only non-commercially. For commercial use of our models or generated antibodies, please reach out to us at [info@alchemab.com](mailto:info@alchemab.com).

| Model variant | Parameters | Config |
| ------------- | ---------- | ------ |
| [AntiBERTa2](https://huggingface.co/alchemab/antiberta2) | 202M | 24L, 12H, 1024d |
| [AntiBERTa2-CSSP](https://huggingface.co/alchemab/antiberta2-cssp) | 202M | 24L, 12H, 1024d |

## Example usage
```python
>>> from transformers import (
...     RoFormerForMaskedLM,
...     RoFormerTokenizer,
...     pipeline,
...     RoFormerForSequenceClassification,
... )

>>> tokenizer = RoFormerTokenizer.from_pretrained("alchemab/antiberta2")
>>> model = RoFormerForMaskedLM.from_pretrained("alchemab/antiberta2")

>>> filler = pipeline("fill-mask", model=model, tokenizer=tokenizer)
>>> filler("Ḣ Q V Q ... C A [MASK] D ... T V S S")  # fill in the mask

>>> new_model = RoFormerForSequenceClassification.from_pretrained(
...     "alchemab/antiberta2")  # this will of course raise warnings that a
...                             # new classification head will be added and
...                             # randomly initialized
```
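As the widget example above suggests, the model expects input sequences as space-separated amino-acid tokens (with `[MASK]` for positions to infill). A minimal helper for producing that format — `format_for_antiberta2` is a hypothetical convenience function written for this card, not part of the model repository:

```python
def format_for_antiberta2(seq: str, mask_positions=()) -> str:
    """Space-separate a raw residue string, substituting [MASK] at the
    given 0-based positions (assumed input format for the tokenizer)."""
    masked = set(mask_positions)
    tokens = ["[MASK]" if i in masked else aa for i, aa in enumerate(seq)]
    return " ".join(tokens)

print(format_for_antiberta2("QVQLV", mask_positions=[2]))  # -> Q V [MASK] L V
```

The resulting string can then be passed directly to the fill-mask pipeline shown above.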