Rocketknight1 (HF staff) committed
Commit 122184f (1 parent: 82975b3)

Update README.md

Files changed (1)
  1. README.md +3 -50
README.md CHANGED
@@ -5,6 +5,8 @@ license: mit
 
 ESM-1b ([paper](https://www.pnas.org/content/118/15/e2016239118#:~:text=https%3A//doi.org/10.1073/pnas.2016239118), [repository](https://github.com/facebookresearch/esm)) is a transformer protein language model, trained on protein sequence data without label supervision. The model is pretrained on Uniref50 with an unsupervised masked language modeling (MLM) objective, meaning the model is trained to predict amino acids from the surrounding sequence context. This pretraining objective allows ESM-1b to learn generally useful features which can be transferred to downstream prediction tasks. ESM-1b has been evaluated on a variety of tasks related to protein structure and function, including remote homology detection, secondary structure prediction, contact prediction, and prediction of the effects of mutations on function, producing state-of-the-art results.
 
+ **Important note**: ESM-2 is now available in a range of checkpoint sizes. For most tasks, ESM-2 performance will be superior to ESM-1 and ESM-1b, and so we recommend using it instead unless your goal is explicitly to compare against ESM-1b. The ESM-2 checkpoint closest in size to ESM-1b is [esm2_t33_650M_UR50D](https://huggingface.co/facebook/esm2_t33_650M_UR50D).
+
 
 ## **Model description**
 
@@ -19,56 +21,7 @@ ESM-1b can infer information about the structure and function of proteins withou
 
 ## **Intended uses & limitations**
 
- The model can be used for feature extraction, fine-tuned on downstream tasks, or used directly to make inferences about the structure and function of protein sequences.
-
-
- ### **How to use**
-
- You can use this model with a pipeline for masked language modeling:
-
-
- ```
- >>> from transformers import ESMForMaskedLM, ESMTokenizer, pipeline
- >>> tokenizer = ESMTokenizer.from_pretrained("facebook/esm-1b", do_lower_case=False)
- >>> model = ESMForMaskedLM.from_pretrained("facebook/esm-1b")
- >>> unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
- >>> unmasker('QERLKSIVRILE<mask>SLGYNIVAT')
-
- [{'sequence': 'Q E R L K S I V R I L E E S L G Y N I V A T',
-   'score': 0.0933581069111824,
-   'token': 9,
-   'token_str': 'E'},
-  {'sequence': 'Q E R L K S I V R I L E K S L G Y N I V A T',
-   'score': 0.09198431670665741,
-   'token': 15,
-   'token_str': 'K'},
-  {'sequence': 'Q E R L K S I V R I L E S S L G Y N I V A T',
-   'score': 0.06775771081447601,
-   'token': 8,
-   'token_str': 'S'},
-  {'sequence': 'Q E R L K S I V R I L E L S L G Y N I V A T',
-   'score': 0.0661069005727768,
-   'token': 4,
-   'token_str': 'L'},
-  {'sequence': 'Q E R L K S I V R I L E R S L G Y N I V A T',
-   'score': 0.06330915540456772,
-   'token': 10,
-   'token_str': 'R'}]
- ```
-
-
- Here is how to use this model to get the features of a given protein sequence in PyTorch:
-
-
- ```
- from transformers import ESMForMaskedLM, ESMTokenizer
- tokenizer = ESMTokenizer.from_pretrained("facebook/esm-1b", do_lower_case=False)
- model = ESMForMaskedLM.from_pretrained("facebook/esm-1b")
- sequence_Example = "QERLKSIVRILE"
- encoded_input = tokenizer(sequence_Example, return_tensors='pt')
- output = model(**encoded_input)
- ```
-
+ The model can be used for feature extraction, fine-tuned on downstream tasks, or used directly to make inferences about the structure and function of protein sequences, like any other masked language model. For full examples, please see [our notebook on fine-tuning protein models](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/protein_language_modeling.ipynb).
 
 
 ## **Training data**
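
The note added in this commit recommends ESM-2 over ESM-1b. As a minimal, illustrative sketch (not taken from the card itself), the masked-language-modeling usage that the removed example demonstrated can be reproduced with the released `transformers` integration, which exposes `EsmForMaskedLM` and `EsmTokenizer` rather than the older `ESMForMaskedLM`/`ESMTokenizer` names, using the linked `facebook/esm2_t33_650M_UR50D` checkpoint:

```python
# Illustrative sketch only: assumes the current transformers Esm integration
# and the facebook/esm2_t33_650M_UR50D checkpoint recommended in the new note.
from transformers import AutoTokenizer, EsmForMaskedLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = EsmForMaskedLM.from_pretrained("facebook/esm2_t33_650M_UR50D")

# Predict the masked residue, mirroring the fill-mask example removed by this commit.
unmasker = pipeline("fill-mask", model=model, tokenizer=tokenizer)
for prediction in unmasker("QERLKSIVRILE<mask>SLGYNIVAT"):
    print(prediction["token_str"], round(prediction["score"], 4))
```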
 
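The rewritten "Intended uses & limitations" line mentions feature extraction; a minimal sketch of pulling per-residue embeddings follows, assuming `EsmModel` and again the ESM-2 checkpoint (an ESM-1b checkpoint is used the same way):

```python
# Illustrative sketch only: per-residue feature extraction with an ESM checkpoint.
import torch
from transformers import AutoTokenizer, EsmModel

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = EsmModel.from_pretrained("facebook/esm2_t33_650M_UR50D")

sequence = "QERLKSIVRILE"
inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One embedding per token, including the special tokens added by the tokenizer.
embeddings = outputs.last_hidden_state
print(embeddings.shape)  # (1, sequence length + special tokens, hidden size)
```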