AmelieSchreiber committed on
Commit
5cfddda
1 Parent(s): ecc320f

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -29,7 +29,10 @@ tags:
  This model is trained to predict general binding sites of proteins using on the sequence. This is a finetuned version of
  `esm2_t6_8M_UR50D`, trained on [this dataset](https://huggingface.co/datasets/AmelieSchreiber/general_binding_sites). The data is
  not filtered by family, and thus the model may be overfit to some degree. In the Hugging Face Inference API widget to the right
- there are three protein sequence examples. The first is a DNA binding protein, the second and third were obtained using [EvoProtGrad](https://github.com/Amelie-Schreiber/sampling_protein_language_models/blob/main/EvoProtGrad_copy.ipynb)
+ there are three protein sequence examples. The first is a DNA binding protein ([see UniProt entry here](https://www.uniprot.org/uniprotkb/D3ZG52/entry)).
+ Note there is significant overlap in the predicted binding sites and the binding sites given in UniProt.
+
+ The second and third were obtained using [EvoProtGrad](https://github.com/Amelie-Schreiber/sampling_protein_language_models/blob/main/EvoProtGrad_copy.ipynb)
  a Markov Chain Monte Carlo method of (in silico) directed evolution of proteins based on a form of Gibbs sampling. The mutatant-type
  protein sequences in theory should have similar binding sites to the wild-type protein sequence, but perhaps with higher binding affinity.
  Testing this out on the model, we see the two proteins indeed have the same binding sites, which validates to some degree that the model