NaturalGradient
committed on
Commit
•
8b0ebf4
1
Parent(s):
9d2a2a2
Update README.md
- BFN_overview.png +0 -0
- README.md +14 -0
- cath_s40_proteins.png +0 -0
BFN_overview.png
ADDED
README.md
CHANGED
@@ -1,3 +1,17 @@
---
license: cc-by-4.0
---
+
+# Protein Sequence Modelling with Bayesian Flow Networks
+
+Welcome to the model weights for the paper "Protein Sequence Modelling with Bayesian Flow Networks". Using the [code on our GitHub page](https://github.com/instadeepai/protein-sequence-bfn), you can sample from our trained models ProtBFN, for general proteins, and AbBFN, for antibody VH chains.
+
+[Bayesian Flow Networks](https://arxiv.org/abs/2308.07037) are a new approach to generative modelling, and can be viewed as an extension of diffusion models to the parameter space of probability distributions. They define a continuous-time process that maps between a naive prior distribution and a pseudo-deterministic posterior distribution for each variable independently. By training our neural network to 'denoise' the current posterior, taking into account mutual information between variables, we implicitly optimise a variational bound on the data likelihood. We can then use our trained neural network to generate samples from the learned distribution.
+
+One of the benefits of defining such a process in probability parameter space is that it can be applied to *any* family of distributions with continuous-valued parameters. This means that BFNs can be directly applied to discrete data, allowing for diffusion-like generative modelling of sequences without restrictive left-to-right inductive biases or reliance on discrete-time stochastic processes. The main focus of our work is to investigate the application of BFNs to *protein sequences*, as represented by a sequence of amino acids. The ProtBFN methodology is broadly summarised below:
+
+![An overview of ProtBFN.](BFN_overview.png)
+
+Having trained ProtBFN, we find that it is exceptionally performant at unconditional generation of de novo protein sequences. For example, it rediscovers a variety of structural motifs, according to structures predicted by ESMFold, with high sequence novelty:
+
+![CATH hits for ProtBFN.](cath_s40_proteins.png)
cath_s40_proteins.png
ADDED
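The README text above describes a continuous-time process that moves each variable from a naive prior towards a near-deterministic posterior. For discrete data, that process can be illustrated with a minimal NumPy sketch. It assumes the discrete-data Bayesian flow distribution from the Bayesian Flow Networks paper linked in the README: a noisy observation `y ~ N(beta(t) * (K * e_x - 1), beta(t) * K * I)` followed by a softmax, with a quadratic accuracy schedule `beta(t) = beta_1 * t**2`. The function name, the `beta_1` value, and the toy sequence are hypothetical; this is not the code from the ProtBFN repository, and the neural network that 'denoises' these parameters is omitted.

```python
import numpy as np


def bayesian_flow_sample(x, t, num_classes, beta_1, rng):
    """Sample per-position categorical parameters theta at time t.

    x: integer array of shape (L,), the class index at each position.
    t: time in [0, 1]; t = 0 returns the uniform prior exactly.
    beta_1: final accuracy (illustrative value, an assumption).
    """
    beta_t = beta_1 * t ** 2                       # assumed quadratic schedule
    one_hot = np.eye(num_classes)[x]               # shape (L, K)
    mean = beta_t * (num_classes * one_hot - 1.0)  # pulls towards the true class
    std = np.sqrt(beta_t * num_classes)
    y = mean + std * rng.standard_normal(one_hot.shape)
    # Numerically stable softmax over the class axis.
    y = y - y.max(axis=-1, keepdims=True)
    theta = np.exp(y)
    return theta / theta.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
x = np.array([3, 0, 17, 5])  # toy "sequence" over a 20-letter alphabet
for t in (0.0, 0.5, 1.0):
    theta = bayesian_flow_sample(x, t, num_classes=20, beta_1=3.0, rng=rng)
    # Largest probability at each position; it tends to grow with t.
    print(t, theta.max(axis=-1).round(3))
```

Each position is handled independently here, which mirrors the "for each variable independently" wording in the README. In a full model, a network would take `theta` for all positions at once and predict the data distribution, which is where dependencies between positions enter.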