Model Card for Model ID
This is a protein language model trained on the "G-protein coupled receptor" cluster of UniRef90. The selection of GPCR proteins is very crude; it was made to train a small model purely for training purposes.
By focusing on 80k GPCR sequences (which are relatively similar), we are able to train a small model on a MacBook Air yet still run some follow-up experiments within this particular protein domain.
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- Developed by: Michel Nivard
- Model type: 30M-parameter protein language model with a ModernBERT architecture
- Language(s) (NLP): Protein sequences
Uses
Direct Use
[More Information Needed]
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
Training Details
Training Data
80k GPCR protein sequences
Training Procedure
Masked language modeling (MLM) with 15% of amino acids masked
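As a rough illustration of the 15% masking objective, the sketch below masks residues in a single sequence using only the Python standard library. It is not the training code used for this model: the `[MASK]` symbol, the `mask_sequence` helper, the per-residue tokenization, and the example sequence are all assumptions made for illustration. It also replaces every selected residue with the mask symbol, whereas common MLM data collators substitute a random or unchanged token for a fraction of the selected positions.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed mask symbol; the model's tokenizer may use another


def mask_sequence(sequence, mask_fraction=0.15, seed=None):
    """Mask a fraction of residues in a protein sequence (hypothetical helper).

    Returns (tokens, labels). tokens is the residue list with the selected
    positions replaced by MASK_TOKEN. labels holds the original residue at
    masked positions and None elsewhere, so a loss would only be computed
    on the masked positions.
    """
    residues = list(sequence)
    if not residues:
        return [], []
    rng = random.Random(seed)
    # Mask at least one residue so short sequences still give a training signal.
    n_mask = max(1, round(len(residues) * mask_fraction))
    masked_positions = set(rng.sample(range(len(residues)), n_mask))
    tokens = [MASK_TOKEN if i in masked_positions else aa
              for i, aa in enumerate(residues)]
    labels = [aa if i in masked_positions else None
              for i, aa in enumerate(residues)]
    return tokens, labels


# Illustrative 40-residue sequence, not taken from the training set.
example = "MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSML"
tokens, labels = mask_sequence(example, seed=0)
print(" ".join(tokens))
```

With a 40-residue input, 15% corresponds to 6 masked positions; the model is then trained to predict the original amino acid at each of them.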