
BERT base for proteins

This is a bidirectional transformer pretrained on the amino-acid sequences of human proteins.

Example: Insulin (P01308)

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN

The model was trained using the masked-language-modeling objective.
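
The card does not give the masking details, so the following is only a minimal sketch of how masked-language-modeling inputs can be built for a protein sequence. It assumes one token per residue, a 15% masking rate, and a `[MASK]` token, all of which are common BERT conventions rather than settings confirmed for this model. The helper `mask_sequence` is hypothetical, and it omits the usual BERT refinement of sometimes keeping or randomly replacing a selected token.

```python
import random


def mask_sequence(sequence, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Return (masked_tokens, labels) for one amino-acid sequence.

    labels[i] holds the original residue at masked positions and None elsewhere,
    so a model is only scored on the residues it could not see.
    mask_rate and mask_token are assumed values, not taken from the model card.
    """
    rng = random.Random(seed)
    residues = list(sequence)  # assumption: one token per residue
    n_mask = max(1, round(len(residues) * mask_rate))
    positions = set(rng.sample(range(len(residues)), n_mask))
    masked = [mask_token if i in positions else aa for i, aa in enumerate(residues)]
    labels = [aa if i in positions else None for i, aa in enumerate(residues)]
    return masked, labels


# First 20 residues of the insulin example above
insulin_prefix = "MALWMRLLPLLALLALWGPD"
masked, labels = mask_sequence(insulin_prefix)
print(" ".join(masked))
```

During pretraining, the model would be asked to predict the residues stored in `labels` from the surrounding unmasked context.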

Intended uses

This model is intended primarily for fine-tuning on the following tasks:

  • protein function
  • molecule-to-gene-expression mapping
  • cell targeting

How to use in your code

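from transformers import BertTokenizerFast, BertModel

# Load the pretrained tokenizer and encoder from the Hugging Face Hub
checkpoint = 'unikei/bert-base-proteins'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
model = BertModel.from_pretrained(checkpoint)

# Human insulin (P01308), the same sequence as in the example above
example = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'
tokens = tokenizer(example, return_tensors='pt')  # return PyTorch tensors
# BertModel returns hidden states (last_hidden_state, pooler_output), not masked-token logits
predictions = model(**tokens)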
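
For fine-tuning on sequence-level tasks, the per-token hidden states usually need to be reduced to one vector per protein. The sketch below shows masked mean pooling, one common choice and not necessarily what the authors used. The helper `mean_pool` is hypothetical, and the tensors are small dummy stand-ins so the example runs without downloading the checkpoint.

```python
import torch


def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings per sequence, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden) float tensor
    attention_mask:    (batch, seq_len) tensor of 1s (real tokens) and 0s (padding)
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)                          # avoid division by zero
    return summed / counts


# Dummy stand-ins for model output: 2 sequences, 5 tokens, hidden size 8 (illustrative only)
hidden = torch.ones(2, 5, 8)
hidden[1, 3:] = 100.0  # values at padded positions, which pooling should ignore
attention_mask = torch.tensor([[1, 1, 1, 1, 1],
                               [1, 1, 1, 0, 0]])

embeddings = mean_pool(hidden, attention_mask)
print(embeddings.shape)
```

With the snippet above, the same helper could be called as `mean_pool(predictions.last_hidden_state, tokens['attention_mask'])` to obtain one embedding for the insulin sequence.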

Model size

86.1M parameters, stored as Safetensors with tensor type F32.